Brown University requires Sc.B. students to take a capstone course that "studies a current topic in depth to produce a culminating artifact such as a paper or software project".
For my capstone project, I developed a more efficient sampling approach for use in learning POMDP models with Deep Learning. It's still a work in progress, but the results are pretty promising.
Abstract:
Deep neural networks can be applied to model-based learning problems over continuous state-action spaces $S \times A$. By training a prediction network $\hat{f}_\tau : S \times A \to S$ on saved trajectory data, we can approximate the true transition function $f$ of the underlying Markov decision process. The learned model $\hat{f}_\tau$ can then be used within optimal control and planning algorithms to "predict the future".
Robustness of $\hat{f}_\tau$ is crucial. If the robot (such as an autonomous vehicle) spends most of its exploration time in a small region of $S \times A$, then $\hat{f}_\tau$ may not be accurate in regions that the robot does not encounter often (such as collision trajectories). However, gathering enough training data to fully characterize $f$ over $S \times A$ is very time-consuming, and tends to result in many redundant samples.
In this work, I propose exploring $S \times A$ using an "adversarial policy" $\pi_\rho : S \to A$ that guides the robot into states and actions that maximize model loss. Policy parameters $\rho$ and model parameters $\tau$ are optimized in an alternating minimax game via stochastic gradient descent. Robot simulation experiments demonstrate that adversarial exploration policies improve model robustness with respect to the time the robot spends sampling the environment.
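To make the alternating game concrete, here is a minimal toy sketch of one plausible reading of it: $\min_\tau \max_\rho \; \mathbb{E}_s \big[ (\hat{f}_\tau(s, \pi_\rho(s)) - f(s, \pi_\rho(s)))^2 \big]$. It is not the project's actual code. The 1-D dynamics `true_step`, the linear model, the tanh policy, the learning rates, and every function name are assumptions made up for illustration, and gradients are estimated with finite differences rather than backpropagation through a neural network.

```python
import math
import random

def true_step(s, a):
    """Hypothetical 1-D dynamics standing in for the true transition function f."""
    return s + math.sin(a)

def model_step(tau, s, a):
    """Toy prediction model f_hat_tau: s + tau[0] + tau[1] * a (assumed form)."""
    return s + tau[0] + tau[1] * a

def policy(rho, s):
    """Toy adversarial policy pi_rho: a bounded action chosen from the state."""
    return 2.0 * math.tanh(rho[0] + rho[1] * s)

def loss(tau, rho, states):
    """Mean squared one-step prediction error on the actions the policy picks."""
    total = 0.0
    for s in states:
        a = policy(rho, s)
        total += (model_step(tau, s, a) - true_step(s, a)) ** 2
    return total / len(states)

def grad(fn, params, eps=1e-4):
    """Central finite-difference gradient of fn at params."""
    g = []
    for i in range(len(params)):
        up = list(params)
        dn = list(params)
        up[i] += eps
        dn[i] -= eps
        g.append((fn(up) - fn(dn)) / (2 * eps))
    return g

def train(steps=200, lr_model=0.1, lr_policy=0.05, batch=32, seed=0):
    """Alternate gradient ascent on rho (policy) and descent on tau (model)."""
    rng = random.Random(seed)
    tau = [0.0, 0.0]  # model parameters
    rho = [0.1, 0.0]  # policy parameters
    history = []
    for _ in range(steps):
        # Sample a minibatch of states (stands in for states the robot visits).
        states = [rng.uniform(-1.0, 1.0) for _ in range(batch)]
        # Policy step: move rho uphill so the chosen actions increase model loss.
        g_rho = grad(lambda r: loss(tau, r, states), rho)
        rho = [r + lr_policy * g for r, g in zip(rho, g_rho)]
        # Model step: move tau downhill on the data the adversary just produced.
        g_tau = grad(lambda t: loss(t, rho, states), tau)
        tau = [t - lr_model * g for t, g in zip(tau, g_tau)]
        history.append(loss(tau, rho, states))
    return tau, rho, history
```

Calling `tau, rho, history = train()` returns the final parameters and the per-step minibatch loss. Finite differences treat the simulator as a black box, which keeps the sketch dependency-free; with a differentiable model and policy, the same loop could be written with automatic differentiation instead.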
Links:
- PDF of the writeup
- Project research notebook in blog format
- Source code on GitHub
- E2C vanilla implementation
- BoxBot simulator