Eric Jang: Adversarial Exploration Policies for Robust Model Learning

Saturday, June 25, 2016

This post was first published on 05/14/16, and has since been migrated to Blogger.

Brown University requires Sc.B. students to take a capstone course that "studies a current topic in depth to produce a culminating artifact such as a paper or software project".

For my capstone project, I developed a more efficient sampling approach for use in learning POMDP models with Deep Learning. It's still a work in progress, but the results are pretty promising.

Abstract:

Deep neural networks can be applied to model-based learning problems over continuous state-action spaces $S \times A$. By training a prediction network $\hat{f}_\tau : S \times A \to S$ on saved trajectory data, we can approximate the true transition function $f$ of the underlying Markov decision process. $\hat{f}_\tau$ can then be used within optimal control and planning algorithms to "predict the future".
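To make the idea of fitting $\hat{f}_\tau$ to saved trajectories concrete, here is a minimal sketch. It is not the network from this project: the 1-D dynamics in `true_step`, the random data-collection policy, and the linear model standing in for a deep network are all assumptions chosen so the example stays short and runnable.

```python
import random

def true_step(s, a):
    """Hypothetical 1-D dynamics standing in for the unknown f."""
    return 0.9 * s + 0.5 * a

def collect_trajectory(n_steps, seed=0):
    """Roll out a uniformly random policy and save (s, a, s_next) tuples."""
    rng = random.Random(seed)
    s, data = 0.0, []
    for _ in range(n_steps):
        a = rng.uniform(-1.0, 1.0)
        s_next = true_step(s, a)
        data.append((s, a, s_next))
        s = s_next
    return data

def fit_model(data, epochs=200, lr=0.02, seed=0):
    """SGD on squared one-step prediction error for a linear model."""
    rng = random.Random(seed)
    tau = [0.0, 0.0]  # assumed model form: s_next ~ tau[0]*s + tau[1]*a
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for s, a, s_next in data:
            err = tau[0] * s + tau[1] * a - s_next
            tau[0] -= lr * 2.0 * err * s
            tau[1] -= lr * 2.0 * err * a
    return tau

tau = fit_model(collect_trajectory(200))
```

Because the toy dynamics are exactly linear and noiseless, `tau` ends up very close to the coefficients used in `true_step`. With a real robot and a neural network, the same loop structure applies, but the fit is only as good as the coverage of the saved data.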

Robustness of $\hat{f}_\tau$ is crucial. If the robot (such as an autonomous vehicle) spends most of its exploration time in a small region of $S \times A$, then $\hat{f}_\tau$ may not be accurate in regions that the robot does not encounter often (such as collision trajectories). However, gathering enough training data to fully characterize $f$ over $S \times A$ is very time-consuming, and tends to result in many redundant samples.

In this work, I propose exploring $S \times A$ using an "adversarial policy" $\pi_\rho : S \to A$ that guides the robot into states and actions that maximize model loss. Policy parameters $\rho$ and model parameters $\tau$ are optimized in an alternating minimax game via stochastic gradient descent. Robot simulation experiments demonstrate that adversarial exploration policies improve model robustness relative to the amount of time the robot spends sampling the environment.
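The alternating minimax game can be sketched on a toy problem. Everything below is illustrative rather than the method used in the experiments: the simulator `true_step`, the linear model, the `tanh` policy, the learning rates, and the finite-difference policy gradient (which assumes the simulator can be reset to the same state and queried repeatedly) are all assumptions.

```python
import math
import random

def true_step(s, a):
    """Hypothetical simulator dynamics standing in for the unknown f."""
    return 0.9 * s + 0.5 * a

def model(tau, s, a):
    """Linear stand-in for the prediction network f_hat_tau."""
    return tau[0] * s + tau[1] * a

def policy(rho, s):
    """Adversarial policy pi_rho: a bounded action from a linear score."""
    return math.tanh(rho[0] * s + rho[1])

def model_loss(tau, rho, s):
    """Squared one-step prediction error at (s, pi_rho(s))."""
    a = policy(rho, s)
    return (model(tau, s, a) - true_step(s, a)) ** 2

def policy_grad(tau, rho, s, h=1e-4):
    """Central finite-difference gradient of the model loss w.r.t. rho."""
    grad = []
    for i in range(len(rho)):
        up, down = list(rho), list(rho)
        up[i] += h
        down[i] -= h
        grad.append((model_loss(tau, up, s) - model_loss(tau, down, s)) / (2 * h))
    return grad

def adversarial_training(steps=2000, lr_model=0.1, lr_policy=0.5, seed=0):
    rng = random.Random(seed)
    tau, rho = [0.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        s = rng.uniform(-1.0, 1.0)  # reset the simulator to a random state
        # Max step: move rho uphill on the model loss (gradient ascent).
        g = policy_grad(tau, rho, s)
        rho = [r + lr_policy * gi for r, gi in zip(rho, g)]
        # Min step: act with the updated policy, then move tau downhill.
        a = policy(rho, s)
        err = model(tau, s, a) - true_step(s, a)
        tau = [tau[0] - lr_model * 2.0 * err * s,
               tau[1] - lr_model * 2.0 * err * a]
    return tau, rho

tau, rho = adversarial_training()
```

The policy update uses the opposite sign from the model update, which is what makes this a minimax game: the policy seeks out actions the model currently predicts poorly, and the model is then trained on exactly those samples.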
