## Saturday, June 25, 2016

### Adversarial Exploration Policies for Robust Model Learning

This post was first published on 05/14/16, and has since been migrated to Blogger.

Brown University requires S.c.B students to take a capstone course that "studies a current topic in depth to produce a culminating artifact such as a paper of software project".

For my capstone project, I developed a more efficient sampling approach for use in learning POMDP models with Deep Learning. It's still a work in progress, but the results are pretty promising.

### Abstract:

Deep neural networks can be applied to model-based learning problems over continuous state-action spaces $S \times A$. By training a prediction network $\hat{f}_\tau : S \times A \to S$ on saved trajectory data, we can approximate the true transition function $f$ of the underlying Markov decision processes. $\hat{f}_\tau$ can then be used within optimal control and planning algorithms to predict the future''.

Robustness of $\hat{f}_\tau$ is crucial. If the robot (such as an autonomous vehicle) spends most of its exploration time in a small region of $S \times A$, then $\hat{f}_\tau$ may not be accurate in regions that the robot does not encounter often (such as collision trajectories). However, gathering enough training data to fully characterize $f$ over $S \times A$ is very time-consuming, and tends to result in many redundant samples.

In this work, I propose exploring $S \times U$ using an adversarial policy'' $\pi_\rho : S \to A$ that guides the robot into states and actions that maximize model loss. Policy parameters $\rho$ and model parameters $\tau$ are optimized in an alternating minimax game via stochastic gradient descent. Robot simulation experiments demonstrate that adversarial exploration policies improve model robustness with respect to the time the robot spends sampling the environment.