I've been at Google Brain robotics (now referred to as Robotics @ Google) for nearly 3 years. It's helpful to reflect, from time to time, on the scientific, engineering and personal productivity takeaways gleaned from working on large research projects. Every researcher's unique experiences and experimentation can potentially become their personal competitive edge for thinking about new problems in unique ways. Here are mine (so far).
These are ordered chronologically (earliest work first), so that the reader can see how my past experiences shape my current biases and beliefs (orange = first author).
Categorical Reparameterization with Gumbel-Softmax
- The importance of a work environment that encourages serendipitous discovery and 20% time (the inspiration for Gumbel-Softmax came to me in a water cooler conversation I was having with Shane Gu).
- Research on very basic techniques (e.g. generative modeling) can have a huge impact through various downstream applications.
- The simplest method to implement is the one that gets cited the most.
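To make the "simple to implement" point concrete, here is a minimal sketch of drawing one relaxed categorical sample with the Gumbel-Softmax trick. It is written in pure Python for illustration and is not the paper's reference code. The function names, the `eps` constant, and the example probabilities are assumptions made for this sketch.

```python
import math
import random


def sample_gumbel(rng, eps=1e-20):
    """Draw one Gumbel(0, 1) sample by inverse transform sampling."""
    u = rng.random()  # uniform in [0, 1)
    # eps (an assumed small constant) guards against log(0).
    return -math.log(-math.log(u + eps) + eps)


def gumbel_softmax_sample(logits, temperature, rng):
    """Return a relaxed one-hot vector over len(logits) categories.

    logits are unnormalized log-probabilities. Lower temperatures push
    the output closer to a one-hot vector.
    """
    noisy = [(logit + sample_gumbel(rng)) / temperature for logit in logits]
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
print(gumbel_softmax_sample(logits, temperature=0.5, rng=rng))
```

In an autodiff framework the same few lines are differentiable with respect to `logits`, which is what makes the relaxation useful for training.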
End-to-End Learning of Semantic Grasping
- The notion of a "class label" is meaningless, and is the wrong way to tackle goal-conditioned grasping.
- ML can help robotics, but robotics can also help ML (i.e. retroactive labeling via present poses).
- The importance of moving fast and investing in visualization and analysis tools (e.g. notebooks) that do not require a robot.
Time Contrastive Networks
- All you need is high-quality data and a contrastive loss. Pierre Sermanet is fond of saying, tongue-in-cheek, that these two things will get us to AGI.
- Dream big.
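For readers unfamiliar with contrastive losses, the sketch below shows one common form, a margin-based triplet loss on embedding vectors. It is an illustrative assumption, not necessarily the exact loss or hyperparameters used in the paper. The function names and the `margin` default are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that is zero once the positive is closer to the
    anchor than the negative is, by at least `margin`."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


# Positive sits on the anchor, negative is far away: no loss.
print(triplet_loss([0.0, 0.0], [0.0, 0.0], [1.0, 0.0]))
# Positive is far away, negative sits on the anchor: the loss is positive.
print(triplet_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.0]))
```

The loss itself is tiny. Which examples count as positives and negatives is decided entirely by how the data is collected and paired, which is where the "high-quality data" half of the recipe comes in.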
Deep Reinforcement Learning for Vision-Based Robotic Grasping
- The importance of a fast prototyping environment and quick experiment turnaround times.
- Q-Learning works and scales pretty well.
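As a reminder of what the core Q-Learning update looks like, here is a minimal tabular sketch. It is a deliberate simplification: the grasping work uses a deep Q-function over images rather than a table, and the state names, `alpha`, and `gamma` values below are hypothetical.

```python
def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """Apply one Q-Learning step in place and return the updated Q-value.

    q_table maps state -> {action: value}.
    """
    # Bootstrapped target: immediate reward plus the discounted value
    # of the best action available in the next state.
    best_next = max(q_table[next_state].values())
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
print(q_learning_update(q_table, "s0", "left", reward=1.0, next_state="s1"))
```

Scaling this up means replacing the table with a neural network and the `max` with whatever approximate maximization the action space allows.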
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
- Most people don’t really care how QT-Opt is trained; they are excited about what a trained QT-Opt system can do.
- All you need is scale, compute, and data.
Grasp2Vec: Learning Object Representations from Self-Supervised Grasping
- Magical things can happen if you focus on innovations in better-structured data, instead of better algorithms (all you need is high-quality data and a contrastive loss).
- The notion of a class label is meaningless.
- Good reward functions are a very nice piece of "Software 2.0" infrastructure: modular functionality, quick to verify for correctness, and does not impose strong assumptions on upstream or downstream computations (in contrast to RL algorithms).
- More on Twitter.
- Thinking deeply about the nature of the OoD problem and different types of uncertainty.
- The OoD problem is ill-posed, but still useful for practical applications.
- OoD and generalization are two sides of the same coin.
- I spent 10 days in Jeju mentoring DL camp students. Every day I woke up, ate 3 meals in the same cafeteria downstairs, had no meetings, and thought really hard about the research problem. This monastic working environment was tremendously useful for my creative "flow".
- Optimal control theory says that we need RL to make robots work, but you can get surprisingly far with the original Deep Learning recipe: supervised learning + lots of data + architecture tuning.
- Meta-Learning is all about pushing the burden of learning into the prior.
- Generative modeling (e.g. principled approaches to density estimation, being able to fit multi-modal distributions) is important for scaling up robotics.
- More on Twitter.
General Lessons from Deep RL + Robotics
- I am increasingly of the opinion that the biggest wins in making an ML system work come from high-quality data. Many researchers in sub-fields of ML do not prioritize the choice of data when looking for ways to improve on benchmarks. Deep RL on real robots is a great way to do ML research, because the researcher is forced to gather their own dataset and contend with how data biases generalization outcomes.
- Robotics is full-stack ML (gathering and serializing custom data, building a custom data pipeline, training and evaluation binaries, inference on a real robotic system), which lengthens iteration times and reduces opportunities for spontaneous creativity and discovery. Robotics projects tend to take ~1 FTE year to finish, while most DL papers can be completed in 2-3 months. One of the most important things to me right now is figuring out how we can achieve the same iteration speeds in robotics as in other deep learning domains.
- Best software engineering practices for de-risking Deep RL engineering are in their early days. How do we keep a full-stack dev environment flexible and fast to iterate on (scientific, creative risk) while keeping technical debt from bubbling over (execution risk)? My colleagues and I designed Tensor2Robot to solve a lot of our large-scale ML + robotics problems, but this is just the beginning.
The scope of this post is limited to my own research projects. Of course, there are papers that I didn't work on that have inspired my views tremendously. I'll mention those in a follow-up blog post.