Eric Jang: Bots & Thoughts from ICRA2018

The 35th International Conference on Robotics and Automation took place from May 21-25. I had a fantastic time attending my first ICRA: here is a brief thematic overview of my conference experience, research areas I’m most excited about, and cool robots I saw being demoed.

Summary:

Great conference. Well-organized, thought-provoking talks, very chill and not too corporate or hyped like some Machine Learning conferences (NIPS, ICLR).
At a meta level, most of the papers presented here will never see the light of user-facing products. I think the technology gap between academia and industry labs is only going to increase in the coming years, and this should warrant a real existential crisis for academic robotics research.
Research contributions in robotics (mostly from industry) are starting to make a difference in the real world. Deep Learning-based perception is a real game-changer. Lots of interest and funding opportunities from Big Tech, VC firms, startups, and even nation-states.
I came into the conference prepared to learn why control theorists are so skeptical about Deep Learning, and challenge my own biases as a Deep RL researcher. I came out of the conference rather disappointed in traditional approaches to robotics and an even stronger conviction in end-to-end learning, big data, and deep learning. Oops.

The Conference

ICRA 2018 was extremely well-organized. Brisbane is a beautiful, clean, and tourist-friendly city, and the conference venue was splendid. Some statistics:

We received over 2500 submissions, a new record, from 61 countries.

The 10 most popular keywords, in descending order, were: Deep Learning in Robotics and Automation, Motion and Path Planning, Learning and Adaptive Systems, Localization, SLAM, Multi-Robot Systems, Optimization and Optimal Control, Autonomous Vehicle Navigation, Mapping, Aerial Systems - Applications.

From the very large number of high quality papers we selected 1030 for presentation which represents an acceptance rate of 40.6%.

I really enjoyed talking to companies at the sponsor booths and learning what kinds of problems their robotic solutions solved in the real world. This was much more educational to me than the booths at NIPS 2017, where it felt very corporate and mostly staffed by recruiters (photo below for comparison).

Taken at NIPS 2017. Also, snacks at NIPS were not as good as those at ICRA.

There was abundant food & refreshments during poster sessions and breaks. The conference schedule was a neat little pamphlet that fit inside the registration badge (along with tickets to any ancillary events), and the full program was given to us digitally on a 32GB usb drive. I appreciated that all the paper presentations were done as spotlight videos uploaded to YouTube. This helps to make research much more accessible to folks who don't have the means to travel to Australia. Many thanks to Alex Zelinsky (General Chair) and Peter Corke (Program Chair) for such a well-run multi-track conference.

As per usual, the robotics community is very non-diverse, and I hope that we as a community can take up stronger diversity & inclusion efforts soon, especially given the socioeconomic stakes of this technology.

It’s no longer socially appropriate to just solve hard problems in 2018, researchers now need to think of societal & ethical issues when building such powerful technology. On the bright side, this is a strong sign that our research matters!

Real World Robots vs. Academic Robotics

Rodney Brooks gave an opening plenary talk in which he hated on academia, hated on the first-generation Roomba, hated on deep learning, and especially hated on Human-Robot Interaction (HRI) research.

I loved it. Polarizing opinions -- even downright offensive opinions -- spark the fire of insightful discourse and distributed consensus. Here’s the basic gist of his talk (emphasis that these are his views and not necessarily mine):

Three tremendous economic forces looming on the horizon: 1) aging population 2) climate change 3) urbanization (people moving to big cities).

These forces will demand new labor for fixing, growing, manufacturing, and assisting, all while demographic inversion (#1) creates a massive labor shortage (it’s already arrived in some countries).

Cheap labor from China is a thing of the past.

The politically charged “robots taking our jobs” rhetoric is neither helpful nor accurate: those jobs are not desirable. Many factories in China have 15-16% labor turnover -- per month! Rod showed the following picture of a meat processing plant and asked a poignant question: “Would you aspire for your children to be working in these jobs?”

Robotics & automation is the only answer that can sustain our desired standard of living.
It’s quite evident that Rod does not think much of academic robotics. Great pioneers in computer science (Lovelace, Englebart, Jobs given as examples) were not concerned with the petty stakes of getting tenure, getting papers accepted at conferences, what other people thought of their work, or how hard / impossible their goals were.
As members of the academic rat race (attending an academic conference), it's important to keep things in perspective and realize that customers and end-users of robotics do not even know that ICRA exists.
Customers -- not being roboticists -- will demand features that might even compromise functionality. Usually you just have to give in to their unreasonable demands. Customers who open up Roombas never read the manual! Customers demand that their Roombas go up and down in straight lines, even when random trajectories clean better!
A surprisingly critical view of Human-Robotic-Interaction research from out of nowhere. Rod Brooks claims “If you rely on statistics (p-values) to prove your idea is good, it’s not good” and “most HRI research is not reproducible.” He has a pretty savage invitation for someone to go and try to re-implement famous HRI papers, then tweak some nuisance variables (like age of the user or ordering of questions) and in order to obtain the opposite conclusion.
“Enough papers. Go and invent great stuff.”

I thought it was a great talk. Rod is quick to throw a wet blanket on new technologies like deep learning and self-driving cars, but his experience in real-world robotics is unparalleled and he is quite hard on his own work (“we had no idea what we were doing with the 1st-gen Roomba; we were lucky to have survived”), which is refreshing to hear from a tech CEO.

It’s worth understanding where Rod’s pessimism comes from, and perhaps taking it as an invitation to prove his 2018 technology timeline wrong. For instance, he predicts that dextrous hands will be generally available by 2030-2040. What kinds of breakthroughs would put us “ahead of schedule”?

Other talks at ICRA about real-world robotic applications were much less sardonic, but the subtext remained the same. In the Real World™ , people are just not that interested in outperforming benchmarks, or demonstrating how novel their ideas are, or whether your robot uses deep neural networks or not. People just want things to work, and for algorithms to be scalable to real-world data and uncontrolled environments.

Show Me the Robots!

Matthew Dunbabin gave an awesome talk on COTSBot, an underwater autonomous vehicle that uses Deep Learning-based detection network to identify Crown Of Thorns Starfish on the seabed, and then injects the starfish with a lethal saline solution. This prevents starfish infestations from eating too much live coral.

Previously, COTS management required human divers to manually inject each arm of the starfish, which was extremely tedious. More critically, this requires dextrous autonomous manipulation -- carefully injecting each tentacle, lifting up starfish to get the one underneath -- something that robots cannot do yet in 2018.

The game-changer was the development of a saline solution that could kill a Starfish with a single injection, which absolved the robot of requiring human-level manipulation skills.

On the perception side, it was interesting to learn that pre-training the starfish detection system on Youtube videos ended up not being that helpful, due to a large domain shift from “centered glamour shots” of YouTube cameras and the real-world perceptual data (with murky / low visibility, moonlit conditions).

Projects like COTSBot are deeply meaningful to the health of the ecosystem, but all the same, the video clip of the robot autonomously locating a starfish and jamming a syringe into it drew a nervous chuckle from the audience.

CSIRO uses these cool legged robots for patrolling large swaths of grassland for things like environmental surveys.

Along similar veins, Agility Robotics and Anybotics are starting to make pretty interesting legged data-gathering platforms. The Oil & Gas industry is a particularly interesting customer for these kinds of robots. As it turns out, oil rigs are similar to home robotics in several ways:

The environment is designed for human usage. It’s currently not cost-effective to re-design homes & oil rigs around robots, so robots must instead work in a anthropocentric environment.
Legged navigation needed to get around.
The one exception is underwater monitoring and repair, where the lack of safe human alternatives means that oil folks are willing to re-design the task to better suit robots. For example, designing modules that are replaceable, modular units rather than having human divers perform repairs underwater.

Here’s a neat touch sensor that uses a camera to measure contact forces via optical dispersion of the little holes in rubber.

I’m excited about using high-resolution cameras to implement haptic & touch sensing, and I think this technology can scale to whole-body “skin sensors”. Why now?

Wiring an entire epithelium is hard. I think stretchable optical waveguides and consolidating many bundles of light into a few camera sensors are a potential solution to scaling high-resolution touch and force sensing with minimal electronic overhead.
Planning contacts under an analytical Model-Predictive-Control framework (i.e. “the old way of doing robotics”) is harder if the exterior is soft. Even if the sensors were packed onto the robot somehow, roboticists would not know how to deal with that volume of data & the complexity of geometry in real-time.
Machine Learning can be the breakthrough technology to learn force sensing from raw, highly unstructured, uncalibrated “squish” data.

I predicted last year that event-based cameras would be a big deal, and I’m still quite excited about this technology. I think that a lot of robotics perception problems simply go away if the loop is ridiculously fast.

The fabrication capabilities of academic labs are rather disappointing. Many robotics projects are motivated by some bold idea (let's make self-healing robots, let's make robots that have flexible exoskeletons like cockroaches, let's make robots that grow), but the concept is executed on crude hardware and limited by material science. A lot of these hardware projects would be WAY more useful if they were miniaturized 10X into a small form factor, which makes me wonder if the highest-impact thing that would benefit all robotics projects is better manufacturing & miniaturization capability. We should be thinking of robots as Swiss watches, not Dynamixel servos.

Speaking of Swiss Watches, the Queensland Zoo brought out some native Australian wildlife for us to look at and pet. I’m always humbled by the complexity and sheer majesty of nature’s designs of autonomous systems; they put us roboticists to shame.

Do you know what an amazing machine a ribosome is? That’s how real robotics is done.

Robotics & Venture Capital

The inevitable demographic demand for robotic technology has drawn VCs in like sharks to chummed water.

There was a workshop panel on “investing in robotics” where deep-tech VCs fielded some questions about their firm’s strategy and what they look for in portfolio companies. Some notes from this session:

First-world governments like Australia, Singapore, Canada, and China are eager to invest in robotics. Unclear where USA’s AI strategy is under the current administration.
The most common question asked by VCs during the Startup Pitch competition was “tell us more about the technology you have”. I think VCs want to see a company that has one or more deep technology moats.
I asked whether robotic hardware would eventually become commoditized, with the majority of margins coming from robotic software. The response I got back was that the tradeoff between software/hardware moats is cyclic: new hardware companies (e.g. deep learning chips) emerge when software squeezes everything it can out of existing hardware. I don't really agree here - I think software will be the dominant differentiating factor in all manner of robotic platforms.
An audience member astutely pointed out the opportunity for a service industry surrounding robotics: “We keep talking about automobiles, but nobody is talking about gas stations”. A former colleague of mine mentioned that he has a friend interested in clothing for robots to wear.
Rodney Brooks had a "fireside chat" at this workshop, in which he lamented a continuous struggle in dealing with customers who don’t understand their own robotics needs. He recounted a war story from the early days of Rethink robotics:

RB: “Look, the basic idea of the Baxter robot is to use force sensing to replace precision actuation (which is very expensive). This saves you -- the customer -- a ton of money.”
Customer: “But why doesn’t your robot have precision like my current robot?”
RB: “That level of precision is not necessary for a great deal of tasks like pick-and-placing”
Customer: “But the robot I’m using already has precision! Why can’t you do that too?”
RB: “Fine, we’ll make a robot called Sawyer that has the precision you want, and it’ll end up costing 2 times more.”
Customer: “Now I am happy.”

Australia lacks a competitive funding ecosystem -- that is -- many VC firms competing against each other to secure deal flow. Back in the day, NICTA was the funding agency for research and commercialization, which was not founder-friendly. A competitive VC ecosystem makes starting a startup much more attractive, which in turn brings in founder talent from other countries.
Chinese robotics companies have done a rather impressive job cloning US robots for much cheaper price points, such as the AUBO-i5 (clone of UR3 by Universal Robots), and Laikago (clone of Spot Mini by Boston Dynamics). Coupled with China’s manufacturing expertise, I think this could be the force that commoditizes robot hardware.

There was a little “Robotics Startup Pitch” competition where startups pitched their companies to these VCs to get some exposure. Some pitches sounded like they came out of an ICO whitepaper, but there were a few promising companies with real technology in the mix.

My favorite company was Hebi Robotics, which was a spinoff out of CMU’s snake robot lab. They ended up commercializing the “snake segment” into a modular actuator that enables researchers to put together low-volume robot prototypes quickly, much like LEGOs. The robots you can build with these kits are ideal for AI & Machine Learning researchers: arms, hexapods, mobile bases with manipulation capabilities...

The Spectre of Deep Learning

A spectre is haunting Robotics: the spectre of Deep Learning...

Raia Hadsell’s excellent plenary talk contained a thought-provoking slide (made by Vincent Vanhoucke in 2016):

This is the gist of the message:

Deep learning-based algorithms + huge amounts of data have been steamrolling classical speech recognition, classical computer vision, and classical machine translation benchmarks. We’ve basically thrown away decades of “old pipelines” in favor of big neural nets.
Will the same thing happen to classical robotics?

Given the obvious historical trope, it was quite surprising how hostile many roboticists are to the idea of throwing everything out and replacing it with neural nets. Today, the robotics community is pretty intellectually divided into two camps.

"Traditional" robotics based on control theory with analytic models and probabilistic planning.
The deep learning folks, who dispense with analytical understanding and just throw massive compute and data at the problem and let "Software 2.0" figure out a program from the data.

I think a big reason these camps can’t understand each others' perspective is that they are solving different robotics tasks with different requirements, so they end up talking past each other when arguing for the merits of their approaches.

The “traditional” approach to robotics is popular in control, where a desired (low-dimensional) state must be realized. Safety is a hard constraint. Tasks include learning dynamic gaits on a quadruped, planning to avoid bumping into things, flying things, balancing heavy things that should not fall down.

However, this line of research largely avoids the problem of perception and solving diverse tasks at a semantic level, largely delegating it to a state estimation problem with an off-the-shelf perception stack or even ignoring the problem of performing tasks altogether.

I asked a student from a well-known quadruped lab how they weighed their dynamic stability cost function with a task-specific cost function and got a blank stare: “Uh... what task?”

The "deep learning way" of doing robotics has only become popular recently, and derives its roots from computer vision researchers extending perception models to also perform control. These approaches excel at handling unstructured environments, unreliable state estimation, and generalization (e.g. unseen objects) quite well.

However, because they are entirely data-driven, they often fail to generalize in ways that analytical methods handle trivially. By construction, it’s easy to see how an IK solver with a correctly specified model will always be better than a neural net approximation of that function.

As an employee of the Church of Deep Learning™, I came into the conference prepared to question my faith and learn the perspective of other labs and their approaches to solving real world problems.

I ended up being sorely disappointed. Coming out of the conference, I’m more convinced than ever that Deep Learning based control of robotics is the only approach that will ever scale to unstructured environments within the next decade.

Why I believe strongly in Deep Learning-based approaches to robotics warrants a much longer blog post for the future, but for now I’ll say this:

Deep Learning is accessible - I mentioned earlier that Hebi Robotics is enabling non-roboticists to build research hardware. Deep Learning does the same thing, but for control software. You don’t need a strong mathematical foundation anymore to train neural networks. Just concatenate some layers, gather some data -- hey presto -- your robot supports a new end-effector!
Deep RL Research is scalable - a RNN technique pioneered in NLP space could be immediately applied to improving RNNs used in a robotics context. Deep Learning research feels much more collaborative, because all these people working on diverse problems now speak the same language. One can now cast SVMs, linear models, optimization, probabilistic inference, and even cognitive neuroscience all into the common language of neural nets. A DL non-believer was complaining to me that we were heading towards a research monoculture, similar to how every roboticist was working on SLAM in the early 2000's. If this "monoculture" enables new scales of collaboration and bridging of learning theory across disciplines, then I’m all for it!
The problems one must solve to bring lab robotics to the real world are so challenging that data-driven approaches are the only way to make them tractable. Analytical approaches require too many compromises on robot design, too much hand-tuning for the specific system.

Finally, there are reasonable, diplomatic folks who believe in marrying the benefits of both data-driven learning and analytical task-specific knowledge, but this reminds me of computer vision people who believed in fine-tuning SVMs on top of the last layer of a deep neural net back in 2015 to “get the best of both worlds.”

Data, though expensive to gather, is always right.

Much of robotics is built on the mindset of obtaining geometry, exact localization, planning around exact dynamics. However, one exciting avenue of Deep Reinforcement Learning is that learning control from vision enables control strategies that only require gaze heuristics and optical flow, rather than precise tracking and acting in a localized Euclidean space.

Mandyam Srinivasan gave a very interesting keynote talk on biologically-inspired aerial robotics in which they modeled honeybee’s abilities for estimating distance based on optical flow. It turns out that bees don't really estimate states and forces like robots do, they just map visual motion cues to wings beating faster and slower, and everything turns out fine.

In terms of sensors, I think moving beyond precise (and expensive!) IMUs and joint encoders and instead, sensing everything from vision is the way to go. All other sensors can probably be mapped to vision too (such as optical waveguides!), and maybe Convnets and RNNs can handle the rest.

Here’s another cool example of how data-driven software can even replace expensive force sensing. Remember those skin sensors I was talking about earlier? According to the work shown in the poster below, it turns out you can just use a RNN to predict those surface contact "force transients" simply by the amount of “feedback” the motors in your arm feel when they bump into something.

Looking Forward

I attended a workshop ("Grand Scientific Challenges for the Robot Companion of the Future") where the panel of distinguished roboticists were asked what they thought were the grand challenges & questions for robotics. Here were some of the responses:

Energy management (power consumption, batteries)

Predictions & mirror neurons

What is the generic representation of actions?

An understanding of Machines vs Life (this was Oussama Khatib)

Wearable exoskeleton, as if it were part of the body

Human-computer hybrid - cheap memory capacity

I'd like to add 3 additional technologies that I believe could be game-changing for robotics:

Artificial Materials: synthesizing polymers with self-healing abilities, the ability to co-locate actuation, sensing, and computation in the same volume.
Artificial Muscles: Modular, electrically or chemically actuated, and at the millimeter scale.
Artificial Life: Large-scale ecosystems of agents struggling to not die, compete for resources, and reproduce.

Will follow up on these in later blog posts...

Eric Jang

Thursday, June 21, 2018

Bots & Thoughts from ICRA2018