Thursday, June 21, 2018

Bots & Thoughts from ICRA2018

The 35th International Conference on Robotics and Automation took place from May 21-25. I had a fantastic time attending my first ICRA: here is a brief thematic overview of my conference experience, research areas I’m most excited about, and cool robots I saw being demoed.

  • Great conference. Well-organized, thought-provoking talks, very chill and not too corporate or hyped like some Machine Learning conferences (NIPS, ICLR).
  • At a meta level, most of the papers presented here will never see the light of user-facing products. I think the technology gap between academia and industry labs is only going to increase in the coming years, and this should warrant a real existential crisis for academic robotics research.
  • Research contributions in robotics (mostly from industry) are starting to make a difference in the real world. Deep Learning-based perception is a real game-changer. Lots of interest and funding opportunities from Big Tech, VC firms, startups, and even nation-states.
  • I came into the conference prepared to learn why control theorists are so skeptical about Deep Learning, and to challenge my own biases as a Deep RL researcher. I came out of the conference rather disappointed in traditional approaches to robotics and with an even stronger conviction in end-to-end learning, big data, and deep learning. Oops.

The Conference

ICRA 2018 was extremely well-organized. Brisbane is a beautiful, clean, and tourist-friendly city, and the conference venue was splendid. Some statistics:
We received over 2500 submissions, a new record, from 61 countries.

The 10 most popular keywords, in descending order, were: Deep Learning in Robotics and Automation, Motion and Path Planning, Learning and Adaptive Systems, Localization, SLAM, Multi-Robot Systems, Optimization and Optimal Control, Autonomous Vehicle Navigation, Mapping, Aerial Systems - Applications.

From the very large number of high quality papers we selected 1030 for presentation which represents an acceptance rate of 40.6%.

I really enjoyed talking to companies at the sponsor booths and learning what kinds of problems their robotic solutions solved in the real world. This was much more educational to me than the booths at NIPS 2017, where it felt very corporate and mostly staffed by recruiters (photo below for comparison).

Taken at NIPS 2017. Also, snacks at NIPS were not as good as those at ICRA.

There was abundant food & refreshments during poster sessions and breaks. The conference schedule was a neat little pamphlet that fit inside the registration badge (along with tickets to any ancillary events), and the full program was given to us digitally on a 32GB USB drive. I appreciated that all the paper presentations were done as spotlight videos uploaded to YouTube. This helps to make research much more accessible to folks who don't have the means to travel to Australia. Many thanks to Alex Zelinsky (General Chair) and Peter Corke (Program Chair) for such a well-run multi-track conference.

As per usual, the robotics community is very non-diverse, and I hope that we as a community can take up stronger diversity & inclusion efforts soon, especially given the socioeconomic stakes of this technology.

It’s no longer socially appropriate to just solve hard problems in 2018; researchers now need to think about societal & ethical issues when building such powerful technology. On the bright side, this is a strong sign that our research matters!

Real World Robots vs. Academic Robotics

Rodney Brooks gave an opening plenary talk in which he hated on academia, hated on the first-generation Roomba, hated on deep learning, and especially hated on Human-Robot Interaction (HRI) research.

I loved it. Polarizing opinions -- even downright offensive opinions -- spark the fire of insightful discourse and distributed consensus. Here’s the basic gist of his talk (emphasis that these are his views and not necessarily mine):

  • Three tremendous economic forces looming on the horizon: 1) aging population 2) climate change 3) urbanization (people moving to big cities).
  • These forces will demand new labor for fixing, growing, manufacturing, and assisting, all while demographic inversion (#1) creates a massive labor shortage (it’s already arrived in some countries). 
  • Cheap labor from China is a thing of the past. 
  • The politically charged “robots taking our jobs” rhetoric is neither helpful nor accurate: those jobs are not desirable. Many factories in China have 15-16% labor turnover -- per month! Rod showed the following picture of a meat processing plant and asked a poignant question: “Would you aspire for your children to be working in these jobs?”

  • Robotics & automation is the only answer that can sustain our desired standard of living.
  • It’s quite evident that Rod does not think much of academic robotics. Great pioneers in computer science (Lovelace, Engelbart, Jobs given as examples) were not concerned with the petty stakes of getting tenure, getting papers accepted at conferences, what other people thought of their work, or how hard / impossible their goals were.
  • As members of the academic rat race (attending an academic conference), it's important to keep things in perspective and realize that customers and end-users of robotics do not even know that ICRA exists. 
  • Customers -- not being roboticists -- will demand features that might even compromise functionality. Usually you just have to give in to their unreasonable demands. Customers who open up Roombas never read the manual! Customers demand that their Roombas go up and down in straight lines, even when random trajectories clean better! 
  • A surprisingly critical view of Human-Robot Interaction research from out of nowhere. Rod Brooks claims “If you rely on statistics (p-values) to prove your idea is good, it’s not good” and “most HRI research is not reproducible.” He has a pretty savage invitation for someone to go and try to re-implement famous HRI papers, then tweak some nuisance variables (like age of the user or ordering of questions) in order to obtain the opposite conclusion.
  • “Enough papers. Go and invent great stuff.” 

I thought it was a great talk. Rod is quick to throw a wet blanket on new technologies like deep learning and self-driving cars, but his experience in real-world robotics is unparalleled and he is quite hard on his own work (“we had no idea what we were doing with the 1st-gen Roomba; we were lucky to have survived”), which is refreshing to hear from a tech CEO.

It’s worth understanding where Rod’s pessimism comes from, and perhaps taking it as an invitation to prove his 2018 technology timeline wrong. For instance, he predicts that dextrous hands will be generally available by 2030-2040. What kinds of breakthroughs would put us “ahead of schedule”?

Other talks at ICRA about real-world robotic applications were much less sardonic, but the subtext remained the same. In the Real World™, people are just not that interested in outperforming benchmarks, or demonstrating how novel their ideas are, or whether their robot uses deep neural networks or not. People just want things to work, and for algorithms to be scalable to real-world data and uncontrolled environments.

Show Me the Robots!

Matthew Dunbabin gave an awesome talk on COTSBot, an underwater autonomous vehicle that uses a Deep Learning-based detection network to identify Crown Of Thorns Starfish on the seabed, and then injects the starfish with a lethal saline solution. This prevents starfish infestations from eating too much live coral.

Previously, COTS management required human divers to manually inject each arm of the starfish, which was extremely tedious. More critically, automating that procedure would require dextrous autonomous manipulation -- carefully injecting each arm, lifting up starfish to get the one underneath -- something that robots cannot do yet in 2018.

The game-changer was the development of a saline solution that could kill a Starfish with a single injection, which absolved the robot of requiring human-level manipulation skills.

On the perception side, it was interesting to learn that pre-training the starfish detection system on YouTube videos ended up not being that helpful, due to a large domain shift between the “centered glamour shots” of YouTube cameras and the real-world perceptual data (with murky / low visibility, moonlit conditions).

Projects like COTSBot are deeply meaningful to the health of the ecosystem, but all the same, the video clip of the robot autonomously locating a starfish and jamming a syringe into it drew a nervous chuckle from the audience.

CSIRO uses these cool legged robots for patrolling large swaths of grassland for things like environmental surveys.

In a similar vein, Agility Robotics and ANYbotics are starting to make pretty interesting legged data-gathering platforms.
The Oil & Gas industry is a particularly interesting customer for these kinds of robots. As it turns out, oil rigs are similar to home robotics in several ways: 
  • The environment is designed for human usage. It’s currently not cost-effective to re-design homes & oil rigs around robots, so robots must instead work in an anthropocentric environment.
  • Legged navigation needed to get around.
  • The one exception is underwater monitoring and repair, where the lack of safe human alternatives means that oil folks are willing to re-design the task to better suit robots. For example, designing equipment as replaceable, modular units that a robot can swap out, rather than having human divers perform repairs underwater.

Here’s a neat touch sensor that uses a camera to measure contact forces via optical dispersion of the little holes in rubber.

I’m excited about using high-resolution cameras to implement haptic & touch sensing, and I think this technology can scale to whole-body “skin sensors”. Why now?
  • Wiring an entire epithelium is hard. I think stretchable optical waveguides and consolidating many bundles of light into a few camera sensors are a potential solution to scaling high-resolution touch and force sensing with minimal electronic overhead.
  • Planning contacts under an analytical Model-Predictive-Control framework (i.e. “the old way of doing robotics”) is harder if the exterior is soft. Even if the sensors were packed onto the robot somehow, roboticists would not know how to deal with that volume of data & the complexity of geometry in real-time.
  • Machine Learning can be the breakthrough technology to learn force sensing from raw, highly unstructured, uncalibrated “squish” data.

I predicted last year that event-based cameras would be a big deal, and I’m still quite excited about this technology. I think that a lot of robotics perception problems simply go away if the loop is ridiculously fast.

The fabrication capabilities of academic labs are rather disappointing. Many robotics projects are motivated by some bold idea (let's make self-healing robots, let's make robots that have flexible exoskeletons like cockroaches, let's make robots that grow), but the concept is executed on crude hardware and limited by material science. A lot of these hardware projects would be WAY more useful if they were miniaturized 10X into a small form factor, which makes me wonder if the highest-impact thing that would benefit all robotics projects is better manufacturing & miniaturization capability. We should be thinking of robots as Swiss watches, not Dynamixel servos. 

Speaking of Swiss Watches, the Queensland Zoo brought out some native Australian wildlife for us to look at and pet. I’m always humbled by the complexity and sheer majesty of nature’s designs of autonomous systems; they put us roboticists to shame.

Do you know what an amazing machine a ribosome is? That’s how real robotics is done.  

Robotics & Venture Capital

The inevitable demographic demand for robotic technology has drawn VCs in like sharks to chummed water.

There was a workshop panel on “investing in robotics” where deep-tech VCs fielded some questions about their firm’s strategy and what they look for in portfolio companies. Some notes from this session:

  • First-world governments like Australia, Singapore, Canada, and China are eager to invest in robotics. Unclear where USA’s AI strategy is under the current administration.
  • The most common question asked by VCs during the Startup Pitch competition was “tell us more about the technology you have”. I think VCs want to see a company that has one or more deep technology moats.
  • I asked whether robotic hardware would eventually become commoditized, with the majority of margins coming from robotic software. The response I got back was that the tradeoff between software/hardware moats is cyclic: new hardware companies (e.g. deep learning chips) emerge when software squeezes everything it can out of existing hardware. I don't really agree here - I think software will be the dominant differentiating factor in all manner of robotic platforms.
  • An audience member astutely pointed out the opportunity for a service industry surrounding robotics: “We keep talking about automobiles, but nobody is talking about gas stations”. A former colleague of mine mentioned that he has a friend interested in clothing for robots to wear.
  • Rodney Brooks had a "fireside chat" at this workshop, in which he lamented a continuous struggle in dealing with customers who don’t understand their own robotics needs. He recounted a war story from the early days of Rethink Robotics:
RB: “Look, the basic idea of the Baxter robot is to use force sensing to replace precision actuation (which is very expensive). This saves you -- the customer -- a ton of money.”
Customer: “But why doesn’t your robot have precision like my current robot?”
RB: “That level of precision is not necessary for a great deal of tasks like pick-and-placing”
Customer: “But the robot I’m using already has precision! Why can’t you do that too?”
RB: “Fine, we’ll make a robot called Sawyer that has the precision you want, and it’ll end up costing 2 times more.”
Customer: “Now I am happy.”

  • Australia lacks a competitive funding ecosystem -- that is -- many VC firms competing against each other to secure deal flow. Back in the day, NICTA was the funding agency for research and commercialization, which was not founder-friendly. A competitive VC ecosystem makes starting a startup much more attractive, which in turn brings in founder talent from other countries.
  • Chinese robotics companies have done a rather impressive job cloning US robots for much cheaper price points, such as the AUBO-i5 (clone of UR3 by Universal Robots), and Laikago (clone of Spot Mini by Boston Dynamics). Coupled with China’s manufacturing expertise, I think this could be the force that commoditizes robot hardware.
There was a little “Robotics Startup Pitch” competition where startups pitched their companies to these VCs to get some exposure. Some pitches sounded like they came out of an ICO whitepaper, but there were a few promising companies with real technology in the mix.

My favorite company was Hebi Robotics, which was a spinoff out of CMU’s snake robot lab. They ended up commercializing the “snake segment” into a modular actuator that enables researchers to put together low-volume robot prototypes quickly, much like LEGOs. The robots you can build with these kits are ideal for AI & Machine Learning researchers: arms, hexapods, mobile bases with manipulation capabilities... 

The Spectre of Deep Learning

A spectre is haunting Robotics: the spectre of Deep Learning...

Raia Hadsell’s excellent plenary talk contained a thought-provoking slide (made by Vincent Vanhoucke in 2016):

This is the gist of the message:
  1. Deep learning-based algorithms + huge amounts of data have been steamrolling classical speech recognition, classical computer vision, and classical machine translation benchmarks. We’ve basically thrown away decades of “old pipelines” in favor of big neural nets. 
  2. Will the same thing happen to classical robotics?

Given the obvious historical trope, it was quite surprising how hostile many roboticists are to the idea of throwing everything out and replacing it with neural nets. Today, the robotics community is pretty intellectually divided into two camps.

  1. "Traditional" robotics based on control theory with analytic models and probabilistic planning.
  2. The deep learning folks, who dispense with analytical understanding and just throw massive compute and data at the problem and let "Software 2.0" figure out a program from the data. 

I think a big reason these camps can’t understand each other's perspective is that they are solving different robotics tasks with different requirements, so they end up talking past each other when arguing for the merits of their approaches.

The “traditional” approach to robotics is popular in control, where a desired (low-dimensional) state must be realized. Safety is a hard constraint. Tasks include learning dynamic gaits on a quadruped, planning to avoid bumping into things, flying things, balancing heavy things that should not fall down.

However, this line of research largely avoids the problem of perception and of solving diverse tasks at a semantic level, either delegating it to a state estimation problem with an off-the-shelf perception stack or ignoring the problem of performing tasks altogether.

I asked a student from a well-known quadruped lab how they weighed their dynamic stability cost function with a task-specific cost function and got a blank stare: “Uh... what task?”

The "deep learning way" of doing robotics has only become popular recently, and has its roots in computer vision researchers extending perception models to also perform control. These approaches excel at handling unstructured environments, unreliable state estimation, and generalization (e.g. to unseen objects).

However, because they are entirely data-driven, they often fail to generalize in ways that analytical methods handle trivially. By construction, it’s easy to see how an IK solver with a correctly specified model will always be better than a neural net approximation of that function.
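
To make the IK point concrete, here is a minimal sketch of closed-form inverse kinematics for a planar two-link arm. It is not taken from any system shown at ICRA: the arm, the unit link lengths, the function names, and the target point are all illustrative assumptions. When the model is specified correctly, the analytic solution reaches the target to floating-point precision, which is the bar a learned approximation of the same function would have to match.

```python
import math


def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector (x, y) of a planar two-link arm with joint angles in radians."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def inverse_kinematics(x, y, l1=1.0, l2=1.0):
    """Closed-form joint angles (one of the two elbow solutions) that reach (x, y)."""
    r2 = x * x + y * y
    # Law of cosines gives the elbow angle.
    cos_t2 = (r2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_t2) > 1.0:
        raise ValueError("target is outside the reachable workspace")
    theta2 = math.acos(cos_t2)
    # Shoulder angle: direction to the target minus the offset caused by the elbow.
    theta1 = math.atan2(y, x) - math.atan2(
        l2 * math.sin(theta2), l1 + l2 * math.cos(theta2)
    )
    return theta1, theta2


target = (1.2, 0.7)  # hypothetical reachable target
angles = inverse_kinematics(*target)
reached = forward_kinematics(*angles)
ik_error = math.hypot(reached[0] - target[0], reached[1] - target[1])
print(f"position error: {ik_error:.2e}")
```

The catch, of course, is the phrase "correctly specified model": the sketch assumes the link lengths are known exactly and that the target pose is handed to it, which is precisely the part that perception-heavy tasks do not give you for free.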

As an employee of the Church of Deep Learning™, I came into the conference prepared to question my faith and learn the perspective of other labs and their approaches to solving real world problems.

I ended up being sorely disappointed. Coming out of the conference, I’m more convinced than ever that Deep Learning based control of robotics is the only approach that will ever scale to unstructured environments within the next decade.

Why I believe strongly in Deep Learning-based approaches to robotics warrants a much longer blog post for the future, but for now I’ll say this:

  • Deep Learning is accessible - I mentioned earlier that Hebi Robotics is enabling non-roboticists to build research hardware. Deep Learning does the same thing, but for control software. You don’t need a strong mathematical foundation anymore to train neural networks. Just concatenate some layers, gather some data -- hey presto -- your robot supports a new end-effector! 
  • Deep RL Research is scalable - an RNN technique pioneered in NLP space could be immediately applied to improving RNNs used in a robotics context. Deep Learning research feels much more collaborative, because all these people working on diverse problems now speak the same language. One can now cast SVMs, linear models, optimization, probabilistic inference, and even cognitive neuroscience all into the common language of neural nets. A DL non-believer was complaining to me that we were heading towards a research monoculture, similar to how every roboticist was working on SLAM in the early 2000s. If this "monoculture" enables new scales of collaboration and bridging of learning theory across disciplines, then I’m all for it!
  • The problems one must solve to bring lab robotics to the real world are so challenging that data-driven approaches are the only way to make them tractable. Analytical approaches require too many compromises on robot design, too much hand-tuning for the specific system.
Finally, there are reasonable, diplomatic folks who believe in marrying the benefits of both data-driven learning and analytical task-specific knowledge, but this reminds me of computer vision people who believed in fine-tuning SVMs on top of the last layer of a deep neural net back in 2015 to “get the best of both worlds.”

Data, though expensive to gather, is always right.

Much of robotics is built on the mindset of obtaining geometry, exact localization, planning around exact dynamics. However, one exciting avenue of Deep Reinforcement Learning is that learning control from vision enables control strategies that only require gaze heuristics and optical flow, rather than precise tracking and acting in a localized Euclidean space. 

Mandyam Srinivasan gave a very interesting keynote talk on biologically-inspired aerial robotics, in which they modeled honeybees’ ability to estimate distance based on optical flow. It turns out that bees don't really estimate states and forces like robots do; they just map visual motion cues to wings beating faster and slower, and everything turns out fine.

In terms of sensors, I think moving beyond precise (and expensive!) IMUs and joint encoders and instead, sensing everything from vision is the way to go. All other sensors can probably be mapped to vision too (such as optical waveguides!), and maybe Convnets and RNNs can handle the rest.

Here’s another cool example of how data-driven software can even replace expensive force sensing. Remember those skin sensors I was talking about earlier? According to the work shown in the poster below, it turns out you can just use an RNN to predict those surface contact "force transients" simply from the amount of “feedback” the motors in your arm feel when they bump into something.

Looking Forward

I attended a workshop ("Grand Scientific Challenges for the Robot Companion of the Future") where the panel of distinguished roboticists were asked what they thought were the grand challenges & questions for robotics. Here were some of the responses:
  • Energy management (power consumption, batteries)
  • Predictions & mirror neurons
  • What is the generic representation of actions?
  • An understanding of Machines vs Life (this was Oussama Khatib)
  • Wearable exoskeleton, as if it were part of the body
  • Human-computer hybrid - cheap memory capacity
I'd like to add 3 additional technologies that I believe could be game-changing for robotics:

  1. Artificial Materials: synthesizing polymers with self-healing abilities, the ability to co-locate actuation, sensing, and computation in the same volume.
  2. Artificial Muscles: Modular, electrically or chemically actuated, and at the millimeter scale. 
  3. Artificial Life: Large-scale ecosystems of agents struggling to survive, compete for resources, and reproduce.

Will follow up on these in later blog posts...

Sunday, April 1, 2018

Aesthetically Pleasing Learning Rates

By Eric Jang, Colin Raffel, Ishaan Gulrajani, Diogo Moitinho de Almeida

In this blog post, we abandoned all pretense of theoretical rigor and used pixel values from natural images as learning rate schedules.

The learning rate schedule is an important hyperparameter to choose when training neural nets. Set the learning rate too high, and the loss function may fail to converge. Set the learning rate too low, and the model may take a long time to train, or even worse, overfit.

The largest learning rate we can get away with is set by how smooth the loss function is, or in fancy words, by the “Lipschitz constant” of its gradient. The smoother the function (the smaller that constant), the larger the learning rate we are allowed to take without the optimization “blowing up”.
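
As a toy illustration of that claim (our own sketch, not part of the experiments below), consider plain gradient descent on the one-dimensional quadratic f(x) = 0.5 * curvature * x**2. Here `curvature` plays the role of the Lipschitz constant of the gradient, and each update multiplies x by (1 - learning_rate * curvature), so the iterates shrink only when learning_rate < 2 / curvature. The function name and the specific numbers are assumptions chosen for the demo.

```python
def run_gradient_descent(curvature, learning_rate, steps=50, x0=1.0):
    """Minimize f(x) = 0.5 * curvature * x**2 and return the final |x|.

    The gradient is curvature * x, so `curvature` is the Lipschitz
    constant of the gradient for this toy loss.
    """
    x = x0
    for _ in range(steps):
        x = x - learning_rate * curvature * x
    return abs(x)


# The same learning rate behaves very differently on a smooth and a sharp loss.
smooth_result = run_gradient_descent(curvature=1.0, learning_rate=0.5)
sharp_result = run_gradient_descent(curvature=100.0, learning_rate=0.5)
# Scaling the learning rate by 1 / curvature restores convergence.
rescaled_result = run_gradient_descent(curvature=100.0, learning_rate=0.5 / 100.0)

print(f"smooth loss, lr=0.5:   |x| = {smooth_result:.3e}")
print(f"sharp loss,  lr=0.5:   |x| = {sharp_result:.3e}")
print(f"sharp loss,  lr=0.005: |x| = {rescaled_result:.3e}")
```

None of this carries over cleanly to deep nets, where the curvature changes from step to step and direction to direction, which is part of why the schedule question is hard.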

What is the right learning rate schedule? Exponential decay? Linear decay? Piecewise constant? Ramp up then down? Cyclic? Warm restarts? How big should the batch size be? How does this all relate to generalization (what ML researchers care about)?

Tragically, in the non-convex, messy world of deep neural networks, all theoretical convergence guarantees are off. Often those guarantees rely on a restrictive set of assumptions, and then the theory-practice gap is written off by showing that it also “empirically works well” at training neural nets.

Fortunately, the research community has spent thousands of GPU years establishing empirical best practices for the learning rate:

Given that theoretically-motivated learning rate scheduling is really hard, we ask, "why not use a learning rate schedule which is at least aesthetically pleasing?" Specifically, we scan across a nice-looking image one pixel at a time, and use the pixel intensities as our learning rate schedule.

We begin with a few observations:

  • Optimization of non-convex objectives seems to benefit from injecting “temporally correlated noise” into the parameters we are trying to optimize. Accelerated stochastic gradient descent methods also exploit temporally correlated gradient directions via momentum. Stretching this analogy a bit, we note that in reinforcement learning, auto-correlated noise seems to be beneficial for state-dependent exploration (1, 2, 3, 4). 
  • Several recent papers (5, 6, 7) suggest that waving learning rates up and down is good for deep learning.
  • Pixel values from natural images have both of the above properties. When reshaped into a 1-D signal, an image waves up and down in a random manner, sort of like Brownian motion. Natural images also tend to be lit from above, which leads to a decaying signal as the image gets darker towards the bottom.
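
For the skeptical reader, here is a small sketch of how one might check those two properties. We do not ship an actual photo with this snippet, so it builds a synthetic stand-in (random-walk texture multiplied by top-down lighting); the image construction, the 64x64 size, and the helper name are all assumptions for illustration. It compares the lag-1 autocorrelation of the flattened image against white noise, and the mean brightness of the first versus the second half of the scan.

```python
import numpy as np

rng = np.random.default_rng(0)
height, width = 64, 64

# Synthetic stand-in for a photo: brighter at the top, with smooth texture.
lighting = np.linspace(1.0, 0.2, height)[:, None]          # lit from above
texture = rng.normal(size=(height, width)).cumsum(axis=1)  # random walk along each row
texture = (texture - texture.min()) / (texture.max() - texture.min())
image = lighting * (0.5 + 0.5 * texture)

signal = image.flatten()  # row-major scan, top-left to bottom-right
white_noise = rng.uniform(size=signal.size)


def lag1_autocorrelation(x):
    """Correlation between a 1-D signal and itself shifted by one step."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


image_corr = lag1_autocorrelation(signal)
noise_corr = lag1_autocorrelation(white_noise)
first_half_mean = float(signal[: signal.size // 2].mean())
second_half_mean = float(signal[signal.size // 2:].mean())

print(f"lag-1 autocorrelation, image scan:  {image_corr:.3f}")
print(f"lag-1 autocorrelation, white noise: {noise_corr:.3f}")
print(f"mean brightness, first half vs second half: "
      f"{first_half_mean:.3f} vs {second_half_mean:.3f}")
```

Swapping in a real photograph (loaded as a grayscale array) should be a drop-in change, though we make no promises about what any particular image will show.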

We compared several learning rate schedules on MNIST and CIFAR-10 classification benchmarks, training each model for about 100K steps. Here's what we tried:
  • baseline: The default learning rate schedules provided by the github repository.
  • fixed: 3e-4 with Momentum Optimizer.
  • andrej: 3e-4, with Adam Optimizer
  • cyclic: Cyclic learning rates according to the following code snippet:
base_lr = 1e-5
max_lr = 1e-2
step_size = 1000
step = tf.cast(global_step, tf.float32)
cycle = tf.floor(1+step/(2*step_size))
x = tf.abs(step/step_size - 2*cycle + 1)
learning_rate = base_lr + (max_lr-base_lr)*tf.maximum(0., (1.-x))

  • image-based learning rates using the following code:
base_lr = 1e-5
max_lr = 1e-2
im =
num_steps = _NUM_IMAGES['train']*FLAGS.train_epochs/FLAGS.batch_size
w, h = im.size
f = np.sqrt(w*h*3/num_steps)
im = im.resize((int(float(w)/f), int(float(h)/f)))
im = np.array(im).flatten().astype(np.float32)/255
im_t = tf.constant(im)
step = tf.minimum(global_step, im.size-1)
pixel_value = im_t[step]
learning_rate = base_lr + (max_lr - base_lr) * pixel_value
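
For readers who want to poke at these two schedules without a TensorFlow training loop, the following is a NumPy-only sketch of the same arithmetic. The function names, the explicit `step` argument, and the tiny test image are our assumptions for illustration; the snippets above remain the code used in the experiments. `resized_dims` mirrors the resize step: shrinking both sides by f = sqrt(w*h*3/num_steps) leaves roughly one pixel channel value per training step.

```python
import numpy as np

BASE_LR = 1e-5
MAX_LR = 1e-2


def cyclic_learning_rate(step, step_size=1000):
    """Triangular schedule: BASE_LR -> MAX_LR -> BASE_LR every 2 * step_size steps."""
    cycle = np.floor(1 + step / (2 * step_size))
    x = np.abs(step / step_size - 2 * cycle + 1)
    return float(BASE_LR + (MAX_LR - BASE_LR) * np.maximum(0.0, 1.0 - x))


def resized_dims(width, height, num_steps):
    """Image size whose flattened RGB values number at most num_steps."""
    f = np.sqrt(width * height * 3 / num_steps)
    return int(width / f), int(height / f)


def image_learning_rate(pixels, step):
    """Learning rate read from the step-th value of a flattened uint8 image."""
    flat = np.asarray(pixels, dtype=np.float32).flatten() / 255.0
    index = min(step, flat.size - 1)  # hold the last pixel once the image runs out
    return float(BASE_LR + (MAX_LR - BASE_LR) * flat[index])


# A hypothetical 1x1 RGB "image": black, white, mid-gray channel values.
tiny_image = np.array([[[0, 255, 128]]], dtype=np.uint8)

print([cyclic_learning_rate(s) for s in (0, 500, 1000, 2000)])
print(resized_dims(600, 400, 100_000))
print([image_learning_rate(tiny_image, s) for s in (0, 1, 2, 10)])
```

Note that because the schedule is read in row-major order, the learning rate sweeps left to right across each row before moving down, so horizontal structure in the image shows up as fast oscillations and vertical structure as slow drift.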

Candidate Images

We chose some very aesthetically pleasing images for our experiments.


bad_mnist.jpg (MNIST training image labeled as a 4)






Which one gives the best learning rate?


Here are the top-1 accuracies on the CIFAR-10 validation set. All learning rate schedules are trained with the Momentum Optimizer (except andrej, where we use Adam).

The default learning rate schedules provided by the github repo are quite strong, beating all of our alternative learning rate schedules.

The Mona Lisa and puppy images turn out to be pretty good schedules, even better than cyclic learning rates and Andrej Karpathy’s favorite 3e-4 with Adam. The "bad MNIST" digit appears to be a pretty dank learning rate schedule too, just edging out Geoff’s portrait (you’ll have to imagine the error bars on your own). All learning rates perform about equally well on MNIST.

The fixed learning rate of 3e-4 is quite bad (unless one uses the Adam optimizer). Our experiments suggest that maybe pretty much any learning rate schedule can outperform a fixed one, so if ever you see or think about writing a paper with a constant learning rate, just use literally any schedule instead. Even a silly one. And then cite this blog post.

Future Work

  • Does there exist a “divine” natural image whose learning rate schedule results in low test error among a wide range of deep learning tasks? 
  • It would also be interesting to see if all images of puppies produce good learning rate schedules. We think it is very likely, since all puppers are good boys. 
  • Stay tuned for “Aesthetically Pleasing Parameter Noise for Reinforcement Learning” and “Aesthetically Pleasing Random Seeds for Neural Architecture Search”.


We thank Geoff Hinton for providing the misclassified MNIST image, Vincent Vanhoucke for reviewing this post, and Crossroads Cafe for providing us with refreshments.

Friday, February 23, 2018


06/23/2018: Xiaoyi Yin (尹肖贻) has translated this post to 中文. Thanks Xiaoyi!

Once upon a time, there was a machine learning researcher who tried to teach a child what a "teacup" was.

"Hullo mister. What do you do?" inquires the child.

"Hi there, child! I'm a machine learning scientist. My life ambition is to create 'Artificial General Intelligence', which is a computer that can do everything a human --"

The child completely disregards this remark, as children often do, and asks a question that has been troubling him all day:

"Mister, what's a teacup? My teacher Ms. Johnson used that word today but I don't know it."

The scientist is appalled that a fellow British citizen does not know what a teacup is, so he pulls out his phone and shows the child a few pictures:

"Oh..." says the child. "A teacup is anything that's got flowers on it, right? Like this?"

The child is alarmingly proficient at using a smartphone.

"No, that's not a teacup," says the scientist. "Here are some more teacups, this time without the flowers."

The child's face crinkles up with thought, then un-crinkles almost immediately - he's found a new pattern.

"Ok, a teacup is anything where there is an ear-shaped hole facing to the right - after all, there is something like that in every one of the images!"

He pulls up a new image to display what he thinks a teacup is, giggling because he thinks ears are funny.

"No, that's an ear. A teacup and ear are mutually exclusive concepts. Let's do some data augmentation. These are all teacups too!"

The scientist rambles on,

"Now I am going to show you some things that are not teacups! This should force your discriminatory boundary to ignore features that teacups and other junk have in common ... does this help?"

"Okay, I think I get it now. A teacup is anything with an holder thing, and is also empty. So these are not teacups:"

"Not quite, the first two are teacups too. And teacups are actually supposed to contain tea."

The child is now confused.

"but what happens if a teacup doesn't have tea but has fizzy drink in it? What if ... what if ... you cut a teacup in halfsies, so it can't hold tea anymore?" His eyes go wide as saucers as he says this, as if cutting teacups is the most scandalous thing he has ever heard of.

"Err... hopefully most of your training data doesn't have teacups like that. Or chowder bowls with one handle, for that matter."

The scientist also mutters something about "stochastic gradient descent being Bayesian" but fortunately the kid doesn't hear him say this.

The child thinks long and hard, iterating over the images again and again.

"I got it! There is no pattern, a teacup is merely any one of the following pictures:"

"Well... if you knew nothing about the world I could see how you arrived at that conclusion... but what if I said that you had some prior about how object classes ought to vary across form and rendering style and --"

"But Mister, what's a prior?"

"A prior is whatever you know beforehand about the distribution over random teacups ... err... never mind. Can you find an explanation for teacups that doesn't require memorizing 14 images? The smaller the explanation, the better."

"But how should I know how small the explanation of teacup ought to be?", asks the child.

"Oh," says the scientist. He slinks away, defeated.

Tuesday, January 23, 2018

Doing a Concurrent Masters at Brown

This is intended as a reference for students who are interested in the Concurrent Bachelor's/Masters program. If you are not a current or prospective undergraduate student at Brown University, the following post won't be relevant to you.

A few Brown University students have been emailing me about the Concurrent Bachelor's/Masters (CM) degree and whether it would make sense for them to apply for this program. Brown doesn't offer a whole lot of information or resources on this topic (very few students do this), so I'd like to share my perspective as someone who went through the process (I graduated in May 2016 with an ScB in APMA-CS and an MSc in CS). This is not official advice - rules for the CM program may have changed since I graduated.


Universities like UC Berkeley allow undergraduates to graduate early, provided that they have satisfied all their degree requirements. Some students who complete their undergrad in 3 years (6 semesters) use the leftover year to do their "5th-year" Masters degree at the school, thus getting a Bachelors and Masters degree in 4 years.

Brown has 5th-year Masters programs too (the CS dept has a popular one), but undergraduates generally cannot graduate early (graduating in 7 semesters is possible, but permission is rarely granted).

The Concurrent Masters degree does permit one to graduate with a Masters in 8 or 9 semesters though.

Strangely, CM doesn't seem to be advertised much at Brown - there weren't any guides or resources or other students to talk to for planning my schedule around CM (guidance counselors and Meiklejohns don't really encourage or know about unorthodox paths like these).

How to plan for CM

The CM application requirements can be found here
  1. During your First-Year (or beginning of Sophomore year at the latest), draw up your course plan for all 8 semesters to meet CM requirements. It will probably be re-arranged a lot (especially upper-level classes) each semester, but every set of courses you pick should keep you on track to meeting CM requirements.
  2. At the beginning of the Spring semester of Junior Year, bring your partially-completed CM application to your department chair, and show them how you are on track to fulfilling the requirements. Have them examine your application to see that your courses do indeed qualify and you are in good academic standing (i.e. you will also fulfill your intended undergrad degree requirements by graduation).
  3. Get recommendation letters from professors and the dept chair. You will need a lot of them - 3 within concentration, 2 outside concentration.
  4. Bring your packet (with rec letters) to the Dean of the College, who is in charge of CM review process.
  5. The applications are reviewed by the academic standing committee by April of your Junior year. You need to meet the course requirements, have the approval of your dept. chair, have good letters of reference, and say something fairly reasonable in your letter to the committee. From that point it's approved somewhat automatically.
  6. The CM course schedule is approved Junior year, but is contingent on classes that may not actually be offered your senior year. You will probably submit amendment forms to the application during your Senior year. They should be approved as long as they are reasonable substitutions.
  7. I recommend finishing your capstone requirements and 2nd writing requirement during your junior year. This removes a lot of constraints from the schedule optimization problem. 

What courses did you take?

I had a pretty unorthodox curriculum at Brown and basically stretched the "open curriculum" interpretation as far as I could to (barely) satisfy my degree and concurrent masters requirements. I didn't take many intro-level CS courses and substituted those requirements with upper-div math and CS courses. Towards the end the CS department chair got pretty annoyed with all the substitutions I was making; bless twd@ for being so patient with me. Here was my 4-year course schedule:

Fall 2012
Seminar: Computing as done in Brains and Computers

Methods of Applied Math I

Intro to MRI and Neuroimaging

Computational Cognitive Neuroscience

Intro to Probability and Computing
Spring 2013
Advanced Fiction

Computational Neuroscience

Intermediate Animation

Neurochemistry and Behavior

Monte Carlo Simulation with Applications to Finance
Fall 2013
Introduction to Computer Graphics

Abstract Algebra

Independent Study

Digital Electronics Systems Design
Spring 2014
Building Intelligent Robots

Interactive Computer Graphics

Recent Applications of Probability and Statistics

Individual Independent Study
Fall 2014
Probabilistic Graphical Models

Computer Networks

Data-Driven Vision and Graphics

Quantum Mechanics A

TA Apprenticeship: Full Credit
Spring 2015
Neural Dynamics: Theory and Modeling

Topics in Chaotic Dynamics

Individual Independent Study

Corporate Finance

Painting II (RISD)
Fall 2015
Models of Computation

Introduction to Composition

Persuasive Communication

Introduction to Computational Linear Algebra

Reading and Research (Masters Project)
Spring 2016
Reading and Research (Masters Project)

The Politics of Food


Operating Systems

Optimization Methods in Finance

ScB requirements:

My courses
MATH1530 instead of MATH0350
MATH0540 waived (AP Test)

Applied Mathematics
APMA1360 in lieu of APMA0360

Core Computer Science
(CSCI2980-HCI, CSCI1670) in lieu of (CSCI15, CSCI16)
CSCI 1450 (math)
CSCI 1680 in lieu of CS33 (systems)
CSCI0510 (math) (f15)
Additional Requirements
3 1000-level CS courses
CSCI1970 (approved pair waived via TA credit)

3 1000-level APMA courses
Pair: APMA1720 + APMA1740

Capstone course

And here's how I filled out the CM requirements. Note that degree requirements are subject to change and the courses I filled out may not be valid for current Brown students.

Is it Worth It?


  • Some entry-level roles in quantitative finance and Machine Learning strongly prefer candidates with at least a Masters degree.
  • Saves tuition compared to doing a 5th-year Masters.
  • In the Bay Area (California), having a Masters degree can get you a better interest rate on a mortgage.


  • Way more work compared to doing a 5th-year Masters. This mostly comes from the 10-course breadth requirements.
  • Being spread pretty thinly across many classes makes retaining information harder. You need to take an average of 4+ classes every semester, and the 10-course breadth requirements have to be completed before you submit your application.
  • Maintaining a social life with this course load is tricky.
I do not recommend doing CM just for the sake of getting a Masters degree - a Masters degree isn't that helpful in the big picture of things, and you should only do it if it would require minor changes to the course plan you are already pursuing, or if it is vital to your career.

Three other students in the CS department (two CS-Math concentrators and one other CS-APMA concentrator) did CM in the class of 2016. We all enjoyed taking hard CS/Math classes and would have probably taken the schedules we had anyway.