Saturday, November 28, 2020

Software and Hardware for General Robots

Disclaimer: these are just my opinions and not necessarily those of my employer or robotics colleagues.

2021-04-23: If you liked this post, you may be interested in a more recent blog post I wrote on why I believe in end-to-end learning for robots.

Hacker News Discussion

Moravec's Paradox describes the observation that our AI systems can solve "adult-level cognitive" tasks like chess-playing or passing text-based intelligence tests fairly easily, while basic sensorimotor skills like crawling around or grasping objects - things one-year-old children can do - remain very difficult.

Anyone who has tried to build a robot to do anything will realize that Moravec's Paradox is not a paradox at all, but rather a direct corollary of our physical reality being so irredeemably complex and constantly demanding. Modern humans traverse millions of square kilometers in their lifetime, a labyrinth full of dangers and opportunities. If we had to consciously process and deliberate over all the survival-critical sensory inputs and motor decisions the way we do over moves in a game of chess, we would probably have been selected out of the gene pool by Darwinian evolution. Evolution has optimized our biology to perform sensorimotor skills in a split second and make them feel easy.

Another way to appreciate this complexity is to imagine adjusting your daily life to a major motor disability, like losing your fingers or trying to get around San Francisco without the use of your legs.

Software for General Robots

The difficulty of sensorimotor problems is especially apparent to people who work in robotics and get their hands dirty with the messiness of "the real world". What are the consequences of an irredeemably complex reality on how we build software abstractions for controlling robots? 

One of my pet peeves is when people who do not have sufficient respect for Moravec's Paradox propose a programming model where high-level robotic tasks ("make me dinner") can be divided into sequential or parallel computations with clearly defined logical boundaries: wash rice, defrost meat, get the plates, set the table, etc. These sub-tasks can in turn be broken down further. When a task cannot be decomposed further because there are too many edge cases for conventional software to handle ("does the image contain a cat?"), we can attempt to invoke a Machine Learning model as "magic software" for that capability.

This way of thinking - symbolic logic that calls upon ML code - comes from engineers who cling to the tidiness of Software 1.0 abstractions, and from programming tutorials that use cooking analogies.
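To make this concrete, here is a minimal sketch of that programming model, written in Julia (the language used for code later on this blog). All of the function names are hypothetical stand-ins, not a real robotics API:

# Hypothetical stand-ins for learned skills, treated as "magic software".
detect_objects(image) = ["rice", "meat", "plate"]      # a perception model
pick_up_object(obj) = println("picking up $obj")       # a grasping policy
put_in_pot(obj) = println("putting $obj in the pot")   # a placing policy

function make_dinner(image)
    # High-level task logic as tidy Software 1.0 control flow,
    # delegating each messy sub-problem to a learned skill.
    for obj in detect_objects(image)
        pick_up_object(obj)
        put_in_pot(obj)
    end
end

make_dinner("photo_of_kitchen.png")   # the image argument is just a placeholder here

The rest of this post is about why reality rarely respects boundaries this clean.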

Do you have any idea how much intelligence goes into a task like "fetching me a snack", at the very lowest levels of motor skill? Allow me to illustrate. I recorded a short video of me opening a package of dates and annotated it with all the motor sub-tasks I performed in the process.


In the span of 36 seconds, I counted about 14 motor and cognitive skills. They happened so quickly that I didn't consciously notice them until I went back and analyzed the video, frame by frame. 

Here are some of the things I did:
  • Leveraging past experience opening this sort of package to understand material properties and how much force to apply.
  • Constantly adapting my strategy in response to unforeseen circumstances (the Ziploc not giving).
  • Adjusting my grasp when slippage occurred.
  • Devising an ad-hoc Weitlaner retractor with my thumb knuckles to increase force on the Ziploc.
As a roboticist, I find it humbling to watch videos of animals making decisions so quickly, and then to watch our own robots struggle to do the simplest things. We even have to speed up the robot videos 4x-8x to prevent the human watcher from getting bored!

With this video in mind, let's consider where we currently are in the state of robotic manipulation. In the last decade or so, multiple research labs have used deep learning to develop robotic systems that can perform any-object robotic grasping from vision. Grasping is an important problem because in order to manipulate objects, one must usually first grasp them. It took the Google Robotics and X teams 2-3 years to develop our own system, QT-Opt. This was a huge research achievement because it was a general method that worked on pretty much any object and, in principle, could be used to learn other tasks. 

Some people think that this capability to pick up objects can be wrapped in a simple programmatic API and then used to bootstrap us to human-level manipulation. After all, hard problems are just composed of simpler problems, right? 

I don't think it's quite so simple. The high-level API call "pick_up_object()" implies a clear semantic boundary between when the robot grasping begins and when it ends. If you re-watch the video above, how many times do I perform a grasp? It's not clear to me at all where you would slot those function calls. Here is a survey if you are interested in participating in a poll on "how many grasps do you see in this video"; I will update this blog post with the results.

If we need to solve 13 additional manipulation skills just to open a package of dates, and each one of these capabilities takes 2-3 years to build, then we are a long, long way from making robots that match the capabilities of humans. Never mind that there isn't a clear strategy for how to integrate all these behaviors into a single algorithmic routine. Believe me, I wish reality were simple enough that complex robotic manipulation could be done mostly in Software 1.0. However, as we move beyond pick-and-place towards dexterous and complex tasks, I think we will need to completely rethink how we integrate different capabilities in robotics.

As you might note from the video, the meaning of a "grasp" is somewhat blurry. Biological intelligence was not specifically evolved for grasping - rather, hands and their behaviors emerged from a few core drives:  regulate internal and external conditions, find snacks, replicate.

None of this is to say that our current robot platforms and Software 1.0 programming models are useless for robotics research or applications. A general-purpose function pick_up_object() can still be combined with Software 1.0 code into a reliable system worth billions of dollars to Amazon warehouses and other logistics fulfillment centers. General pick-and-place for any object in any unstructured environment remains an unsolved, valuable, and hard research problem.

Hardware for General Robots

What robotic hardware do we require in order to "open a package of dates"?

Willow Garage was one of the pioneers in home robots, showing that a teleoperated PR2 robot could be used to tidy up a room (note that two arms are needed here for more precise placement of pillows). These tasks are made up of many pick-and-place operations.
 

This video was made in 2008. That was 12 years ago! It's sobering to think of how much time has passed and how little the needle has seemingly moved. Reality is hard. 

The Stretch is a simple telescoping arm attached to a vertical gantry. It can do things like pick up objects, wipe planar surfaces, and open drawers.



However, futurists beware! A common source of hype among people who don't think enough about physical reality is to watch demos of robots doing useful things in one home, and then conclude that the same robots are ready to do those tasks in any home.

The Stretch video shows the robot pulling open a dryer door (left-swinging) and retrieving clothes from it. The video is a bit deceptive - I think the camera physically cannot see the interior of the dryer, so even though a human can teleoperate the robot to do the task, it would run into serious difficulty ensuring that the dryer has been completely emptied.

Here is a picture of my own dryer, which has a right-swinging door close to a wall. I'm not sure whether the Stretch can actually fit in this tight space, but the PR2 definitely would not be able to open this door without its base getting in the way.




Reality's edge cases are often swept under the rug when making robot demo videos, which usually show the robot operating in an environment it is well-suited for. But the full range of tasks humans do in the home is vast. Neither the PR2 nor the Stretch can crouch under a table to pick up lint off the floor, change a lightbulb while standing on a chair, fix caulking in a bathroom, open mail with a letter opener, move dishes from the dishwasher to the high cabinets, break down cardboard boxes for the recycle bin, or go outside and retrieve the mail.

And of course, they can't even open a Ziploc package of dates. If you think that was complex, here is a first-person video of me chopping strawberries, washing utensils, and decorating a cheesecake. This was recorded with a GoPro strapped to my head. Watch each time my fingers twitch - each one is a separate manipulation task!



We often talk about a future where robots do our cooking for us, but I don't think it's possible with any hardware on the market today. The only viable hardware for a robot meant to do any task in human spaces is an adult-sized humanoid, with two arms, two legs, and five fingers on each hand.

Just as with Software 1.0 in robotics, there is still an enormous space of robot morphologies that can provide value to research and commercial applications. That doesn't change the fact that alternative hardware can't do all the things a humanoid can in a human-centric space. Agility Robotics is one of the companies that gets it on the hardware design front. People who build physical robots use their hands a lot - could you imagine the robot you are building assembling a copy of itself?


Why Don't We Just Design Environments to be More Robot-Friendly?

A compromise is to co-design the environment with the robot to avoid infeasible tasks like the ones above. This can simplify both the hardware and software problems. Common examples I hear incessantly go like this:
  1. Dishwashing machines are better than a bimanual robot washing dishes in the sink, and a dryer is a more efficient machine than a human hanging clothes out to air-dry.
  2. Airplanes are better at transporting humans than birds.
  3. We built cars and roads, not faster horses.
  4. Wheels can bear more weight and are more energetically efficient than legs.
In the home robot setting, we could design special dryer doors that the robot can open easily, or have custom end-effectors (tools) for each task instead of a five-fingered hand. We could go as far as to have the doors be motorized and open themselves via a remote API call, so the robot doesn't even need to open the dryer on its own.

At the far end of this axis, why even bother building a robot? We could re-imagine the design of homes themselves as a single ASRS (automated storage and retrieval system) that brings you whatever you need from any location in the house, like a dumbwaiter that works horizontally as well as vertically. This would dispense with the need to have a robot walking around in your home.

This pragmatic line of thinking is fine for commercial applications, but as a human being and a scientist, it feels a bit like an admission of defeat that we cannot make robots do tasks the way humans do. Let's not forget the Science Fiction dreams that inspired so many of us down this career path - it is not about doing the tasks better, it is about doing everything humans can. A human can wash dishes and dry clothes by hand, so a truly general-purpose robot should be able to as well. For many people, this endeavor is as close as we can get to Biblical Creation: “Let Us make man in Our image, after Our likeness, to rule over the fish of the sea and the birds of the air, over the livestock, and over all the earth itself and every creature that crawls upon it.”

Yes, we've built airplanes to fly people around. Airplanes are wonderful flying machines. But to build a bird, which can do a million things and fly? That, in my mind, is the true spirit of general purpose robotics.

Sunday, September 27, 2020

My Criteria for Reviewing Papers

Xiaoyi Yin (尹肖贻) has kindly translated this post into Chinese (中文)

Accept-or-reject decisions for the NeurIPS 2020 conference are out, with 9454 submissions and 1900 accepted papers (20% acceptance rate). Congratulations to everyone (regardless of acceptance decision) for their hard work in doing good research!

It's common knowledge among machine learning (ML) researchers that acceptance decisions at NeurIPS and other conferences are something of a weighted dice roll. In this silly theatre we call "Academic Publishing" -- a mostly disjoint concept from research, by the way -- reviews are all over the place because each reviewer favors different things in ML papers. Here are some criteria that a reviewer might care about:

Correctness: This is the bare minimum for a scientific paper. Are the claims made in the paper scientifically correct? Did the authors take care not to train on the test set? If an algorithm was proposed, do the authors convincingly show that it works for the reasons they stated? 

New Information: Your paper has to contribute new knowledge to the field. This can take the form of a new algorithm, or new experimental data, or even just a different way of explaining an existing concept. Even survey papers should contain some nuggets of new information, such as a holistic view unifying several independent works.

Proper Citations: a related work section that articulates connections to prior work and why your work is novel. Some reviewers will reject papers that don't tithe prior work adequately, or aren't sufficiently distinguished from it.

SOTA results: It's common to see reviewers demand that papers (1) propose a new algorithm and (2)  achieve state-of-the-art (SOTA) on a benchmark. 

More than "Just SOTA": No reviewer will penalize you for achieving SOTA, but some expect more than just beating the benchmark, such as one or more of the criteria in this list. Some reviewers go as far as to bash the "SOTA-chasing" culture of the field, which they deem to be "not very creative" and "incremental". 

Simplicity: Many researchers profess to favor "simple ideas". However, the difference between "your simple idea" and "your trivial extension to someone else's simple idea" is not always so obvious.

Complexity: Some reviewers deem papers that don't present any new methods or fancy math proofs as "trivial" or "not rigorous".

Clarity & Understanding: Some reviewers care about the mechanistic details of proposed algorithms and furthering understanding of ML, not just achieving better results. This is closely related to "Correctness".

Is it "Exciting"?: Julian Togelius (AC for NeurIPS '20) mentions that many papers he chaired were simply not very exciting. Only Julian can know what he deems "exciting", but I suppose he means having "good taste" in choosing research problems and solutions. 




Sufficiently Hard Problems: Some reviewers reject papers for evaluating on datasets that are too simple, like MNIST. "Sufficiently hard" is a moving goalpost, with the implicit expectation that as the field develops better methods, the benchmarks have to get harder to push unsolved capabilities. Also, SOTA methods on simple benchmarks are not always SOTA on harder benchmarks that are closer to real-world applications. Thankfully, my most cited paper was written at a time when it was still acceptable to publish on MNIST.

Is it Surprising? Even if a paper demonstrates successful results, a reviewer might claim that they are unsurprising or "obvious". For example, papers applying standard object recognition techniques to a novel dataset might be argued to be "too easy and straightforward" given that the field expects supervised object recognition to be mostly solved (this is not really true, but the benchmarks don't reflect that). 

I really enjoy papers that defy intuitions, and I personally strive to write surprising papers. 

Some of my favorite papers in this category do not achieve SOTA or propose any new algorithms at all:

  1. Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet
  2. Understanding Deep Learning Requires Rethinking Generalization.
  3. A Metric Learning Reality Check
  4. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations 
  5. Adversarial Spheres

Is it Real? Closely related to "sufficiently hard problems". Some reviewers think that games are a good testbed to study RL, while others (typically from the classical robotics community) think that the MuJoCo Ant and a real robotic quadruped are entirely different problems; algorithmic comparisons on the former tell us nothing about the same set of experiments on the latter.

Does Your Work Align with Good AI Ethics? Some view the development of ML technology as a means to build a better society, and discourage papers that don't align with their AI ethics. The required "Broader Impact" statements in NeurIPS submissions this year are an indication that the field is taking this much more seriously. For example, if you submit a paper that attempts to infer criminality from only facial features or perform autonomous weapon targeting, I think it's likely your paper will be rejected regardless of what methods you develop.

Different reviewers will prioritize different aspects of the above, and many of these criteria are highly subjective (e.g. problem taste, ethics, simplicity). For each of the criteria above, it's possible to come up with counterexamples of highly-cited or impactful ML papers that don't meet that criterion but possibly meet others.


My Criteria

I wanted to share my criteria for how I review papers. When it comes to recommending accept/reject, I mostly care about Correctness and New Information. Even if I think your paper is boring and unlikely to be an actively researched topic in 10 years, I will vote to accept it as long as your paper helped me learn something new that I didn't think was already stated elsewhere. 

Some more specific examples:

  • If you make a claim about humanlike exploration capabilities in RL in your introduction and then propose an algorithm to do something like that, I'd like to see substantial empirical justification that the algorithm is indeed similar to what humans do.
  • If your algorithm doesn't achieve SOTA, that's fine with me. But I would like to see a careful analysis of why your algorithm doesn't achieve it.
  • When papers propose new algorithms, I prefer to see that the algorithm is better than prior work. However, I will still vote to accept if the paper presents a factually correct analysis of why it doesn't do better than prior work. 
  • If you claim that your new algorithm works better because of reason X, I would like to see experiments that show that it isn't because of alternate hypotheses X1, X2. 
Correctness is difficult to verify. Many metric learning papers were proposed in the last 5 years and accepted at prestigious conferences, only for Musgrave et al. '20 to point out that the experimental methodology across these papers was not consistent.

I should get off my high horse and say that I'm part of the circus too. I've reviewed papers for 10+ conferences and workshops and I can honestly say that I only understood 25% of papers from just reading them. An author puts tens or hundreds of hours into designing and crafting a research paper and its experimental methodology, and I only put in a few hours deciding whether it is "correct science". Rarely am I able to approach a paper with the level of mastery needed to rigorously evaluate correctness.

A good question to constantly ask yourself is: "what experiment would convince me that the author's explanations are correct and not due to some alternate hypothesis? Did the authors check that hypothesis?"

I believe that we should accept all "adequate" papers, and that more subjective things like "taste" and "simplicity" should be reserved for paper awards, spotlights, and oral presentations. I don't know if everyone should adopt these criteria, but I think it's helpful to at least be transparent as a reviewer about how I make accept/reject decisions.

Opportunities for Non-Traditional Researchers

If you're interested in getting mentorship for learning how to read, critique, and write papers better, I'd like to plug my weekly office hours, which I hold on Saturday mornings over Google Meet. I've been mentoring about 6 people regularly over the last 3 months and it's working out pretty well. 

Anyone who is not in a traditional research background (not currently in an ML PhD program) can reach out to me to book an appointment. You can think of this like visiting your TA's office hours for help with your research work. Here are some of the services I can offer, completely pro bono:

  • If you have trouble understanding a paper I can try to read it with you and offer my thoughts on it as if I were reviewing it.
  • If you're very very new to the field and don't even know where to begin I can offer some starting exercises like reading / summarizing papers, re-producing existing papers, and so on.
  • I can try to help you develop a good taste of what kinds of problems to work on, how to de-risk ambitious ideas, and so on.
  • Advice on software engineering aspects of research. I've been coding for over 10 years; I've picked up some opinions on how to get things done quickly.
  • Asking questions about your work as if I was a visitor at your poster session.
  • Helping you craft a compelling story for a paper you want to write.
No experience is required; all you need to bring to the table is a desire to become better at doing research. The acceptance rate for my office hours is literally 100%, so don't be shy!

Sunday, September 13, 2020

Chaos and Randomness

For want of a nail the shoe was lost.

For want of a shoe the horse was lost.

For want of a horse the rider was lost.

For want of a rider the message was lost.

For want of a message the battle was lost.

For want of a battle the kingdom was lost.

And all for the want of a horseshoe nail.

For Want of a Nail


Was the kingdom lost due to random chance? Or was it the inevitable outcome resulting from sensitive dependence on initial conditions? Does the difference even matter? Here is a blog post about Chaos and Randomness with Julia code.


Preliminaries


Consider a real vector space $X$ and a function $f: X \to X$ on that space. If we repeatedly apply $f$ to a starting vector $x_1$, we get a sequence of vectors known as an orbit $x_1, x_2, \ldots, f^n(x_1)$.

For example, the logistic map is defined as 

function logistic_map(r, x)
   r*x*(1-x) 
end

Here is a plot of successive applications of the logistic map for r=3.5. We can see that the system constantly oscillates between two values, ~0.495 and ~0.812. 
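In case you want to reproduce this, here is a minimal sketch of how such an orbit plot can be generated, assuming the Plots.jl package and the logistic_map function above:

using Plots

r = 3.5
N = 50
orbit = zeros(N)
orbit[1] = 0.5                  # starting value x_1
for n in 1:N-1
    orbit[n+1] = logistic_map(r, orbit[n])
end
plot(1:N, orbit, marker=:circle, legend=false,
     xlabel="iteration n", ylabel="x_n", title="Logistic map orbit, r=3.5")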


Definition of Chaos

There is surprisingly no universally accepted mathematical definition of Chaos. For now we will present a commonly used characterization by Devaney: 




We can describe an orbit $x_1, x_2, \ldots, f^n(x_1)$ as *chaotic* if:

  1. The orbit is not asymptotically periodic, meaning that it never starts repeating, nor does it approach an orbit that repeats (e.g. $a, b, c, a, b, c, a, b, c...$).
  2. The maximum Lyapunov exponent $\lambda$ is greater than 0. This means that if you place another trajectory starting near this orbit, it will diverge at a rate $e^\lambda$. A positive $\lambda$ implies that two trajectories will diverge exponentially quickly away from each other. If $\lambda<0$, then the distance between trajectories would shrink exponentially quickly. This is the basic definition of "Sensitive Dependence on Initial Conditions (SDIC)", also colloquially understood as the "butterfly effect".

Note that (1) intuitively follows from (2), because the Lyapunov exponent of an orbit that approaches a periodic orbit would be $<0$, which contradicts the SDIC condition.
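To make the Lyapunov exponent concrete, here is a rough numerical estimate for the logistic map (assuming the logistic_map function above), using the standard 1-D formula $\lambda \approx \frac{1}{N} \sum_n \ln |f'(x_n)|$ with $f'(x) = r(1-2x)$:

# Estimate the maximum Lyapunov exponent of the logistic map along one orbit.
function lyapunov_logistic(r; x=0.5, N=10_000, burnin=1_000)
    for _ in 1:burnin                # discard the transient
        x = logistic_map(r, x)
    end
    λ = 0.0
    for _ in 1:N
        λ += log(abs(r * (1 - 2x)))  # log of the local stretching factor |f'(x)|
        x = logistic_map(r, x)
    end
    return λ / N
end

lyapunov_logistic(3.5)   # negative: the orbit settles into a periodic cycle
lyapunov_logistic(3.9)   # positive: sensitive dependence on initial conditions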

We can also define the map $f$ itself to be chaotic if there exists an invariant (trajectories cannot leave) subset $\tilde{X} \subset X$, where the following three conditions hold:




  1. Sensitivity to Initial Conditions, as mentioned before.
  2. Topological mixing (orbits in $\tilde{X}$ eventually come arbitrarily close to any other point in $\tilde{X}$).
  3. Dense periodic orbits (every point in $\tilde{X}$ is arbitrarily close to a periodic orbit). At first, this is a bit of a head-scratcher given that we previously defined an orbit to be chaotic if it *didn't* approach a periodic orbit. The way to reconcile this is to think about the subspace $\tilde{X}$ being densely covered by periodic orbits, but they are all unstable so the chaotic orbits get bounced around $\tilde{X}$ for all eternity, never settling into an attractor but also unable to escape $\tilde{X}$.
Note that SDIC actually follows from the latter two conditions. If periodic orbits cover the set $\tilde{X}$ densely, and chaotic orbits also cover the set densely while never settling into the periodic ones, then intuitively the only way this can happen is if all of those periodic orbits are unstable, which gives SDIC.



These are by no means the only ways to define chaos. The DynamicalSystems.jl package has excellent documentation on several computationally tractable definitions of chaos.

Chaos in the Logistic Family


Incidentally, the logistic map exhibits chaos for most values of r between 3.56995 and 4.0. We can generate the bifurcation diagram quickly thanks to Julia's de-vectorized way of numeric programming.

using Plots

rs = [2.8:0.01:3.3; 3.3:0.001:4.0]
x0s = 0.1:0.1:0.6
N = 2000 # orbit length
x = zeros(length(rs), length(x0s), N)
# for each value of r (across rows)
for k = 1:length(rs)
    # initialize the starting conditions
    x[k, :, 1] = x0s
    # iterate each starting condition forward for N steps
    for i = 1:length(x0s)
        for j = 1:N-1
            x[k, i, j+1] = logistic_map(rs[k], x[k, i, j])
        end
    end
end
plot(rs, x[:, :, end], markersize=2, seriestype = :scatter, title = "Bifurcation Diagram (Logistic Map)")

We can see how the starting values $x_0 = 0.1, 0.2, \ldots, 0.6$ all converge to the same value, then oscillate between two values, then start to bifurcate repeatedly until chaos emerges as we increase r.




Spatial Precision Error + Chaos = Randomness

What happens to our understanding of the dynamics of a chaotic system when we can only know the orbit values with some finite precision? For instance, the true state might be x=0.76399 or x=0.7641, but we only observe x=0.764 in either case.

We can generate 1000 starting conditions that are identical up to our measurement precision, and observe the histogram of where the system ends up after n=1000 iterations of the logistic map.
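Here is a sketch of that experiment, assuming logistic_map and Plots.jl from above; the particular value of r and the measurement precision are illustrative assumptions:

using Plots

r = 3.9                     # a chaotic regime of the logistic map
x0 = 0.764                  # the observed (rounded) initial state
precision = 1e-3            # everything within this window looks identical to us
n = 1000

finals = map(1:1000) do _
    x = x0 + precision * (rand() - 0.5)   # a true state consistent with the observation
    for _ in 1:n
        x = logistic_map(r, x)
    end
    x
end

histogram(finals, bins=50, legend=false,
          xlabel="x_n after n=1000 iterations",
          title="p(x_n | x_0) induced by measurement error")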


Let's pretend this is a probabilistic system and ask the question: what are the conditional distributions of $p(x_n|x_0)$, where $n=1000$, for different levels of measurement precision?

At precision coarser than $O(10^{-8})$, we start to observe the entropy of the state evolution rapidly increasing. Even though we know that the underlying dynamics are deterministic, measurement uncertainty (a form of aleatoric uncertainty) can expand exponentially quickly due to SDIC. This results in $p(x_n|x_0)$ appearing to be a complicated probability distribution, even generating "long tails".

I find it interesting that the "multi-modal, probabilistic" nature of $p(x_n|x_0)$ vanishes to a simple uni-modal distribution when measurement precision is sufficiently high to mitigate chaotic effects for $n=1000$. In machine learning we concern ourselves with learning fairly rich probability distributions, even going as far as to learn transformations of simple distributions into more complicated ones.

But what if we are being over-zealous with using powerful function approximators to model $p(x_n|x_0)$? For cases like the above, we are discarding the inductive bias that $p(x_n|x_0)$ arises from a simple source of noise (uniform measurement error) coupled with a chaotic "noise amplifier". Classical chaos on top of measurement error will indeed produce Indeterminism, but does that mean we can get away with treating $p(x_n|x_0)$ as purely random?

I suspect the apparent complexity of many "rich" probability distributions we encounter in the wild is more often than not just chaos + measurement error (e.g. weather). If so, how can we leverage that knowledge to build more useful statistical learning algorithms and draw inferences?

We already know that chaos and randomness are nearly equivalent from the perspective of computational distinguishability. Did you know that you can use chaos to send secret messages? This is done by having Alice and Bob synchronize a chaotic system $x$ with the same initial state $x_0$, and then Alice sends a message $0.001*signal + x$. Bob merely evolves the chaotic system $x$ on his own and subtracts it to recover the signal. Chaos has also been used to design pseudo-random number generators. 
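Here is a toy sketch of that masking scheme using the logistic map from above; the parameter values are illustrative:

# Alice hides a small signal inside a chaotic carrier; Bob regenerates the same
# carrier from the shared (r, x0) and subtracts it back out.
function mask(signal, r, x0; ε=0.001)
    x = x0
    map(signal) do s
        x = logistic_map(r, x)   # evolve the shared chaotic system
        ε * s + x                # transmitted value
    end
end

function unmask(transmitted, r, x0; ε=0.001)
    x = x0
    map(transmitted) do t
        x = logistic_map(r, x)   # Bob evolves the identical system
        (t - x) / ε              # recover the signal
    end
end

signal = sin.(0.1 .* (1:200))
recovered = unmask(mask(signal, 3.9, 0.123456789), 3.9, 0.123456789)
maximum(abs.(recovered .- signal))   # ≈ 0, up to floating point error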

Saturday, June 20, 2020

Free Office Hours for Non-Traditional ML Researchers

Xiaoyi Yin (尹肖贻) has kindly translated this post into Chinese (中文)

This post was prompted by a tweet I saw from my colleague, Colin:


I'm currently a researcher at Google with a "non-traditional background", where non-traditional background means "someone who doesn't have a PhD". People usually get PhDs so they can get hired for jobs that require that credential. In the case of AI/ML, this might be to become a professor at a university, or land a research scientist position at a place like Google, or sometimes even both.

At Google it's possible to become a researcher without having a PhD, although it's not very easy. There are two main paths [1]:

One path is to join an AI Residency Program: these are fixed-term jobs at non-university institutions (FAANG companies, AI2, etc.) that aim to jump-start a research career in ML/AI. However, these residencies are usually just 1 year long, which is not long enough to really "prove yourself" as a researcher.

Another path is to start as a software engineer (SWE) in an ML-focused team and build your colleagues' trust in your research abilities. This was the route I took: I joined Google in 2016 as a software engineer in the Google Brain Robotics team. Even though I was a SWE by title, it made sense to focus on the "most important problem", which was to think really hard about why the robots weren't doing what we wanted and train deep neural nets in an attempt to fix those problems. One research project led to another, and now I just do research + publications all the time.

As the ML/AI publishing field has grown exponentially in the last few years, it has gotten harder to break into research (see Colin's tweet). Top PhD programs like BAIR usually require students to have a publication at a top conference like ICML, ICLR, NeurIPS before they even apply. I'm pretty sure I would not have been accepted to any PhD programs if I were graduating from college today, and would have probably ended up taking a job offer in quantitative finance instead.

The uphill climb gets even steeper for aspiring researchers with non-traditional backgrounds; they are competing with no shortage of qualified PhD students. As Colin alludes to, it is also getting harder for internationals to work at American technology companies and learn from American schools, thanks to our administration's moronic leadership.

The supply-demand curves for ML/AI labor are getting quite distorted. On one hand, we have a tremendous global influx of people wanting to solve hard engineering problems and contribute to scientific knowledge and share it openly with the world. On the other hand, there seems to be a shortage of formal training:
  1. A research mentor to learn the academic lingo and academic customs from, and more importantly, how to ask good questions and design experiments to answer them.
  2. Company environments where software engineers are encouraged to take bold risks and lead their own research (and not just support researchers with infra).

Free Office Hours

I can't do much for (2) at the moment, but I can definitely help with (1). To that end, I'm offering free ML research mentorship to aspiring researchers from non-traditional backgrounds via email and video conferencing.

I'm most familiar with applied machine learning, robotics, and generative modeling, so I'm most qualified to offer technical advice in these areas. I have a bunch of tangential interests like quantitative finance, graphics, and neuroscience. Regardless of technical topic, I can help with academic writing and de-risking ambitious projects and choosing what problems to work on. I also want to broaden my horizons and learn more from you.

If you're interested in using this resource, send me an email at <myfirstname><mylastname><2004><at><g****.com>. In your email, include:
  1. Your resume
  2. What you want to get out of advising
  3. A cool research idea you have in a couple sentences
Some more details on how these office hours will work:
  1. Book weekly or bi-weekly Google Meet [2] calls to check up on your work and ask questions, with 15 minute time slots scheduled via Google Calendar.
  2. The point of these office hours is not to answer "how do I get a job at Google Research", but to fulfill an advisor-like role in lieu of a PhD program. If you are farther along your research career we can discuss career paths and opportunities a little bit, but mostly I just want to help people with (1).
  3. I'm probably not going to write code or run experiments for you.
  4. I don't want to be that PI that slaps their name on all of their student's work - most advice I give will be given freely with no strings attached. If I make a significant contribution to your work or spend > O(10) hours working with you towards a publishable result, I may request being a co-author on a publication.
  5. I reserve the right to decline meetings if I feel that it is not a productive use of my time or if other priorities take hold.
  6. I cannot tell you about unpublished work that I'm working on at Google or any Google-confidential information.
  7. I'm not offering ML consultation for businesses, so your research work has to be unrelated to your job.
  8. To re-iterate point number 2 once more, I'm less interested in giving career advice and more interested in teaching you how to design experiments, how to cite and write papers, and communicating research effectively.
What do I get out of this? First, I get to expand my network. Second, I can only personally run so many experiments by myself so this would help me grow my own research career. Third, I think the supply of mentorship opportunities offered by academia is currently not scalable, and this is a bit of an experiment on my part to see if we can do better. I'd like to give aspiring researchers similar opportunities that I had 4 years ago that allowed me to break into the field.

Footnotes
[1] Chris Olah has a great essay on some additional options and pros and cons of non-traditional education.
[2] Zoom complies with Chinese censorship requests, so as a statement of protest I avoid using Zoom when possible.


Wednesday, April 1, 2020

Three Questions that Keep Me Up at Night

A Google interview candidate recently asked me: "What are three big science questions that keep you up at night?" This was a great question because one's answer reveals so much about one's intellectual interests - here are mine:

Q1: Can we imitate "thinking" from only observing behavior? 

Suppose you have a large fleet of autonomous vehicles with human operators driving them around diverse road conditions. We can observe the decisions made by the human, and attempt to use imitation learning algorithms to map robot observations to the steering decisions that the human would take.

However, we can't observe what the homunculus is thinking directly. Humans read road text and other signage to interpret what they should and should not do. Humans plan more carefully when doing tricky maneuvers (parallel parking). Humans feel rage and drowsiness and translate those feelings into behavior.

Let's suppose we have a large car fleet and our dataset is so massive and perpetually growing that we cannot train it faster than we are collecting new data. If we train a powerful black-box function approximator to learn the mapping from robot observation to human behavior [1], and we use active-learning techniques like DAgger to combat false negatives, will that be enough to acquire these latent information processing capabilities? Can the car learn to think like a human, and how much?

Inferring low-dimensional unobserved states from behavior is a well-studied technique in statistical modeling. In recent years, meta-reinforcement learning algorithms have increased the capability of agents to change their behavior in the presence of new information. However, no one has applied this principle to the scale and complexity of "human-level thinking and reasoning variables". If we use basic black-box function approximators (ConvNets, ResNets, Transformers, etc.), will it be enough? Or will it still fail even with a million lifetimes worth of driving data?

In other words, can simply predicting human behavior lead to a model that can learn to think like a human?


One cannot draw a hard line between "thinking" and "pattern matching", but loosely speaking I'd want to see such learned latent variables reflect basic deductive and inductive reasoning capabilities. For example, a logical proposition formulated as a steering problem: "Turn left if it is raining; right otherwise".

This could also be addressed via other high-data environments:

  • Observing trader orders on markets and seeing if we can recover the traders' deductive reasoning and beliefs about the future - that is, observing rational thought (if not rational behavior).
  • Recovering intent and emotions and desire from social network activity.

Q2: What is the computationally cheapest "organic building block" of an Artificial Life simulation that could lead to human-level AGI?

Many AI researchers, myself included, believe that competitive survival of "living organisms" is the only true way to implement general intelligence.

If you lack some mental power like deductive reasoning, another agent might exploit reality to its advantage and out-compete you for resources.

If you don't know how to grasp an object, you can't bring food to your mouth. Intelligence is not merely a byproduct of survival; I would even argue that it is Life and Death itself from which all semantic meaning we perceive in the world arises (the difference between a "stable grasp" and an "unstable grasp").

How does one realize an A-Life research agenda? It would be prohibitively expensive to implement large-scale evolution with real robots, because we don't know how to get robots to self-replicate as living organisms do. We could use synthetic biology technology, but we don't know how to write complex software for cells yet and even if we could, it would probably take billions of years for cells to evolve into big brains. A less messy compromise is to implement A-Life in silico and evolve thinking critters in there.

We'd want the simulation to be fast enough to simulate armies of critters. Warfare was a great driver of innovation. We also want the simulation to be rich and open-ended enough to allow for ecological niches and tradeoffs between mental and physical adaptations (a hand learning to grasp objects).

Therein lies the big question: if the goal is to replicate the billions of years of evolutionary progress leading up to where we are today, what are the basic pieces of the environment that would be just good enough?

  • Chemistry? Cells? Ribosomes? I certainly hope not.
  • How do nutrient cycles work? Resources need to be recycled from land to critters and back for there to be ecological change.
  • Is the discovery of fire important for evolutionary progression of intelligence? If so, do we need to simulate heat?
  • What about sound and acoustic waves?
  • Is a rigid-body simulation of MuJoCo humanoids enough? Probably not, if articulated hands end up being crucial.
  • Is Minecraft enough? 
  • Does the mental substrate need to be embodied in the environment and subject to the physical laws of the reality? Our brains certainly are, but it would be bad if we had to simulate neural networks in MuJoCo.
  • Is conservation of energy important? If we are not careful, evolution might find agents that harvest free energy from their environment.

In the short story Crystal Nights by Greg Egan, simulated "Crabs" are built up of organic blocks that they steal from other Crabs. Crabs "reproduce" by assembling a new crab out of parts, like LEGO. But the short story left me wanting for more implementation details...



Q3: Loschmidt's Paradox and What Gives Rise to Time?

I recently read The Order of Time by Carlo Rovelli and being a complete Physics newbie, finished the book feeling more confused and mystified than when I had started.

The second law of thermodynamics, $\Delta S > 0$, states that entropy increases with time. It is the only physical law that requires time to "flow" forwards; all other physical laws have Time-Symmetry: they hold even if time were flowing backwards. In other words, T-Symmetry in a physical system implies conservation of entropy.

Microscopic phenomena (the laws of mechanics on position, acceleration, force, electric fields, Maxwell's equations) exhibit T-Symmetry. Macroscopic phenomena (gases dispersing in a room, people going about their lives), on the other hand, are T-Asymmetric. Perhaps it is an adaptation to a T-Asymmetric macroscopic reality that our conscious experience has evolved to become aware of time passing. Perhaps bacteria do not need to know about time...

But if macroscopic phenomena are comprised of nothing more than countless microscopic phenomena, where the heck does entropy really come from?

Upon further Googling, I learned that this question is known as Loschmidt's Paradox. One resolution that I'm partially satisfied with is to consider that if we take all microscopic collisions to be driven by QM, then there really is no such thing as "T-symmetric" interactions, and thus microscopic interactions are actually T-asymmetric. A lot of the math becomes simpler to analyze if we consider a single pair of particles obeying randomized dynamics (whereas in Statistical Mechanics we are only allowed to assume that about a population of particles).

Even if we accept that macroscopic time originates from a microscopic equivalent of entropy, this still raises the question of what the origin of microscopic entropy (time) is.

Unfortunately, many words in English do not help to divorce my subjective, casual understanding of time from a more precise, formal understanding. Whenever I think of microscopic phenomena somehow "causing" macroscopic phenomena or the cause of time (entropy) "increasing", my head gets thrown for a loop. So much T-asymmetry is baked into our language!

I'd love to know of resources for gaining a complete understanding of what we know and don't know, and perhaps a new language to think about Causality from a physics perspective.



If you have thoughts on these questions, or want to share your own big science questions that keep you up at night, let me know in the comments or on Twitter! #3sciencequestions