Wednesday, November 6, 2019

Robinhood, Leverage, and Lemonade

DISCLAIMER: NO INVESTMENT OR LEGAL ADVICE
The Content is for informational purposes only; you should not construe any such information or other material as legal, tax, investment, financial, or other advice. Investing involves risk; please consult a financial professional before making an investment.


Robinhood is a zero-commission brokerage that was founded in 2013. It has a beautiful mobile user interface that game-ifies the gambling of your life savi—, er, makes it seamless for millennials to buy and sell stocks.

I wrote on Quora in Dec 2014 about why lowering the barrier to entry to this extent can cause retail investors to make trades without knowing what they are doing. That post turned out to be rather prescient, for reasons I’ll explain below.

One of the ways Robinhood makes money is via margin lending: they loan you some extra money to invest in the stock market with, and later you pay back the loan with some interest (currently about 5%).

If you are in the business of lending money, not only do you have to safeguard your brokerage system against technological vulnerabilities (e.g. C++ memory leaks that expose users’ trades), but you also need to defend against financial vulnerabilities, which are portfolios that expose the lender or its customers to an irresponsible amount of investment risk.

In the last few months it has come to light [1, 2, 3, 4, 5] that there are some serious financial vulnerabilities in Robinhood’s margin lending platform, whereby it is possible for users to borrow much, much more money from Robinhood than they are supposed to.


These users subsequently gamble huge amounts of borrowed money away in a coin toss, leaving Robinhood in a very bad spot, perhaps even at odds with Regulation T laws (I am not a lawyer, just speculating here).

“Leverage” is one of the most important concepts to understand in finance, and when used judiciously, is a net positive for everyone involved. It is important for everyone to understand how credit works, and how much leverage is too much. Borrowing more money than you can afford to pay back can take many forms, whether it is taking on college debt, running up credit card debt, or raising VC money.

Here’s a tutorial on “financial leverage” in the form of a story about lemonade:


Lemonade Leverage


It’s a hot summer, and you decide to start a lemonade stand to make some money. You have $100, with which you can buy enough ingredients to make $120 of lemonade for the summer. Your “return on investment”, or ROI, for the summer is 20%, since you ended up with 20% more money than you started with.

You also figure that enough people want lemonade that, if you had another $200, you could sell three times as much lemonade and make $360. But you don’t have $200 to spare! What do you do?

You could use the $120 to build a slightly bigger lemonade operation next year. Assuming you could get a 20% ROI again next summer, you end up with $144. But it will be many years before you even have $300! By this time next year, lemonade might be out of fashion and kids might be juuling at home watching Netflix instead. You would much prefer to scale up your lemonade operation now, while you are confident that you can sell lemonade at a "profit margin" of 20%.

Fortunately, your friend “Britney Banker” is very wealthy and can lend you $200. Britney Banker doesn’t have your entrepreneurial spirit, so she lacks the ability to get a 20% ROI on her own money. She offers to give you $200 today, in exchange for you giving her $210 at the end of the year -- an interest rate of 5%. Your “capital leverage ratio” is $100 : $200 = 1:2, because for every dollar you own, Britney is willing to lend you $2.

If things turn out well, you sell $360 worth of lemonade, pay Britney back $210, and pocket the remaining $150. Starting with $100, you were able to use borrowed money to “magnify” your return to 50%.

However, if you make $360 worth of lemonade and fail to sell any of it before the lemonade spoils and becomes worthless, you would be in a very sticky situation! You would have worthless lemonade and a $210 debt to Britney. This is far worse than if you had only lost your own $100, because at least then you wouldn’t owe anyone anything afterwards. So even though 1:2 leverage may amplify your gains from 20% → 50%, it may also amplify your potential losses from -100% → -310%!
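
To make the payoff asymmetry concrete, here is a tiny Python sketch (the leveraged_roi helper is my own illustration, not part of the story) that computes your return on your own $100 under both outcomes:

def leveraged_roi(own, borrowed, interest_rate, gross_return):
  # gross_return is what each dollar spent on ingredients turns into:
  # 1.2 if the lemonade sells at a 20% markup, 0.0 if the whole batch spoils.
  proceeds = (own + borrowed) * gross_return
  left_over = proceeds - borrowed * (1 + interest_rate)  # repay Britney first
  return (left_over - own) / own

print(leveraged_roi(100, 200, 0.05, 1.2))  # 0.5  -> +50% when everything sells
print(leveraged_roi(100, 200, 0.05, 0.0))  # -3.1 -> -310% when the batch spoils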

The only reason why Britney is willing to lend you the money in the first place is that Britney thinks this outcome (you losing all of the borrowed money on top of your own assets) is unlikely. If Britney thought that you were less reliable, she might offer you a smaller leverage ratio (e.g. 1 : 1.5).

Lemonade Coupons


Suppose you make a big batch of lemonade (with Britney’s money) and then go door to door selling it, but instead of giving customers a delicious drink right away, you give them a “deep-in-the-lemonade covered call option”: you take their money up front and give them a coupon that allows them to “buy” a lemonade later for free ($0).

The "call option" is referred to as "covered" because you actually have the lemonade to go with the coupon, it's just that you're holding onto the lemonade until the buyer actually redeems the coupon.


You then go back to Britney and say “I have $360 of lemonade that I’ve made but haven’t sold, and $360 in cash from selling lemonade coupons to customers, and as for debts there’s the $200 I’ve borrowed from you. That’s $520 in net assets, so can I please borrow $1040?”.

Britney says “sure, that’s a 1:2 leverage ratio”, and writes you a check for $1040, again with 5% interest. But Britney has made a tragic mistake here! The $360 of lemonade she counted as your assets is not really yours to spend, because you already owe it to your coupon-holding customers.

With $1240 of borrowed money ($200 + $1040), you are now leveraged at over 1:12!

You repeat this process again, turning $1040 in cash into $1248 of lemonade and selling an additional $1248 of deep-in-the-lemonade options. You now have $1608 of lemonade, $1608 in cash, and $1240 of debt to Britney, for net assets of $1608 + $1608 - $1240 = $1976.

You go back to Britney and ask to borrow another $3952, with 5% interest. Again, because Britney is forgetting to account for the $1608 of lemonade “debt” that you may have to deliver to coupon-holders, she thinks that the leverage is still 1:2. You repeat this process one more time, and your new total position is roughly $6k in lemonade, $6k in cash, and $5k of debt to Britney.
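
Here is a rough Python sketch of this loop (my own illustration, not anything from Robinhood's systems), using the story's assumptions: a 20% markup on ingredients, a 1:2 lending ratio against whatever Britney thinks your net assets are, and every new loan immediately spent on lemonade and resold as prepaid coupons:

def simulate(own_cash=100.0, markup=1.2, lend_ratio=2.0, rounds=3):
  cash, lemonade = own_cash, 0.0
  britney_debt, coupon_debt = 0.0, 0.0
  for r in range(rounds):
    # Britney lends against what she thinks you are worth: she counts the
    # lemonade as an asset but forgets the coupons you owe against it.
    apparent_net = cash + lemonade - britney_debt
    loan = lend_ratio * apparent_net
    britney_debt += loan
    cash += loan
    # Spend the fresh money (plus, in the first round, your own $100) on
    # ingredients, then immediately resell the new batch as prepaid coupons.
    spend = cash if r == 0 else loan
    cash -= spend
    batch = markup * spend
    lemonade += batch
    cash += batch
    coupon_debt += batch
    print(f"round {r+1}: lemonade={lemonade:.0f} cash={cash:.0f} "
          f"owe_britney={britney_debt:.0f} owe_coupons={coupon_debt:.0f}")

simulate()
# round 1: lemonade=360 cash=360 owe_britney=200 owe_coupons=360
# round 2: lemonade=1608 cash=1608 owe_britney=1240 owe_coupons=1608
# round 3: lemonade=6350 cash=6350 owe_britney=5192 owe_coupons=6350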

If you were to successfully deliver $6k of lemonade, you would make about $1k in profit, starting from only $100 of your own cash. A ~1000% return sounds too good to be true, right? That’s because it is.

One hot summer day, all of the coupon holders decide to exercise their coupons at the same time. You realize that your lemonade stand can’t actually fulfill $6k of lemonade orders and you are in way over your head. Desperate, you attempt to pivot and come up with a Billy McFarland-esque scheme to buy lemonade from a local grocery store and dilute it with some water. But due to inexperience with food handling operations, you accidentally contaminate half the batch, and are left with only $3k of lemonade. You have $6k in cash but still owe $3k in lemonade and $5k in cash. Your $1k profit opportunity has now become a $2k DEBT (an ROI of -2100%), and we haven't even factored in the interest! Because your creditors (the lemonade coupon holders and Britney Banker) must be paid regardless of whether you successfully make lemonade or not, your leverage has an asymmetric payoff -- the downside is twice as bad as the upside!

I wish I could say that this story was fictional, but to the best of my understanding this is more or less what /u/ControlTheNarrative and others attempted to do on Robinhood. Substitute "AMD stock" for "lemonade", and "deep-in-the-money covered call option" for "lemonade coupon". Theoretically, Robinhood shouldn't allow you to buy options on margin, but /u/ControlTheNarrative was very clever to use covered call options, which meant that he bought AMD stock on margin (valid) and then created cash and in-the-money AMD call options (sort of like creating matter and antimatter from nothing). Robinhood failed to detect the "antimatter", allowing /u/ControlTheNarrative to mask his "debt" and thereby double his apparent net assets.

Ok, where did /u/ControlTheNarrative go wrong? It might be possible to still turn a profit by investing the vast amount of leverage in a “safe asset”, right? This seems unlikely: Robinhood’s interest rate of 5% far exceeds the risk-free rate of 1.88% currently offered by a 1-year Treasury note. In other words, it only makes sense to use Robinhood's leverage when you have the ability to deliver annualized returns that exceed 5%. When one has limited assets and a risky investment opportunity, they should instead carefully choose leverage so that they do not end up owing 10x their net worth should they encounter a stroke of bad luck.

Instead of trying to find an investment that minimizes risk while maintaining a >5% return, /u/ControlTheNarrative then took his enormous leverage and bet all of it on a coin toss: out-of-the-money (OTM) put options against Apple (remember that he was able to buy these options with leveraged cash because it had been "laundered" using covered call options).

Unfortunately for him, Apple beat earnings expectations, and the OTM put options subsequently became worthless!

Guh!

Acknowledgements


Thanks to Ted Xiao and Daniel Ho for insightful discussion. We had a good laugh. I found the following links helpful in my research:





Saturday, July 6, 2019

Normalizing Flows in 100 Lines of JAX

JAX is a great linear algebra + automatic differentiation library for fast machine learning experimentation and teaching. Here is a lightweight example, in just 75 lines of JAX, of how to implement Real-NVP.

This post is based on a tutorial on normalizing flows I gave at the ICML workshop on Invertible Neural Nets and Normalizing Flows. I've already written about how to implement your own flows in TensorFlow using TensorFlow Probability's Bijector API, so to make things interesting I wanted to show how to implement Real-NVP a different way.

By the end of this tutorial you'll be able to reproduce this figure of a normalizing flow "bending" samples from a 2D Normal distribution to samples from the "Two Moons" dataset. Real-NVP forms the basis of a lot of flow-based architectures (as of 2019), so this is a good template to start learning from.



If you are not already familiar with flows at a high level, please check out the 2-part tutorial: [part 1] [part 2], as this tutorial just focuses on how to implement flows in JAX. You can find all the code along with the slides for my talk here.

Install Dependencies

There are just a few dependencies required to reproduce this tutorial. We'll be running everything on the CPU, though you can also build the GPU-enabled versions of JAX if you have the requisite hardware.

pip install --upgrade jax jaxlib scikit-learn matplotlib

Toy Dataset


Scikit-Learn comes with some toy datasets that are useful for small scale density models.


from sklearn import cluster, datasets, mixture
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
n_samples = 2000
noisy_moons = datasets.make_moons(n_samples=n_samples, noise=.05)
X, y = noisy_moons
X = StandardScaler().fit_transform(X)

Affine Coupling Layer in JAX


TensorFlow Probability defines an object-oriented API for building flows, where a "TransformedDistribution" object is given a base "Distribution" object along with a "Bijector" object that implements the invertible transformation. In pseudocode, it goes something like this:

class TransformedDistribution(Distribution):
  def sample(self):
    x = self.base_distribution.sample()
    return self.bijector.forward(x)
  def log_prob(self, y):
    x = self.bijector.inverse(y)
    ildj = self.bijector.inverse_log_det_jacobian(y)
    return self.base_distribution.log_prob(x) + ildj

However, programming in JAX takes on a functional programming philosophy where functions are stateless and classes are eschewed. That's okay: we can still build a similar API in a functional way. To make everything end-to-end differentiable via JAX's grad() operator, it's convenient to put the parameters that we want gradients for as the first argument of every function. Here are the sample and log_prob implementations of the base distribution.

from jax import random
import jax.numpy as np
rng = random.PRNGKey(0)  # PRNG key used by sample_n01 and init_nvp below

def sample_n01(N):
  D = 2
  return random.normal(rng, (N, D))
def log_prob_n01(x):
  return np.sum(-np.square(x)/2 - np.log(np.sqrt(2*np.pi)), axis=-1)

Below are the forward and inverse functions of Real-NVP, which operates on minibatches (we could also re-implement this to operate over vectors, and use JAX's vmap operator to auto-batch it). Because we are dealing with 2D data, the masking scheme for Real-NVP is very simple: we just switch the masked variable every other flow via the "flip" parameter.

def nvp_forward(net_params, shift_and_log_scale_fn, x, flip=False):
  d = x.shape[-1]//2
  x1, x2 = x[:, :d], x[:, d:]
  if flip:
    x2, x1 = x1, x2
  shift, log_scale = shift_and_log_scale_fn(net_params, x1)
  y2 = x2*np.exp(log_scale) + shift
  if flip:
    x1, y2 = y2, x1
  y = np.concatenate([x1, y2], axis=-1)
  return y


def nvp_inverse(net_params, shift_and_log_scale_fn, y, flip=False):
  d = y.shape[-1]//2
  y1, y2 = y[:, :d], y[:, d:]
  if flip:
    y1, y2 = y2, y1
  shift, log_scale = shift_and_log_scale_fn(net_params, y1)
  x2 = (y2-shift)*np.exp(-log_scale)
  if flip:
    y1, x2 = x2, y1
  x = np.concatenate([y1, x2], axis=-1)
  return x, log_scale

The "forward" NVP transformation takes in a callable shift_and_log_scale_fn (an arbitrary neural net that takes the masked variables as inputs), applies it to recover the shift and log scale parameters, transforms the un-masked inputs, and then stitches the masked scalar and the transformed scalar back together in the right order. The inverse does the opposite. 

Here are the corresponding sampling (forward) and log-prob (inverse) implementations for a single RealNVP coupling layer. The ILDJ term is computed directly, as it is just the (negative) sum of the log_scale terms.


def sample_nvp(net_params, shift_log_scale_fn, base_sample_fn, N, flip=False):
  x = base_sample_fn(N)
  return nvp_forward(net_params, shift_log_scale_fn, x, flip=flip)

def log_prob_nvp(net_params, shift_log_scale_fn, base_log_prob_fn, y, flip=False):
  x, log_scale = nvp_inverse(net_params, shift_log_scale_fn, y, flip=flip)
  ildj = -np.sum(log_scale, axis=-1)
  return base_log_prob_fn(x) + ildj

What should we use for our shift_and_log_scale_fn? I've found that for 2D data + NVP, wide and shallow neural nets tend to train more stably. We'll use JAX's stax helper library to build a function that initializes the parameters and the callable apply function for an MLP with two hidden layers (512 units each) and ReLU activations. 


from jax.experimental import stax # neural network library
from jax.experimental.stax import Dense, Relu # neural network layers


def init_nvp():
  D = 2
  net_init, net_apply = stax.serial(
    Dense(512), Relu, Dense(512), Relu, Dense(D))
  in_shape = (-1, D//2)
  out_shape, net_params = net_init(rng, in_shape)
  def shift_and_log_scale_fn(net_params, x1):
    s = net_apply(net_params, x1)
    return np.split(s, 2, axis=1)
  return net_params, shift_and_log_scale_fn
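
As a quick sanity check (my own snippet, assuming the definitions above), the inverse pass of a coupling layer should undo the forward pass up to floating point error:

params, fn = init_nvp()
x = random.normal(rng, (4, 2))
y = nvp_forward(params, fn, x, flip=True)
x_recovered, _ = nvp_inverse(params, fn, y, flip=True)
print(np.allclose(x, x_recovered, atol=1e-4))  # should print True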

Stacking Coupling Layers


TensorFlow Probability's object-oriented API is convenient because it allows us to "stack" multiple TransformedDistributions on top of each other for more expressive - yet tractable - transformations. 


dist1 = TransformedDistribution(base_dist, bijector1)
dist2 = TransformedDistribution(dist1, bijector2)
dist2.sample() # member variables reference dist1, which references base_dist

For "bipartite" flows like Real-NVP which leave some variables untouched, it is critical to be able to stack multiple flows so that all variables get a chance to be "transformed". 

Here's the functional way to do the same thing in JAX. We have a function "init_nvp_chain" that returns neural net parameters, callable shift_and_log_scale_fns, and masking parameters for each flow. We then pass this big bag of parameters to the sample_nvp_chain function. 

In log_prob_nvp_chain, an iteration loop repeatedly overrides log_prob_fn, which is initially set to base_log_prob_fn. This mirrors how TransformedDistribution.log_prob is defined in terms of the log_prob of the base distribution beneath it. Python variable binding can be a bit tricky here, and it's easy to make a mistake that results in an infinite loop. The solution is a function generator (make_log_prob_fn) that returns a function with the correct base log_prob_fn bound to the log_prob_nvp argument. Thanks to David Bieber for pointing this fix out to me.
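
For concreteness, here is what the naive (buggy) version looks like; this is my own hypothetical illustration of the pitfall, not code from the implementation. The lambda captures the name log_prob_fn rather than its current value, so once the loop finishes, calling it recurses on itself forever:

def log_prob_nvp_chain_buggy(ps, configs, base_log_prob_fn, y):
  log_prob_fn = base_log_prob_fn
  for p, config in zip(ps, configs):
    shift_log_scale_fn, flip = config
    # BUG: by the time this lambda is called, log_prob_fn refers to the
    # lambda itself (the name's final binding), not the previous layer's
    # log_prob, so the call never terminates.
    log_prob_fn = lambda x: log_prob_nvp(p, shift_log_scale_fn, log_prob_fn, x, flip=flip)
  return log_prob_fn(y)

The make_log_prob_fn generator below avoids this by binding the current log_prob_fn through a function argument.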


def init_nvp_chain(n=2):
  flip = False
  ps, configs = [], []
  for i in range(n):
    p, f = init_nvp()
    ps.append(p), configs.append((f, flip))
    flip = not flip
  return ps, configs

def sample_nvp_chain(ps, configs, base_sample_fn, N):
  x = base_sample_fn(N)
  for p, config in zip(ps, configs):
    shift_log_scale_fn, flip = config
    x = nvp_forward(p, shift_log_scale_fn, x, flip=flip)
  return x

def make_log_prob_fn(p, log_prob_fn, config):
  shift_log_scale_fn, flip = config
  return lambda x: log_prob_nvp(p, shift_log_scale_fn, log_prob_fn, x, flip=flip)

def log_prob_nvp_chain(ps, configs, base_log_prob_fn, y):
  log_prob_fn = base_log_prob_fn
  for p, config in zip(ps, configs):
    log_prob_fn = make_log_prob_fn(p, log_prob_fn, config)
  return log_prob_fn(y)

Training Real-NVP


Finally, we are ready to train this thing! 

We initialize our Real-NVP with 4 affine coupling layers (each variable is transformed twice) and define the optimization objective to be the model's negative log-likelihood over minibatches (more precisely, the cross entropy). 


from jax.experimental import optimizers
from jax import jit, grad
import numpy as onp
ps, cs = init_nvp_chain(4)

def loss(params, batch):
  return -np.mean(log_prob_nvp_chain(params, cs, log_prob_n01, batch))
opt_init, opt_update, get_params = optimizers.adam(step_size=1e-4)

Next, we declare a single optimization step where we retrieve the current optimizer state, compute gradients with respect to our big list of Real-NVP parameters, and then update our parameters. The cool thing about JAX is that we can "jit" (just-in-time compile) the step function to a single XLA op so that the entire optimization step happens without returning back to the (relatively slow) Python interpreter. We could even JIT the entire optimization process if we wanted to!

@jit
def step(i, opt_state, batch):
  params = get_params(opt_state)
  g = grad(loss)(params, batch)
  return opt_update(i, g, opt_state)

iters = int(1e4)
data_generator = (X[onp.random.choice(X.shape[0], 100)] for _ in range(iters))
opt_state = opt_init(ps)
for i in range(iters):
  opt_state = step(i, opt_state, next(data_generator))
ps = get_params(opt_state)
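
To follow up on the aside about jitting the entire optimization process: here is one way it might look (a sketch on my part, not the implementation used above), assuming the minibatches are pre-sampled so the loop body stays purely functional:

from jax import lax

# pre-sample all minibatches so the compiled loop needs no Python-side RNG
batches = np.stack([X[onp.random.choice(X.shape[0], 100)] for _ in range(iters)])

@jit
def train(opt_state):
  def body(i, opt_state):
    params = get_params(opt_state)
    g = grad(loss)(params, batches[i])
    return opt_update(i, g, opt_state)
  return lax.fori_loop(0, iters, body, opt_state)

# opt_state = train(opt_init(ps))  # one compiled call instead of a Python loop

Pre-sampling the batches keeps the loop body free of Python-side randomness, which is what lets lax.fori_loop trace it once and run the whole loop inside XLA.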

Animation


Here's the code snippet that will visualize each of the 4 affine coupling layers transforming samples from the base Normal distribution, in sequence. Is it just me, or does anyone else find themselves constantly having to Google "How to make a Matplotlib animation?"


from matplotlib import animation, rc
from IPython.display import HTML, Image

x = sample_n01(1000)
values = [x]
for p, config in zip(ps, cs):
  shift_log_scale_fn, flip = config
  x = nvp_forward(p, shift_log_scale_fn, x, flip=flip)
  values.append(x)

# First set up the figure, the axis, and the plot element we want to animate
fig, ax = plt.subplots()
xlim, ylim = (-3, 3), (-3, 3)  # fixed plot bounds; the standardized data fits comfortably here
ax.set_xlim(xlim)
ax.set_ylim(ylim)

y = values[0]
paths = ax.scatter(y[:, 0], y[:, 1], s=10, color='red')

def animate(i):
  l = i//48
  t = (float(i%48))/48
  y = (1-t)*values[l] + t*values[l+1]
  paths.set_offsets(y)
  return (paths,)
anim = animation.FuncAnimation(fig, animate, frames=48*len(cs), interval=1, blit=False)
anim.save('anim.gif', writer='imagemagick', fps=60)