Notes From Invest Like the Best: Brian Christian

Link: http://investorfieldguide.com/christian/

About Brian: Author covering humans’ relationship with technology and AI


Q: What advice would you give to people building careers? We’re in a political cycle now where things like basic income are being discussed. In your view, what are the most defensible areas of human activity, whether that’s some sort of creativity, asking great questions, or coming up with the objective functions that you then feed the machines? What would you recommend people focus on, whether early or late in their careers, to add value?

A: There are two ways that I can approach this question. My second book, Algorithms to Live By, looks at things like career decisions from an explicitly algorithmic perspective.

1) Explore/Exploit Trade-off

Description

There’s this paradigm, called the “explore/exploit” trade-off, which asks: How much of your energy do you spend gathering information vs. how much do you spend committing based on the information you have? There are a number of decisions that we face throughout life that take the form of a tension or a balance between trying new things and committing to the things that seem to be the best. When we go out to eat, do we go to our favorite restaurant or try a new one? Do we reach out to a new acquaintance we’d like to get to know better, or spend time with our close family or best friend? The same thing is true in investing, and the same thing is true in managing your time and your career.

Generalizing the Problem

The structure of this problem is an iterated decision that you get to make over and over again. Do you continue to put energy into the things that seem promising, or do you spend your energy trying new things? A clinical trial can have that same structure, and indeed the FDA has been increasingly interested in looking over the disciplinary fence at the computer scientists and saying, maybe those algorithms that you’re using to optimize ads could also be used to optimize human lives. The way a computer scientist approaches this question is through something called the multi-armed bandit problem.

The Multi-armed Bandit Problem

Background

In the multi-armed bandit problem, you walk into a casino that has all these different slot machines. Some of them pay out with a higher probability than others, but you don’t know which are which. What strategy do you employ to try to make as much money in the casino as you can? It’s necessarily going to involve some amount of exploration, trying out different machines to see which ones appear to pay out more than others, and some amount of exploitation. To a computer scientist, exploitation doesn’t have the negative connotation that it has in regular English; it just means leveraging the information you’ve gained so far to crank away on the machines that seem to be the best.

Intuitively, I think most of us would recognize that you need to do some amount of both, but it’s not totally obvious what that balance should look like in practice. Indeed, for much of the 20th century, this was considered not only an unsolved problem but an unsolvable one, and sort of career suicide to think about. During WWII, British mathematicians joked about dropping the multi-armed bandit problem over Germany as the ultimate intellectual sabotage: just waste the brainpower and nerd-snipe all of the German mathematicians. To the field’s own surprise, there came a series of breakthroughs on the multi-armed bandit problem through the second half of the 20th century.
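To make the casino setup concrete, here is a minimal sketch of one simple way to mix exploring and exploiting. It uses a rule usually called "epsilon-greedy", which is not mentioned in the interview and is only one of many possible strategies; the function name, the three payout probabilities, and the 10% exploration rate are all illustrative assumptions.

```python
import random

def play_casino(payout_probs, pulls, epsilon, seed=0):
    """Play `pulls` rounds on slot machines with hidden payout probabilities.

    With probability `epsilon` we explore (pick a random machine); otherwise
    we exploit (pick the machine with the best observed payout rate so far).
    Returns the total number of payouts collected.
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    wins = [0] * len(payout_probs)     # payouts observed per machine
    tries = [0] * len(payout_probs)    # pulls made per machine
    total = 0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(len(payout_probs))   # explore
        else:
            # Exploit: highest observed win rate (untried machines count as 0).
            arm = max(range(len(payout_probs)),
                      key=lambda i: wins[i] / tries[i] if tries[i] else 0.0)
        reward = 1 if rng.random() < payout_probs[arm] else 0
        wins[arm] += reward
        tries[arm] += 1
        total += reward
    return total

# Hypothetical casino: the player never sees these numbers directly.
machines = [0.2, 0.5, 0.8]
mostly_exploit = play_casino(machines, pulls=10_000, epsilon=0.1)
pure_explore = play_casino(machines, pulls=10_000, epsilon=1.0)
print(mostly_exploit, pure_explore)
```

Setting `epsilon=1.0` means pulling handles at random forever, while a small `epsilon` spends most pulls on whichever machine currently looks best. Comparing the two totals gives a rough feel for what exploitation buys once some information has been gathered.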

Solution

Now we have a pretty good idea of what exact solutions look like given a number of constraints, but also what sort of more general flexible algorithms look like. The critical insight into thinking about this problem is that your strategy should depend entirely on how long you plan to be in the casino. If you feel that you have a long time ahead of you, then it’s worth it to invest in exploration, because if you do find something great, it has a long horizon to pay out. On the other hand, if you feel that you are about to leave the casino, then the return that you would get on making a great new discovery is going to be much smaller, because you have fewer opportunities to crank away on that handle once you find it. We should naturally transition from being more exploratory at the beginning of a process to more exploitative at the end. I think that’s an intuition that makes sense, but the math bears that out very concretely.
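The dependence on the time horizon can be illustrated with a small exact calculation. This is only a sketch under assumptions that are not from the interview: one machine pays out with a known probability of 0.6, a second machine has an unknown payout probability that we initially treat as equally likely to be anything between 0 and 1, and we ask whether the very first pull should go to the unknown machine.

```python
from functools import lru_cache

def should_explore(horizon, p_known=0.6):
    """Return True if the first pull should go to the unknown machine.

    Machine A pays out with known probability `p_known`. Machine B's payout
    probability is unknown; we start from a uniform prior (Beta(1, 1)) and
    update it after every pull. `horizon` is the number of pulls remaining.
    """
    @lru_cache(maxsize=None)
    def best_value(a, b, t):
        # Expected total payout with t pulls left, playing optimally,
        # when the belief about machine B is Beta(a, b).
        if t == 0:
            return 0.0
        return max(exploit_value(a, b, t), explore_value(a, b, t))

    def exploit_value(a, b, t):
        # Pull the known machine: nothing new is learned about B.
        return p_known + best_value(a, b, t - 1)

    def explore_value(a, b, t):
        # Pull the unknown machine and update the belief on a win or a loss.
        p_win = a / (a + b)
        return (p_win * (1 + best_value(a + 1, b, t - 1))
                + (1 - p_win) * best_value(a, b + 1, t - 1))

    return explore_value(1, 1, horizon) > exploit_value(1, 1, horizon) + 1e-12

for horizon in (1, 2, 4, 20, 50):
    print(horizon, should_explore(horizon))
```

With a single pull left, the known machine is the better bet (0.6 against an expected 0.5 for the unknown one). With a long enough horizon, trying the unknown machine first becomes worthwhile, because a lucky discovery has many pulls left to pay off.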

Observation of “Explore/Exploit” Trade-Off in Real Life

Psychology

It’s interesting to see this idea, which emerged in computer science from the late 50s through the 70s, getting picked up by psychologists and cognitive scientists who are interested in human decision making. For example, Alison Gopnik at UC Berkeley, who studies infant cognition, has been thinking about the “explore/exploit” trade-off as a framework for how the infant mind works. If you think about how children behave, we have all these stereotypes that children are just kind of random and generally incompetent at things, and there’s a huge literature that shows that they have what’s called a “novelty bias”. They’re relentlessly interested in the next thing and the next thing and the next thing. Rather than viewing that as a kind of low willpower or attentional control issue, you can view it as the optimal strategy. It’s as if you’ve just burst through the doors of life’s casino and you have 80 years ahead of you. It really does make a lot of sense to just run around wildly pulling handles at random.

The same is true for being in the later years of one’s life. We have a lot of stereotypes about older people being set in their ways and resistant to change. There’s a psychology literature that shows that older adults maintain fewer social connections than younger people, and it’s tempting to view that pessimistically. In fact, if you build an argument from the mathematics, you can see that older adults are simply in the exploit phase of their life, and they are again doing the optimal thing given where they are in that interval of time. You have psychologists like Stanford’s Laura Carstensen appealing to the “explore/exploit” trade-off to make the argument that older adults know exactly what they’re doing and are very rationally choosing a strategy that makes sense given where they are. They have a lifetime’s exploration behind them, they know what they really like, they know the people and the connections that matter to them, and they have a finite amount of time left to reap the fruits of some new connection or new discovery, so they’re very deliberately enacting this strategy. The math should predict that, on average, older adults are happier than young people. Despite our preconceptions, and her research bears this out, that appears to be the case.

Business

In business, the problem is very dynamic, which places it in the domain of the “restless bandit problem”. Since the research here is less settled, one can invert the thinking: start from the business strategies we can observe and infer the conditions that led to them.

Q: Interesting how this maps on to the life cycles of businesses. In the business context, “explore” might be innovation and “exploit” might be running the same playbook to earn high returns on capital, or something you know works. It seems like you always want to be handing off to a next batch of exploration or innovation, while thoughtfully maintaining something that you know works, if you want to survive for a very long time.

A: There are a couple of things that I think are interesting in a business context. One is that the casino framing that I’ve described implicitly assumes that those probabilities are stable and fixed. Of course, we know that the world is not stable and not fixed; things change over time. This is true in our personal lives as well. Your favorite restaurant gets a new line cook and the burgers are not as good. These things shift. This is known as the “restless bandit problem”. How do you play this game when these probabilities are drifting on a random walk?

This is a very interesting case where the theory is not yet consolidated but humans, in practice, seem to have no problem. If you put people in a lab and give them a restless bandit problem, they have no trouble making choices within that environment, but we don’t yet know what the mathematics of the optimal solution looks like. So here’s a case where the computer scientists and the mathematicians are asking the cognitive scientists: what are your models for how humans are actually approaching this? There may be some insight there that we can use on the theory side. One implication of thinking in this way is particularly relevant in a business setting: if the interval of time you perceive yourself to be on determines the strategy that you should employ, then if you observe someone else’s strategy, you should be able to infer the interval that they’re optimizing over.

Inferring The Explore/Exploit Strategy in a Restless Bandit Problem

Let’s take an example from Hollywood. Most people have noticed that it feels like we’re living through a deluge of sequels, such as Marvel movies. It turns out that this is objectively true. There’s been a sea change in Hollywood. In 1982, 2 of the top 10 grossing films were sequels. By 1990 it was six. By the year 2000 it was eight, and I think most recently it was all ten. From that, we can infer that Hollywood has taken a very hard turn towards an exploitative strategy. They are milking their existing franchises rather than investing money speculatively to try to develop new franchises that will last them into the next few decades. From that, it’s reasonable to infer that movie ticket sales are declining, which turns out to be the case. Hollywood correctly perceives itself to be in the waning days of the golden era of cinema-going. If that’s true, then they really should invest all of their money into squeezing everything they can out of the existing franchises. More broadly, you can look at different industries and different corporations to see if they have cut their R&D budgets. If they’ve given that money to marketing, that would be an indication that they feel that the area has matured or plateaued.

My thoughts

    1. Ahem, asset management, cough
    2. Reminds me of a great Peter Chernin interview where he suggests that every business must be trying to grow new opportunities faster than the old ones die out. While you must do your best to milk the old, it’s imperative to develop the new.

2) Predicting the Impact of Automation

The second avenue is totally different from this way of thinking: what will the impacts of something like AI or UBI be on the economy? I’m reminded of a McKinsey report on which jobs they thought would be the most robust. The big-picture thing that was interesting to me is that it cuts across the traditional class lines. It is not a white-collar versus blue-collar thing. It’s not an upper-middle-class versus lower-middle-class thing. It’s very sector dependent. The most resilient or robust jobs at the top end were gardener, legislator, and psychotherapist. I thought it was fascinating that it’s this eclectic mixture of things. I don’t think of myself as a prognosticator about these sorts of things, but my way of thinking about it is that there’s a lot of human machinery around how capital moves, how laws get made, and how licensing and permitting happen. It’s still done at the level of human negotiation. “I know a guy. I’ll talk to Joe and we’ll sort it out.” I think humans will maintain oversight of these kinds of flows of power and capital, even if the actual value is being created by software. So position yourself closer to the flow of that value than to the actual creation of the value, which may be counterintuitive.

As far as the question of UBI, I don’t have a great intuition for that. There is already a restlessness in the labor force. A lot of the careers that employ the largest numbers of people are the most vulnerable: people who drive cars or trucks, people who work in warehouses. A lot of those jobs are just one innovation away, and it’s not clear to me that there’s going to be a political response as well as just a pure economic response. I grew up in New Jersey, where there was a robust toll collector union even though they had machines where you could toss your change in a bin and it would automatically sort your change and give you whatever you needed back. There was an effective effort to unionize the toll collectors so that you still had a human being in the booth counting out your quarters. That’s an example where it’s not for lack of technology. We had a coin-sorting machine, but there was a political process that was directing the actual level of implementation. People will fight to use licensing requirements and regulations to maintain those things. Even though the actual technological capability has radically changed, it’s very hard to know which areas will look shockingly different from the world today and which will be, in some ways, shockingly backwards for their time because we’ve had, for political reasons, to hold the line.

(Reminds me of how rent flows to the owner of a relationship in a competitive market that has been flattened by technology)

Algorithms to make other types of decisions

The mathematics is very instructive, both in a specific way and as a broader set of principles.

Optimal Stopping Problem

Difference from “explore/exploit” trade-off

One thing that comes to mind is the idea called “optimal stopping”. The multi-armed bandit problem behind the “explore/exploit” trade-off presumes a framing that’s highly iterative. You can pull the handles again and again and again. You can go from one machine to another and back. But there are many decisions in life where you are forced to make a single binding commitment. It could be anything as banal as pulling into a parking space. It could be something like purchasing a house or signing a lease. It could be something like marrying your spouse. There’s a separate mathematics for cases where you need to find the right moment in time to go all-in, commit to an option, and no longer gather any further information.

37% Rule

There’s this very famous result called the “37% rule”. Let’s say you’re looking for an apartment, and it’s a really competitive marketplace. You’re in a situation where you encounter a series of options one by one. At each point in time, you must either immediately commit, and then never know what else might have been out there, or decide to walk away and keep exploring your options but lose that opportunity forever. What do you do to try to end up with the best thing possible, even though you won’t necessarily know at the time whether you’ve found the best option out there? There’s this beautifully elegant result that says that you should spend the first 37% of your search non-committally exploring your options. Don’t bring your checkbook; don’t commit to anything, no matter how good it seems. You’re purely setting a baseline. After that 37%, whether it’s 37% of the time that you’ve given yourself to make the decision or 37% of the way through the pool of options, be prepared to immediately commit to the very first thing you see that’s better than what you saw in that first 37%. This is not just an intuitively satisfying balance between looking and leaping; it is the mathematically optimal result.
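The look-then-leap procedure described above can be written down in a few lines. This is a minimal sketch, assuming each option can be summarized by a single numeric score and that options arrive in the order listed; the function name and the apartment ratings are made up for illustration.

```python
def pick_with_37_percent_rule(scores, look_fraction=0.37):
    """Apply the look-then-leap rule to options seen one at a time.

    `scores` lists how good each option is, in the order encountered.
    Returns the index of the option committed to, or None if nothing
    after the look phase beats the baseline.
    """
    look_count = int(len(scores) * look_fraction)   # options to observe only
    baseline = max(scores[:look_count], default=float("-inf"))
    for index in range(look_count, len(scores)):
        if scores[index] > baseline:
            return index        # leap: first option better than the baseline
    return None                 # nothing later beat the look phase

apartments = [6, 3, 7, 5, 8, 4, 9, 2]   # hypothetical ratings, in viewing order
choice = pick_with_37_percent_rule(apartments)
print(choice, apartments[choice])        # prints "2 7"
```

In this made-up run the rule commits to the third apartment (rated 7) even though a 9 shows up later, which is a reminder that the rule maximizes the chance of getting the best option without guaranteeing it.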

Broader insights on algorithms

Elegant solutions under a range of narrow assumptions about goals and acceptable risks

There are strategies like that which I think are wonderfully crisp in the recommendation they give, but they, of course, rest on a bed of many different assumptions about exactly how the problem is structured and exactly what your goals are. This rule presumes that your entire goal is to maximize the chance that you get the very best thing in the entire pool, but it comes, of course, with a 37% chance that you end up with nothing at all, because you’ve passed on every option. Many people would find that unacceptable. We can go down the rabbit hole of how to modify this, and the solutions get less and less clean as you wiggle the assumptions around.
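The 37% figures quoted for this rule can be checked with a short exact calculation. The sketch below assumes the textbook version of the problem (n options in random order, only relative rankings matter, no going back) and uses the standard formula for it, which is not spelled out in the interview; the function names are illustrative.

```python
def outcome_probabilities(n, look_count):
    """Exact outcome odds for the classical problem with n options in random order.

    The first `look_count` options (at least 1) are only observed; after that
    we commit to the first option better than all of those.
    Returns (p_best, p_nothing, p_settled_for_less).
    """
    # Chance of landing the single best option (standard formula for this setup).
    p_best = (look_count / n) * sum(1 / k for k in range(look_count, n))
    # We end up empty-handed exactly when the best option was in the look phase.
    p_nothing = look_count / n
    return p_best, p_nothing, 1 - p_best - p_nothing

def best_look_count(n):
    """Look-phase length that maximizes the chance of getting the best option."""
    return max(range(1, n), key=lambda r: outcome_probabilities(n, r)[0])

print(best_look_count(100))
print(outcome_probabilities(100, 37))
```

For 100 options, the best look phase under these assumptions is 37 options long. The chance of landing the very best option and the chance of ending up with nothing are both about 37%, and the rest of the time the rule commits to something that is good but not the best.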

Intuition for how complex decision-making is can be strangely comforting

More broadly, one of the highest-level takeaways for me, from working on the book and just thinking in computational terms about decisions in my own life, is that some decisions are just hard. In the classical optimal stopping problem, due to a weird mathematical symmetry, if you follow the 37% rule you will only succeed 37% of the time. The other 63% of the time you’ll fail, and that is the best possible strategy you could enact in that situation. In a weird way, that’s some measure of consolation, because often, in real life, we find ourselves not getting the outcome we wanted. While we can rake ourselves over the coals or try to reconstruct our entire thought process, I think it’s some comfort that computer science and mathematics can, in effect, certify that you were just up against a hard problem. If you have the vocabulary to understand the type of problem that you’re facing, and you have some intuitions about the general shape of what optimal solutions look like, then even when you don’t get the outcome that you wanted, you can in some sense rest easy, because you know that you followed the appropriate process for dealing with that situation.
