
The classic example of the so-called “explore-exploit dilemma” is ordering food in a restaurant. You’ve had the pad thai a bunch of times and know it’s good; you’ve heard the pad krapow is amazing but have never tried it. Should you stick with the familiar or venture into the unknown? It’s a common dilemma, and one that crops up all the time in various guises: what TV show to watch, who to date, which job offer to consider, and so on.
Athletes, too, are constantly having to navigate explore-exploit dilemmas, according to a new paper in the journal Psychology of Sport and Exercise, from Katja Rewitz and two colleagues at the University of Hamburg. Sports-related trade-offs range from the most general (choosing which sport to compete in) to the highly specific (deciding which brand of gels to use during a race). But how do we know if we’re clinging too tightly to tradition, or leaping too enthusiastically into each new fad?
As it happens, there’s a vast scientific literature on explore-exploit decision-making, drawing on subjects like behavioral economics, evolutionary biology, and computer science. Rewitz pulls out three key types of explore-exploit dilemmas that you might face as an athlete:
The first type is choosing what to pursue in the first place. Rewitz’s example is that kids try a lot of different sports, but eventually have to choose one in order to maximize their potential. I’d argue the same is true, to a certain extent, for adults: I love running, but I also enjoy basketball, have dabbled in climbing, and am curious about cycling. As much as I’d love to pursue all of them in parallel, I have finite time and energy, so I have to prioritize, though my goal at this point is to maximize overall enjoyment and health more than performance.
The scientific category that addresses problems like this is information search, and the classic example is what was called (when it was first studied in the 1950s) the Secretary Problem. You’re interviewing candidates for a job, and you have to assess each one and rank them relative to the previous interviewees. The question is: At what point do you decide a candidate is good enough to hire, rather than continuing to interview more candidates?
Under certain fairly artificial conditions (e.g. if you assume that you can only hire someone right after interviewing them, and can’t subsequently go back to hire earlier candidates), you can come up with a mathematical rule. Of the total number of candidates available, you should interview 37 percent of them without hiring anyone, in order to get a sense of how good the overall candidate pool is. After that, you should hire the first candidate who is better than anyone you’ve interviewed so far.
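To see the rule in action, here is a minimal Python sketch of the look-then-leap strategy under those same artificial conditions. The function names (`secretary_pick`, `success_rate`), the use of random scores to stand in for candidate quality, and the trial counts are my own illustrative assumptions, not anything from Rewitz's paper.

```python
import math
import random


def secretary_pick(scores, skip_fraction=1 / math.e):
    """Return the index of the candidate hired by the look-then-leap rule.

    The first `skip_fraction` of candidates (about 37 percent by default)
    are interviewed but never hired; they only set the benchmark.
    """
    n = len(scores)
    n_skip = int(n * skip_fraction)
    benchmark = max(scores[:n_skip], default=float("-inf"))
    for i in range(n_skip, n):
        if scores[i] > benchmark:
            return i  # first candidate better than everyone seen so far
    return n - 1  # nobody beat the benchmark, so we are stuck with the last one


def success_rate(n_candidates=100, n_trials=20000, seed=0):
    """Estimate how often the rule hires the single best candidate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        scores = [rng.random() for _ in range(n_candidates)]  # assumed quality scores
        chosen = secretary_pick(scores)
        if scores[chosen] == max(scores):
            wins += 1
    return wins / n_trials


if __name__ == "__main__":
    print(f"Hired the best candidate in {success_rate():.1%} of trials")
```

In this idealized setup, the rule should land the very best candidate a little more than a third of the time, which is far better than picking at random from a large pool.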
The specific numbers here aren’t useful because the constraints are too unrealistic. But the general approach makes sense: you should first explore your options, and only permit yourself to lock in choices after the initial period of exploration. As it happens, this maps well onto the findings of scientists who study talent development: adult elite athletes typically played more different sports as kids and specialized later than elite juniors who didn’t make it to the next level. Explore then exploit.
The second type is deciding whether to stick with what you’re already doing or change course. The example here is switching from an established routine to a new coach or a new training program, though it could equally well apply to topics like supplements (stick with beet juice, or switch to baking soda or broccoli extract?) and nutrition (double your carb intake?). In my book The Explorer’s Gene, the main example I used in my chapter on the explore-exploit dilemma was the speedskater Nils van der Poel, whose amazingly unorthodox training approach leading up to the 2022 Olympics shook the endurance world.
The analogous field of scientific study is animal foraging. If you’re a bee sucking nectar from a patch of flowers, you eventually have to decide when to abandon the current patch of flowers and fly off into the great unknown world in search of a better patch. There’s no guarantee that this hypothetical other patch will be better than your current patch, and you’ll have to spend valuable energy to get there, so the decision to move on is a leap of faith.
The mathematical approach to foraging (which does indeed seem to correspond with how animals in the wild make foraging decisions) is called the Marginal Value Theorem. The bee has a rough estimate of how much pollen is available in a “typical” flower patch, so it sticks with its current patch until pollen levels drop below that average level, then moves on.
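Here is a rough Python sketch of how that leaving rule can be computed. It uses the standard textbook form of the theorem: leave when the rate of gain in the current patch falls to the long-run average rate, counting the time spent traveling between patches. The exponential shape of the gain curve, the numbers, and the function names are assumptions chosen purely for illustration.

```python
import math


def gain(t, total=100.0, rate=0.5):
    """Cumulative food collected after t minutes in a patch (assumed shape).

    The curve flattens over time, which captures diminishing returns.
    """
    return total * (1.0 - math.exp(-rate * t))


def marginal_rate(t, total=100.0, rate=0.5):
    """Instantaneous intake rate at time t (the derivative of gain)."""
    return total * rate * math.exp(-rate * t)


def optimal_stay(travel_time, total=100.0, rate=0.5, upper=1000.0):
    """Find the time at which the marginal rate equals the average rate.

    The average rate is gain(t) / (t + travel_time), so travel_time must be
    positive. Solved by bisection, since the difference between the two
    rates crosses zero exactly once for a flattening gain curve.
    """
    def surplus(t):
        return marginal_rate(t, total, rate) - gain(t, total, rate) / (t + travel_time)

    lo, hi = 0.0, upper
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if surplus(mid) > 0:
            lo = mid  # still doing better than average: stay longer
        else:
            hi = mid  # below average: should already have left
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for travel in (1.0, 2.0, 5.0):
        print(f"travel {travel:.0f} min -> stay {optimal_stay(travel):.2f} min")
```

One pattern worth noticing in the output: the longer the trip to the next patch, the longer it pays to stay put. In training terms, the more costly a switch is, the more you should squeeze out of your current approach before moving on.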
A crucial point here is that the current flower patch suffers from diminishing returns, because the bee is gradually using up the patch’s pollen. So it might be a smart decision to stay put for now, but sooner or later you’ll have to move on. You could argue that training philosophies follow similar dynamics. You move to a new coach or adopt a new philosophy (Norwegian training, anyone?). After a period of adjustment, you make rapid progress. But a few years later, you’re no longer improving as quickly, presumably because your body and mind have adapted to the new training program’s stimulus. Is it time to make another switch?
Training philosophies are harder to quantify than pollen levels, but the same pattern applies: sustained periods of exploitation punctuated by brief interludes of exploration.
The third type plays out in real time during competition. When you’re in the arena, either literally or metaphorically, the answer to the explore-exploit dilemma is usually “both.” Rewitz uses the example of strategy during a tennis match: If you usually play from the baseline, should you come to the net?
This decision depends on a wide range of factors that are constantly changing in real time. You might gain a temporary advantage by surprising your opponent, but the strategy’s success will depend on how well you’re volleying and whether your opponent can return your volleys. And even if you’re initially successful, your opponent might soon adapt to your net game, tipping the balance in favor of moving back to the baseline. There’s no lasting answer here: it’s a constantly changing mix of exploring and exploiting.
The scientific model for this kind of decision-making is a gambling game called the multiarmed bandit: you have a pocketful of coins, and have to choose between an array of slot machines (“one-armed bandits,” as they’re sometimes called). You don’t know the odds of winning for each machine, so you have to try lots of different machines (i.e. explore) to figure out which one is the best to play (i.e. exploit). But you also have to keep exploring, because the odds for each machine keep changing.
There’s no perfect solution to the most realistic versions of the multiarmed bandit. But there are various shortcuts that help maximize your chances of a good outcome. My favorite is one called the upper confidence bound algorithm, which scientists sometimes explain as “be optimistic in the face of uncertainty.” In essence, it involves choosing the option with the greatest realistic upside.
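For the curious, here is a minimal Python sketch of one common version of this idea, usually called UCB1. Each machine's score is its observed win rate plus an "optimism bonus" that shrinks the more you play it. Note the simplifying assumption: the win probabilities here are fixed, unlike the shifting odds described above. The function name, the probabilities, and the number of rounds are all hypothetical.

```python
import math
import random


def ucb1(win_probs, n_rounds=5000, seed=0):
    """Play a simulated multi-armed bandit with the UCB1 rule.

    win_probs are the true (hidden) win probabilities of each machine,
    assumed fixed for this sketch. Returns (pulls, wins) per machine.
    """
    rng = random.Random(seed)
    k = len(win_probs)
    pulls = [0] * k
    wins = [0.0] * k

    for t in range(1, n_rounds + 1):
        if t <= k:
            arm = t - 1  # try every machine once before comparing them
        else:
            # Observed win rate plus a bonus that is larger for machines
            # we have tried less often: optimism in the face of uncertainty.
            arm = max(
                range(k),
                key=lambda i: wins[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        pulls[arm] += 1
        wins[arm] += reward

    return pulls, wins


if __name__ == "__main__":
    pulls, wins = ucb1([0.2, 0.5, 0.8])
    for i, (p, w) in enumerate(zip(pulls, wins)):
        print(f"machine {i}: played {p} times, won {w:.0f}")
```

The bonus term is what keeps the algorithm from giving up too early on a machine that got unlucky in its first few plays, while still funneling most of the coins toward whichever machine looks best.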
For example, if you’re confident you can finish in the top five by sitting in the pack, but believe that you have a 10 percent chance of winning the race by making a big surge, the upper confidence bound algorithm says go for it. You might win, and even if you don’t, you’ll have less regret. (Regret, too, is a mathematical quantity here: the difference between how things could have turned out and how they did turn out.)
This last example—racing conservatively versus going for it—is the one that I think most about when I contemplate the explore-exploit dilemma. My biggest unfulfilled dream as a track runner was to run a sub-four-minute mile; my best 1,500-meter time of 3:42.43 is equivalent (according to World Athletics) to a mile in 4:00.03. I can’t count the number of 1,500 and mile races I ran over the course of my career—but strangely, I always stuck with the same start-slow, finish-fast strategy. I never once reached the halfway point fast enough to be on pace for a four-minute mile. Looking back, this omission baffles me. How could I have spent all those years pursuing a goal, and never once explored a fairly obvious approach to achieving it?
I don’t know if reading Rewitz’s paper (with the help of a time machine) might have changed my track career. But I spent a lot of time reading and thinking about the explore-exploit dilemma while writing The Explorer’s Gene, and I came away thinking that it was a useful and underappreciated framework for thinking through training and racing decisions. It won’t necessarily spit out a simple or straightforward answer, but it will get you thinking about the right questions.