Exploration vs. Exploitation: how to balance short-term results with long-term impact

Edmund Beggs and Frazer Mawson

Today in 2024, pretty much every well-functioning experimentation team understands the importance of iteration:

  1. You run a test;
  2. You analyze the results to work out why the test did or didn’t work;
  3. You build another test to exploit this learning;
  4. You analyze this second test to work out why it did or didn’t work;
  5. You build another test to exploit this learning;
  6. etc.

This approach can yield some very strong results in the short-to-medium term, but in the longer-term, you’re likely to find that the wins from your chosen line of testing begin to dry up.

From here, it’s only a matter of time until you encounter the most feared phenomenon in all of optimization:

The plateau.

We’ve spent years helping in-house teams push through this kind of performance plateau, and in our experience, it’s almost always caused by the same thing:

A less-than-optimal approach to the explore-exploit tradeoff.

In this article, we’re going to explain what the explore-exploit tradeoff is, how we’re using our Levers™ Framework to optimally solve it for our clients – and how you too can do the same for your program.

Contents:

  1. What is the explore-exploit tradeoff?
  2. What is the Levers™ Framework?
  3. Using the Levers Framework to balance exploration and exploitation

 

1. What is the explore-exploit tradeoff?

In essence, the explore-exploit tradeoff is the tradeoff between gathering new information (exploration) and using that information to improve performance (exploitation).

When you’re exploring new information, you’re not exploiting the information you already have to drive impact now.

When you’re exploiting preexisting information, you’re not gathering new information that might drive an even bigger impact in the future.

As it turns out, the explore-exploit tradeoff shows up in a ridiculously broad range of contexts: reinforcement learning (the classic multi-armed bandit problem), clinical trial design, animal foraging – even choosing between your favorite restaurant and the new place down the street.
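To make the tradeoff concrete, here’s a minimal Python sketch of the multi-armed bandit setting just mentioned, using an epsilon-greedy policy. The variant names and conversion rates are entirely made up for illustration:

```python
import random

# Hypothetical true conversion rates for three page variants.
# In reality these are unknown - discovering them is the whole problem.
TRUE_RATES = {"A": 0.030, "B": 0.035, "C": 0.025}

counts = {v: 0 for v in TRUE_RATES}      # times each variant was served
successes = {v: 0 for v in TRUE_RATES}   # conversions per variant

def choose_variant(epsilon: float = 0.1) -> str:
    """Epsilon-greedy policy: explore a random variant with probability
    epsilon, otherwise exploit the best-performing variant so far."""
    if random.random() < epsilon:
        return random.choice(list(TRUE_RATES))  # explore
    observed = {v: successes[v] / counts[v] if counts[v] else 0.0
                for v in TRUE_RATES}
    return max(observed, key=observed.get)      # exploit

for _ in range(100_000):
    v = choose_variant()
    counts[v] += 1
    if random.random() < TRUE_RATES[v]:
        successes[v] += 1

for v in TRUE_RATES:
    rate = successes[v] / counts[v] if counts[v] else 0.0
    print(f"{v}: served {counts[v]:>6}x, observed rate {rate:.4f}")
```

Raise epsilon and you gather information faster at the cost of short-term conversions; lower it and you convert more now but risk settling on the wrong variant. That, in miniature, is the explore-exploit tradeoff.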

We’re not saying that we’ve found a global solution to the explore-exploit dilemma – one that will apply to all of the various domains mentioned above.

What we are saying, though, is that we believe we’ve found a strong, close-to-optimal solution to this problem within the context of CRO/experimentation – one that will ultimately allow you to move off the local maximum (plateau) you’re currently stuck on and towards your global maximum.

 

Global maximum diagram

 

Central to this approach is our Levers™ Framework.

2. What is the Levers™ Framework?

We’re not going to delve too deeply into our Levers™ Framework here, since we’ve got other pieces of content that fulfill this purpose already – see our recent white paper, webinar, and blog.

That said, our entire solution to the explore-exploit tradeoff is built around the Levers Framework, so it’s worth offering up a quick high-level overview before we go any further. If you’re already acquainted with this stuff, feel free to skip ahead.

So, to begin: in order to explain what the Levers™ Framework is, we first need to define what we mean by the word ‘lever’.

For us, a lever is any feature of the user experience that influences user behavior.

For instance, sales countdown timers exploit a sense of urgency. Within the Levers Framework, an experiment that deploys a sales countdown timer would therefore be categorized under the urgency lever, since this is the means by which it influences user behavior.

In essence, the Levers Framework is a comprehensive taxonomy of the user experience features that influence user behavior (see below). The framework is a treelike structure that aims to categorize these features of user experience at three levels of generality: Master Levers (most general); Levers (middle layer); and Sub-levers (most specific).

Levers Framework overview

High-level overview of our Levers Framework.

In principle, this means that every experiment we run – every lever we pull – can be categorized at three different levels of generality.

For example, let’s say we’ve added a Trustpilot logo to a landing page of one of our clients. By adding this logo we are:

  • pulling the Trust Master Lever (the most general categorization);
  • pulling a more specific Lever within Trust – in this case, something like Credibility;
  • pulling a specific Sub-lever – in this case, something like social proof.

The Levers Framework is the product of more than 16 years’ worth of iterations, and it has been validated both by its efficacy in our day-to-day client work and by its predictive power.

The framework has a huge range of applications that we won’t touch on here (check out the white paper for more), but one thing worth flagging is that it serves as a fine-grained, comprehensive map of the various user experience solutions that influence conversion…

…and once you have a trustworthy map, exploring the territory becomes a whole lot easier.

3. Using the Levers Framework to balance exploration and exploitation

On the surface, our approach to the explore-exploit tradeoff is quite simple:

When we first start working on a website, we make a conscious effort to run exploratory experiments on all 5 of our Master Levers, i.e. Cost, Trust, Motivation, Usability, and Comprehension.

While this approach means that our initial win-rate may be a little lower than it would have been had we focussed solely on low-hanging fruit, it allows us to gather valuable information about the kinds of interventions that are (and aren’t) likely to be most effective on any given website.

Graph showing the win-rate of programs focussed on short-term wins vs. programs with a structured approach

Programs that pursue quick wins tend to show lower win-rates in the long term than programs focussed on balancing exploration and exploitation in a structured way.

By intentionally collecting information about a broad selection of levers, we are then able to explore the full range of possible solutions in a structured way.

Once we’re confident in our results, we can finally shift into exploit mode and start ruthlessly folding poorly performing levers while doubling down on successful ones to drive maximum impact for our clients.

Putting this in terms of the local/global maximum analogy above:

At the start of a program, we will essentially fly over the entire optimization landscape in search of the region within which the global maximum exists.

Once we think we’ve found it, we’ll then drop down into this general region and begin performing a ‘hill-climbing’ operation, which basically involves iteratively improving the website to gradually ascend to the global maximum.

Of course, the reality of the situation is a good deal more complicated than this theoretical sketch may suggest, but we’ve found that the principle behind this approach is sound – and that it often provides an optimal path through the explore-exploit quandary, allowing us to maximize long-term value for our clients.
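For intuition, here’s a toy Python illustration of the difference between pure hill-climbing and an exploratory ‘fly-over’ followed by hill-climbing. The landscape function is invented purely for the example, with a small local peak and a taller global one:

```python
import math

def landscape(x: float) -> float:
    """Toy 'conversion landscape': a local peak near x=2 and a
    taller global peak near x=8."""
    return math.exp(-(x - 2) ** 2) + 2 * math.exp(-((x - 8) ** 2) / 4)

def hill_climb(x: float, step: float = 0.1, iters: int = 200):
    """Greedy local search: keep moving to a neighbor while it scores
    higher; stop when neither neighbor improves on the current point."""
    for _ in range(iters):
        best = max([x - step, x, x + step], key=landscape)
        if best == x:
            break
        x = best
    return x, landscape(x)

# Pure exploitation from a poor starting point climbs the nearest hill
# and gets stuck on the local maximum (the plateau).
print(hill_climb(1.0))  # ends near x=2, height ~1.0

# An exploratory sweep across the whole landscape first, then climbing
# from the most promising region, finds the global maximum instead.
sweep = [hill_climb(x0) for x0 in range(0, 11, 2)]
print(max(sweep, key=lambda r: r[1]))  # ends near x=8, height ~2.0
```

The ‘fly-over’ phase is exactly what testing across all 5 Master Levers achieves: it tells you which hill is worth climbing before you commit to climbing it.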

To support you in actually applying this approach to your own work, we’re going to run through the key steps in our process, with the goal of adding some additional detail and actionability to the picture painted thus far.

1. Research

As mentioned above, our approach involves distributing our experiments across all 5 Master Levers.

However, before we do this, we first need to identify the most impactful levers within those 5 Master Levers, as well as the types of experiments that are most likely to succeed for each.

We therefore typically begin by running a UX research project, which will include methodologies like analytics reviews, user testing, surveys, scroll/heatmaps, competitor analysis, and more.

This allows us to collect a huge range of observations about the barriers and motivations that are active on any given website.

We will then combine sets of these individual observations into what we term ‘insights’, which are the unifying themes under which observations can be grouped.

So, to give a more concrete example: if user testing shows participants questioning whether the service actually works, and survey responses include comments like ‘how do I know this will deliver what it promises?’, we might group these observations under a single insight: ‘users lack trust in the efficacy of the service.’

Once we’ve combined all of our observations into insights, we start assigning each insight to a Master Lever, Lever, and Sub-lever, working our way down the framework to establish an increasingly specific understanding of the problem we’re trying to address.

Observations to insights to levers

Different research methods generate observations, which we cluster together under different themes known as insights. We then aim to map each of these insights to the lever that relates most closely to it.

So, returning to the example from above: if our insight is ‘users lack trust in the efficacy of the service,’ this is clearly a trust issue, so we will assign this insight to the Trust Master Lever.

Visual representation of the Trust Master Lever

The Trust Master Lever, along with its constituent Levers and Sub-levers.

Moving one layer further down the framework, we must then ask: ‘is this a legitimacy question, a credibility question, or a security question?’

In our framework, Credibility is about whether a company is able to live up to the claims that it makes on its website, so clearly this is a Credibility question rather than one pertaining to Security or Legitimacy.

Identifying the right Sub-lever may be slightly more difficult, but for now, we can tentatively class this as an Authority issue, since an increase in authority would likely assuage the trust-related concerns associated with this insight.

Using this approach, we will attempt to tag every insight we’ve collected from our research to a specific Master Lever, Lever, and Sub-lever.
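If you want to track this tagging process in a spreadsheet or in code, the underlying data structure is simple. Here’s a hedged Python sketch; the field names and example values are illustrative, not an export of our framework:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Insight:
    """A theme distilled from multiple research observations, tagged
    at all three levels of generality in the framework."""
    summary: str
    observations: list[str] = field(default_factory=list)
    master_lever: str = ""  # e.g. "Trust"
    lever: str = ""         # e.g. "Credibility"
    sub_lever: str = ""     # e.g. "Authority"

insights = [
    Insight(
        summary="Users lack trust in the efficacy of the service",
        observations=[
            "User testing: participants questioned whether the service works",
            "Survey: 'How do I know this will deliver what it promises?'",
        ],
        master_lever="Trust",
        lever="Credibility",
        sub_lever="Authority",
    ),
    # ...one Insight per theme your research surfaces...
]

# Counting tagged insights per Master Lever produces the kind of
# per-lever distribution shown in the graph below.
print(Counter(i.master_lever for i in insights))
```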

This will then typically leave us with a huge range of insights, distributed across all 5 of our Master Levers (see graph below).

Insights vs. Master Levers

Insights from one of our programs, distributed across our 5 Master Levers.

2. Ideation

Now that you have each of your insights assigned to a lever, you’ll need to develop the execution for each of these specific levers.

To clarify, let’s once again return to the example from the previous section, where we found that the Master Lever was Trust, the Lever was Credibility, and the Sub Lever was Authority.

So far, we have quite a specific idea about what we might want to test, i.e. anything that is going to enhance the authority of our brand. This still leaves us with considerable scope, however, as to the actual experiment we run.

For example, we might cite independent research that validates the service, display industry accreditations, or add endorsements from recognized experts – each a different execution of the same Authority Sub-lever.

We have an entire process for developing high-impact experiment concepts from our initial research – we’ll be sharing more info about this in the future – but for now, the key is simply to generate a range of candidate executions for each of the levers your research has flagged.

3. Roadmap strategy

Once you’ve developed executions for each of your initial concepts, you’ll then need to prioritize these executions.

We’ve developed a machine-learning-assisted prioritization tool for this purpose, but feel free to apply whatever prioritization framework you currently use.

The goal is to end up with a relatively long list of experiment executions, prioritized based on things like:

  1. the strength of the supporting research;
  2. the expected impact on your primary metric;
  3. the ease (and cost) of implementation.
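If you don’t already have a prioritization framework, here’s a minimal ICE-style (Impact, Confidence, Ease) sketch in Python. It’s a deliberately simple stand-in for our actual tool, and the executions and scores below are invented:

```python
from dataclasses import dataclass

@dataclass
class Execution:
    name: str
    impact: int      # expected impact on the primary metric, 1-10
    confidence: int  # strength of the supporting research, 1-10
    ease: int        # ease/cheapness of implementation, 1-10

    @property
    def score(self) -> float:
        # Simple average; substitute whatever weighting you prefer.
        return (self.impact + self.confidence + self.ease) / 3

backlog = [
    Execution("Cite independent research on landing page", 8, 6, 7),
    Execution("Add expert endorsements to product pages", 7, 8, 6),
    Execution("Show industry accreditations in footer", 4, 5, 9),
]

for e in sorted(backlog, key=lambda e: e.score, reverse=True):
    print(f"{e.score:.1f}  {e.name}")
```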

Once you’ve got this prioritized list together, you then need to build your roadmap.

This is the stage where you get to exert more intentional control over your balance of exploration vs. exploitation.

We would always recommend testing across all 5 Master Levers, but perhaps you want to weight slightly more towards exploitation than exploration. In that case, for the first 20 experiments in your roadmap, you might run 8 on the Master Lever with the most insights attached to it and only 1 or 2 on the Master Lever with the fewest.

Conversely, if you want to ensure that your initial exploration is as thorough as possible, you will need to make sure that your tests are fairly evenly diversified across all 5 Master Levers so as to gather as much information about their respective efficacies as possible.

That said, we do not recommend running tests without any supporting research. If one of your Master Levers has few or no insights attached to it, we would recommend shifting your attention to the other Master Levers.

CRO is often a balance between earning and learning; by intentionally weighting your balance of exploration and exploitation at this step, you can ensure that your roadmap aligns as closely as possible with your goals.
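One way to encode this balancing act is to give every researched Master Lever a small exploration floor and distribute the rest of your roadmap in proportion to the research behind each lever. A hedged Python sketch, with illustrative insight counts:

```python
# Illustrative insight counts per Master Lever - not real client data.
insights = {"Trust": 14, "Motivation": 9, "Comprehension": 6,
            "Usability": 4, "Cost": 0}

TOTAL_EXPERIMENTS = 20
EXPLORATION_FLOOR = 2  # minimum tests per lever that has supporting research

# Skip levers with no supporting insights (don't test without research).
researched = {k: v for k, v in insights.items() if v > 0}

allocation = {k: EXPLORATION_FLOOR for k in researched}
remaining = TOTAL_EXPERIMENTS - sum(allocation.values())

# Distribute the remaining slots in proportion to each lever's share
# of the total insights.
total = sum(researched.values())
for lever, count in researched.items():
    allocation[lever] += round(remaining * count / total)

# Rounding can leave a slot or two unassigned - place those by judgment.
print(allocation, "->", sum(allocation.values()), "of", TOTAL_EXPERIMENTS)
```

Raising the floor pushes the roadmap towards exploration; lowering it (or dropping it entirely) pushes towards exploitation.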

4. Experiment

Once you’ve decided which tests you want to run, the next thing to do is actually run them!

5. Iterate

You may think that once you’ve run your initial tests, you’re ready to start folding losers and doubling down on winners to drive value now.

For winning tests, this is more or less how it works. When you find an effective lever, our recommendation is that you exploit that lever relentlessly, for as long as it continues to deliver value.

With one of our clients, we’ve run 46 iterations on a single lever – and it still delivers results to this day!

For losing experiments, on the other hand, there are some additional considerations worth factoring in. As a pretty reliable rule of thumb, a test will lose for one of two reasons (sketched in code after this list):

  1. Execution – the lever you selected may actually be effective, but the execution you chose was poor.
  2. Lever – the lever you tested is itself ineffective. This breaks down into two further cases:
    • The Master Lever and Lever you tested are ineffective.
    • The specific Sub-lever you chose is ineffective, but your framing of the problem at the Master Lever and Lever level is correct.
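To make the distinction concrete, here’s a minimal Python sketch encoding these failure modes and the iteration each one suggests; the enum simply mirrors the list above:

```python
from enum import Enum, auto

class LossReason(Enum):
    POOR_EXECUTION = auto()   # right lever, weak execution
    WRONG_LEVER = auto()      # the Master Lever / Lever itself is ineffective
    WRONG_SUB_LEVER = auto()  # right Lever, wrong Sub-lever

def next_step(reason: LossReason) -> str:
    """Suggested follow-up for each diagnosis."""
    if reason is LossReason.POOR_EXECUTION:
        return "Iterate: re-test the same lever with a stronger execution."
    if reason is LossReason.WRONG_SUB_LEVER:
        return "Iterate: try a different Sub-lever under the same Lever."
    return "Explore: redirect effort towards other Master Levers."

print(next_step(LossReason.POOR_EXECUTION))
```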

We’ve previously written in detail about our process for diagnosing the cause of a test’s loss, as well as how to iterate on this type of result. This post is already pretty long, so we won’t repeat that here, but if you’d like to read about the subject in detail, click here.

One important thing to keep in mind is that losing experiments are not ‘the end of the line’ for a lever. In fact, they often tell you a lot more than inconclusive experiments – and therefore provide direction for future testing.

Ultimately, while no single non-winning experiment is sufficient to rule out a lever’s importance, losing experiments have the advantage of telling you something more: that what you are doing at least matters to users.

This suggests that a better message or change relevant to that lever might well intervene in a way that makes a positive rather than negative difference. Equally, simply doing less of – or the opposite of – what had the negative effect might be most effective.

Final thoughts

In our experience, a sub-optimal balance between exploration and exploitation is the cause of 9 out of 10 performance plateaus. In this blog, we shared the approach we’ve been using to help our clients successfully navigate the explore-exploit tradeoff and once again begin driving revenue with CRO/experimentation.

As will be clear by now, much of this approach is driven by our Levers Framework, so if you’re keen to put the method laid out here into practice, we’d recommend that you download our recent white paper. This should hopefully give you everything you need to get started.

And in the meantime, if you’ve got any further questions about how any of this works, feel free to drop us a line – we’re passionate about experimentation and are always keen to share our expertise where we can!
