Building a second-brain: How to unlock the full power of meta-analysis for your experimentation program

Frazer Mawson

We’ve been storing and recording extensive data on every experiment we’ve run for years.

As the world’s largest experimentation agency, this means we now have a huge repository of past a/b test results, spanning countless industries, verticals, and company sizes.

We’ve always known that this repository had the potential to offer an incredible competitive edge to our clients, but up until recently, we’ve had trouble unlocking its full potential.

Recently, though, that’s all changed.

In fact, thanks in large part to our experiment repository and the meta-analytic techniques it’s opened up for us, we’ve been able to achieve some of our most impressive agency-wide results ever.

So what have we done and how did we do it?

In this blog, we’re going to share all.

Whether you run a mature experimentation program or you’re just getting started, this blog will give you everything you need to build an experiment repository and begin unlocking the power of meta-analysis in your own work.


What is meta-analysis?

A/B tests – or randomized controlled trials (RCTs) – offer one of the strongest forms of evidence available for understanding causal relationships.

But there’s one form of evidence that’s even stronger than RCTs:

Meta-analyses of RCTs.

The hierarchy of evidence

A meta-analysis is essentially an analysis of multiple RCTs with the goal of combining their data to increase sample sizes and unearth macro-level trends that arise across trials.

Think of it like this: even with a strong a/b test, there’s still some chance of error creeping into your results. Maybe a confounding variable has distorted your data. Maybe your result was an unlikely statistical fluke. Maybe it was something else.

By merging multiple a/b test results, you stand a much better chance of separating out the signal from the noise.
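To make that concrete, here's a minimal sketch of one standard way to pool results – a fixed-effect meta-analysis that weights each test's observed lift by the inverse of its variance, so larger, more precise tests count for more. The numbers are made up purely for illustration; this isn't our production tooling.

```python
import math

# Each entry: (observed relative lift, standard error of that lift estimate)
# – purely illustrative numbers
tests = [
    (0.040, 0.025),   # Test A: +4.0% lift, but noisy
    (0.012, 0.010),   # Test B: +1.2% lift from a large sample
    (-0.005, 0.015),  # Test C: slightly negative, inconclusive on its own
]

# Fixed-effect pooling: weight each test by the inverse of its variance
weights = [1 / se ** 2 for _, se in tests]
pooled_lift = sum(w * lift for (lift, _), w in zip(tests, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
z = pooled_lift / pooled_se

print(f"Pooled lift: {pooled_lift:+.3%} ± {1.96 * pooled_se:.3%} (z = {z:.2f})")
```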

Combining the results of multiple a/b tests

Meta-analysis is a very well-utilized tool within academic contexts, but in the world of business experimentation, very few people are doing it – let alone doing it well.

In our view, based on the results we’ve been seeing, this is a huge missed opportunity.

Think what you could do if you could combine every experiment you’ve ever run and extract macro-level insights to inform everything from prioritization to execution to overall program strategy.

Why bother building an experiment repository? 5 good reasons

Building an experiment repository is the first and most important step on the road to running successful meta-analyses.

Put simply, if your data is effectively stored and organized in a centralized repository, then running high-level meta-analyses will be relatively straightforward.

Unfortunately, building an effective experiment repository can be quite labor-intensive, so here are five good reasons, drawn from our own experience, why we think you should invest the time.

1. Improved performance

This one’s the most generic benefit we’re going to talk about, but we thought we’d best mention it up top because, well – numbers speak. All of the other benefits below ladder up to this one.

During the last few years, we’ve experienced some incredible growth as an agency – but as any growing team will know, the bigger you become, the harder it can be to maintain standards.

We work tirelessly to mitigate this risk, and thanks to our emphasis on good processes and a systematized methodology, we’re generally quite successful at doing so.

But in 2022, when we first began to really operationalize our experiment repository, not only were we able to keep key agency-wide metrics steady – we were able to raise them.

In fact, we hit our highest agency-wide win-rate ever in 2022* – 26% higher than the average from the previous four years – and we’ve remained on the same trajectory ever since.

None of this would have been possible without our experiment repository and the innovations that surround it.

Our agency-wide win-rate from 2018-2022

*In addition to win-rate, which can be a slightly problematic metric at times, we also track things like testing velocity, volume of experiments, etc., and use these as important program- and agency-wide metrics to monitor and optimize our performance over time. We wouldn’t expect our experiment repository to impact these metrics in any real way, which is why the emphasis here is on win-rate alone.

2. Unearth macro-level insights

An effective experiment repository will allow you to unearth macro-level insights that were completely inaccessible to you before.

The value of this capability can’t be overstated.

It allows you to start interrogating your data in completely new ways, combining all of your experiment results to ask questions that cut across every test you’ve ever run – which types of change win most often, in which industries, and on which areas of a site.

This may sound quite cool in theory, but in practice, it’s a way to unearth novel insights, open up completely new avenues of testing – and ultimately take your experimentation program to the next level.
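As a rough illustration of what this looks like in practice, here's the kind of query a tagged repository makes trivial. Note that the file and column names below (lever, industry, outcome) are hypothetical stand-ins for whatever taxonomies you settle on.

```python
import pandas as pd

# One row per experiment, exported from your repository (hypothetical file)
experiments = pd.read_csv("experiment_repository.csv")

# Macro-level question: which levers win most often, and does it vary by industry?
win_rates = (
    experiments
    .assign(won=experiments["outcome"].eq("winner"))
    .groupby(["lever", "industry"])["won"]
    .agg(win_rate="mean", n="count")
    .query("n >= 20")          # only trust cells with a reasonable sample size
    .sort_values("win_rate", ascending=False)
)
print(win_rates.head(10))
```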

We’ve seen some of the best program-wide results in our agency’s history by using these kinds of meta-analytic techniques.

3. Improved organizational memory & collaboration

Since really building out our experiment repository, we now have an easily accessible, rigorously tagged, endlessly filterable storehouse of data.

This functionality means it’s easier than ever to harness insights from past experiment programs to drive results for our clients today.

To give one example (of many) of how we’re benefiting from this aspect of our repository:

For us, database research has become one of our core research methods as an agency. In essence, at the start of most programs, we will filter our database along certain dimensions to uncover insights and inspiration from past programs that we can use to inform present roadmaps and strategy.

Of course, we don’t blindly follow what the database tells us. What worked on one website won’t necessarily work on another – which is why experimentation is so important. But this kind of database research gives us additional insight and data that we can use to gain a headstart at the beginning of a new program – and that we can use to enrich our experiment data further down the line.

Ultimately, this means we can take more informed risks and drive more client value than would be possible without the repository and the insights it holds.

4. Enhanced prioritization – Confidence AI

Based on thousands of a/b tests, Harvard Business School found that only about 1 in 10 business experiments has a positive impact on its primary metric – in other words, only about 1 in 10 is a winner.

Assuming that more or less every team behind these experiments expected theirs to win – surely a fairly safe assumption – this number tells us that human beings are pretty poor at predicting experiment results…

…but AI isn’t.

Over the last twelve months, we’ve been training a machine learning model to take the data in our experiment repository – made up of more than 25,000 data points – and use it to predict the results of future a/b tests.
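We’re not going to share the internals of the model here, but to give a feel for the general approach, here’s a deliberately simplified sketch of training a win-predictor on taxonomy tags using scikit-learn. The column names are hypothetical, and this is an illustration of the idea rather than Confidence AI itself.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

experiments = pd.read_csv("experiment_repository.csv")   # hypothetical export

features = ["lever", "industry", "website_area", "component"]  # hypothetical tag columns
X = experiments[features]
y = experiments["outcome"].eq("winner")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = Pipeline([
    ("encode", ColumnTransformer([("tags", OneHotEncoder(handle_unknown="ignore"), features)])),
    ("classify", GradientBoostingClassifier()),
])
model.fit(X_train, y_train)

# Predicted probability that each held-out experiment concept is a winner
win_probability = model.predict_proba(X_test)[:, 1]
print(win_probability[:5])
```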

Confidence AI

Though this is only an early iteration, so far this model – dubbed Confidence AI – has been able to predict winning a/b tests with 63% accuracy.*

This is obviously miles above the 10% figure cited above – and it’s causing quite a stir within Conversion.

Though there are myriad different possible uses for this kind of technology, the main value we’re seeing from it right now relates to prioritization:

Once our consultants have done their research and come up with different test concepts for a program, they then need to decide which concepts to prioritize. Confidence AI analyzes each of these concepts based on a range of factors and then computes a confidence score that incorporates all of this information.

By pairing this confidence score with information about the expected build size and possible dependencies associated with each experiment, we can then prioritize our backlog as effectively as possible, based on mountains of data and a cutting-edge machine learning model.

This allows us to zero in on winners much more reliably – and to also deprioritize tests that are, based on the data, much less likely to win.
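To show how this might hang together, here’s a toy example of ranking a backlog by confidence score relative to build effort. The scoring rule and the backlog entries are illustrative assumptions, not our actual prioritization formula; the 66/100 threshold is the one mentioned in the footnote below.

```python
from dataclasses import dataclass

@dataclass
class Concept:
    name: str
    confidence: int    # 0-100 score from the prediction model
    build_days: float  # rough estimate of build size

# Illustrative backlog entries (the first two echo examples from this post)
backlog = [
    Concept("Multi-step free-trial funnel", confidence=72, build_days=8),
    Concept("Simplified progress bar", confidence=68, build_days=3),
    Concept("New pricing table", confidence=41, build_days=5),
]

# Rank by confidence per day of build effort: likely winners that are cheap
# to build float to the top. (Dependencies between tests would also feed in.)
for concept in sorted(backlog, key=lambda c: c.confidence / c.build_days, reverse=True):
    predicted_winner = concept.confidence > 66  # threshold cited in the footnote below
    print(f"{concept.name:32s} score={concept.confidence:3d} "
          f"build={concept.build_days:>3}d predicted_winner={predicted_winner}")
```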

Before you can even think about running this kind of machine-learning-assisted prioritization, you need to have a robust experiment database – with powerful taxonomies – in place.

*When Confidence AI computes a confidence score greater than 66/100, we count this as a prediction of a winner.

5. Sharpen executions

Once we’ve come up with a hypothesis, we next need to decide on our execution, i.e. the specific experiment that will allow us to test it.

Our experiment repository is proving to be extremely useful when it comes to fine-tuning our executions. For example, take this experiment:

One of our clients had a single-page free-trial funnel.

For a range of data-backed reasons, we hypothesized that we could increase their free trial sign-up rate by splitting this journey out into multiple steps.

As part of this test, we knew we were going to need to design a new progress bar.

By filtering our repository by industry, website area, and component, we were able to find past experiments on similar websites that had involved a progress bar redesign.

In this instance, we found that more-detailed progress bars tended to be less effective than less-detailed progress bars.

We therefore chose to design a less-detailed progress bar, which ultimately contributed to a 7.5% uplift in free trials in this experiment.
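For anyone wondering what this kind of lookup actually involves, here’s a simplified version of the filter described above – again with hypothetical column names and tag values.

```python
import pandas as pd

experiments = pd.read_csv("experiment_repository.csv")  # hypothetical export

# Pull past experiments on similar sites that touched a progress bar
progress_bar_tests = experiments[
    (experiments["industry"] == "SaaS")
    & (experiments["website_area"] == "sign-up funnel")
    & (experiments["component"] == "progress bar")
]

# Compare average observed lift for detailed vs. simplified progress bar designs
print(progress_bar_tests.groupby("design_detail")["observed_lift"].agg(["mean", "count"]))
```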

Simplified progress bar design

How to build a high-impact experiment repository of your own: 4 steps

Now that we’ve (hopefully!) convinced you that you need to start building an experiment repository of your own, we’re going to share the main steps we’ve gone through to get our repository to where it is today.

Where possible, we’ll share the resources and specific solutions that we ourselves have used to overcome the challenges associated with building an effective experiment repository.

To start, then, an effective repository needs to meet three criteria:

Each of the steps described below is geared towards meeting these three criteria.

1. Choose a tool

Really, the first thing you need to decide on when it comes to building an experiment repository is where you want to house it.

There are a number of tools specifically built for this purpose, e.g. our very own Liftmap tool.

Screenshot taken from Liftmap

Other tools, such as Airtable or Notion, weren’t built specifically with meta-analysis in mind, but their power and customizability mean they offer a good option to anyone with enough time and skill to use them effectively.

Some things to consider when selecting a tool:

When it comes to tooling for meta-analysis, each experimentation team is likely to have different requirements, so a one-size-fits-all approach probably won’t work.

2. Settle on your taxonomies

Once you’ve got all of your experiments in one place, you need a systematic way of categorizing them along certain dimensions.

That’s where taxonomies come in.

Taxonomies are systems of categorization. To a large extent, the effectiveness of a repository is dependent upon the effectiveness of the taxonomies it uses.

Without good, shared, systematic taxonomies, you can’t:

  1. Categorize experiments consistently – instead, everyone will use their own intuitions about how best to categorize their experiments, resulting in all sorts of bias and imprecision being baked into your data.
  2. Filter experiments – if everyone is categorizing experiments by their own rules, how can you filter your database to find the experiments you’re looking for?
  3. Extract macro insights – taxonomies are the lens through which macro-level patterns in your data begin to emerge. Without powerful taxonomies, it’s all just a morass of undifferentiated data points.
  4. Run machine learning – taxonomies give your machine learning model salient dimensions that it can use to observe and unearth patterns.
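To make this a little more tangible, here’s a hypothetical record structure for a single tagged experiment. The fields are assumptions based on the dimensions mentioned throughout this post (lever, industry, website area, component, outcome) – your own taxonomy will almost certainly look different.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ExperimentRecord:
    """Hypothetical shape of one tagged experiment in the repository."""
    experiment_id: str
    client: str
    industry: str                   # e.g. "SaaS", "retail"
    website_area: str               # e.g. "sign-up funnel", "checkout"
    component: str                  # e.g. "progress bar", "pricing table"
    lever: str                      # top-level lever from your levers framework
    hypothesis: str
    outcome: str                    # "winner" | "loser" | "inconclusive"
    observed_lift: Optional[float]  # relative lift on the primary metric
    launch_date: date
```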

Really, it wasn’t until we developed watertight taxonomies – with the Levers Framework at the forefront – that we were able to extract the full value from our experiment repository.

High-level view of our Levers Framework

Some of the most important taxonomies we use cover dimensions like industry, website area, and component – and, above all, the levers from our Levers Framework.

3. Tagging

Once you’ve decided on your taxonomies, you then need to tag up all of the past experiments that you’re planning to add to your database.

We spent months working back through our experiment database, tagging more or less every experiment we’ve ever run.

This was quite labor-intensive, but it’s a one-time job – and once it’s done, it brings all of your past experiment results that have been collecting dust back into play.

With this complete, you can now filter your database in all kinds of ways, and begin drawing upon past experiments to inform future directions.
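One small, practical tip as you tag: it’s worth running a consistency check so that every tag value actually comes from your agreed taxonomy, otherwise filters and meta-analyses will silently miss experiments. Here’s a rough sketch – the taxonomy values are placeholders, not our full framework.

```python
import pandas as pd

# Placeholder controlled vocabularies – substitute your own taxonomies
TAXONOMY = {
    "lever": {"trust", "usability", "motivation", "cost", "comprehension"},
    "outcome": {"winner", "loser", "inconclusive"},
}

experiments = pd.read_csv("experiment_repository.csv")  # hypothetical export

for column, allowed in TAXONOMY.items():
    invalid = experiments[~experiments[column].isin(allowed)]
    if not invalid.empty:
        print(f"{len(invalid)} experiments have an unrecognised '{column}' tag")
```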

4. Buy-in and workflows

At this point, your repository should be more or less good-to-go – but there’s still one more thing you’ll need to get set up before you can really get the ball rolling: workflows.

Your repository shouldn’t be a fixed, static thing; it should be a living, growing database that is continually updated as new experiment data comes in.

The more data you add to the repository, the more valuable it becomes.

Unfortunately, data capture is one of the biggest challenges to overcome when building your database – especially if you’re democratizing experimentation across your entire company.

The first step here is about gaining buy-in. Ultimately, people on the ground will be the ones capturing the data, so they need to know why they’re being asked to do this extra work. Specifically, they need to know:

Making sure that the relevant teams have access to this information will make everything run much more smoothly.

Once you’ve got buy-in and everyone is on the same page, the next step is about building workflows that support your meta-analysis.

This will vary a lot from company to company, but to give you a bit of inspiration, here are a few things we’ve done to encourage our team to capture their data:

As you can imagine, these workflows have taken time and effort to build, but we’re now at a point where our experiment database is growing more or less organically, without the need for constant intervention or micro-managing.

This is the final goal – the point at which your repository has become a true second brain that effortlessly records data and makes it accessible to your entire organization.

Final thoughts: is it worth it?

Some people in the experimentation space have been known to question the value of meta-analysis.

Can you really take insights from past programs and apply them to current problems? Can insights from one business, or team, be applied to another? Does it actually work?

These concerns are completely legitimate – but here’s the thing:

This isn’t a subject we need to sit around debating.

It’s an empirical question, and like any good empirical question, it can be answered by an experiment.

We’ve run the experiment.

The results are in.

By incorporating meta-analytic techniques into our approach to experimentation, we’ve massively increased our primary metric – win rate – while holding our guardrail metrics – velocity, volume, and client satisfaction – steady.

The test is a winner.

Of course, this isn’t a controlled experiment; there are all kinds of confounding variables involved.

(Heck, we might even need to run a meta-analysis to see whether meta-analysis is effective!)

But so far, at least, the data’s all pointing in the right direction.
