SCORE: A dynamic prioritization framework for AB tests from Conversion.com

Stephen Pavlovich

Why prioritize?

With experimentation and conversion optimization, there is never a shortage of ideas to test.

In other industries, specialist knowledge is often a prerequisite. It’s hard to have an opinion on electrical engineering or pharmaceutical research without prior knowledge.

But with experimentation everyone can have an opinion: marketing, product, engineering, customer service – even our customers themselves. They can all suggest ideas to improve the website’s performance.

The challenge is how you prioritize the right experiments.

There’s a finite number of experiments we can run – we’re limited both by the resources to create and analyze experiments, and by the traffic to run them on.

Prioritization is the method to maximize impact with an efficient use of resources.


Where most prioritization frameworks fall down

There are multiple prioritization frameworks – PIE (from WiderFunnel), PXL (from ConversionXL), and more recently the native functionality within Optimizely’s Program Management.

Each framework has a broadly consistent approach: prioritization is based on a combination of (a) the value of the experiment, and (b) the ease of execution.

For example, ConversionXL’s PXL framework asks a series of yes/no questions to objectively assess each experiment’s value and ease.

Experiments that are above the fold and based on quantitative and qualitative research will rightly score higher than a subtle experiment based on gut instinct alone.

This approach works well: it rewards the right behavior (and can even help drive the right behavior in the future, as users submit concepts that are more likely to score well).

But while it improves the objectivity in scoring, it lacks two fundamental elements:

  1. It accounts for page traffic, but not page value. So an above-the-fold research-backed experiment on a zero-value page could be prioritized above experiments that could have a much higher impact. (We used to work with a university in the US whose highest-traffic page was a blog post on ramen noodle recipes. It generated zero leads – but the PXL framework wouldn’t account for that automatically.)
  2. While it values qualitative and quantitative research, it doesn’t appear to include data from previous experiments in its prioritization. We know that qualitative research can sometimes be misleading (customers may say one thing and do something completely different). That’s why we validate our research with experimentation. But this model’s focus is purely on research – whereas a conclusive experiment is the best indicator of a future iteration’s success.

Moreover, most frameworks struggle to adapt as an experimentation program develops. They tend to work in isolation at the start – prioritizing a long backlog of concepts – but over time, real life gets in the way.

Competing business goals, fire-fighting and resource challenges mean that the prioritization becomes out-of-date – and you’re left with a backlog of experiments that is more static than a dynamic experimentation program demands.

Introducing SCORE – Conversion.com’s prioritization process

Our approach to prioritization is based on more than 10 years’ experience running experimentation programs for clients big and small.

We wanted to create an approach that accounts for value as well as traffic, that learns from previous experiments, and that stays dynamic as the program develops.

But the downside is that it’s not a simple checklist model. In our experience, there’s no easy answer to prioritization – it takes work. But it’s better to spend a little more time on prioritization than waste a lot more effort building the wrong experiments.


With that in mind, we’re presenting SCORE – Conversion.com’s prioritization process: Strategy, Concepts, Order, Roadmap, Experimentation.

As you’ll see, the prioritization of concepts against each other happens in the middle of the process (“Order”) and is contingent on the program’s strategy.

Strategy: Prioritizing your experimentation framework

At Conversion.com, our experimentation framework is fundamental to our approach. Before we start on concepts, we first define the goal, KPIs, audiences, areas and levers (the factors that we believe affect user behavior).

You can read more about our framework here and you can create your own with the templates here.

When your framework is complete (or, at least, started – it’s never really complete), we can prioritize at the macro level – before we even think about experiments.

Assuming we’ve defined and narrowed down the goal and KPIs, we then need to prioritize the audiences, areas and levers:

Audiences

Prioritize your audiences on volume (how many users there are), value (e.g. the profit per user) and potential (how much improvement you believe is possible).

You can, of course, change the criteria here to adapt the framework to better suit your requirements. But as a starting point, we suggest combining the profit per user and the potential improvement.

Don’t forget: we want to prioritize the biggest-value audiences first – so that typically means targeting as many users as possible, rather than segmenting or personalizing too soon.
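To make that concrete, here’s a minimal sketch in Python – the audiences, numbers and the simple volume × value × potential formula are illustrative assumptions, not a fixed part of the framework:

    # Illustrative only: score each audience as volume x value x potential.
    audiences = {
        # name: (monthly users, profit per user, estimated uplift potential)
        "new visitors":       (100_000, 2.0, 0.10),
        "returning visitors": ( 40_000, 5.0, 0.05),
        "mobile users":       ( 80_000, 1.5, 0.12),
    }

    scores = {
        name: users * profit * potential
        for name, (users, profit, potential) in audiences.items()
    }

    # Highest combined score first
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:,.0f}")

The point isn’t the exact formula – it’s that volume, value and potential are scored together, so a huge but worthless audience can’t float to the top.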

Areas

In much the same way as audiences, we can prioritize the areas – the key content that the user interacts with.

For example, identify the key pages on the website (homepage, listings page, product page, etc) and score them on the same criteria: volume, value and potential.

(It might sound like we’re falling into the trap of other prioritization models: asking you to estimate potential, which can be subjective. But, in our experience, people are more likely to score an area objectively than an experiment that they created and are passionate about.)

Also, this approach doesn’t need to be limited to your website. You can apply it to any other touchpoint in the user journey too – including offline. Your cart abandonment email, customer calls and Facebook ads can (and should) be used in this framework.

If your KPI is profit, you may want to include offline content like returns labels in the prioritization model.


Levers

As above, levers are defined as the key factors or themes that you think affect an audience’s motivation or ability to convert on a specific area.

These might be themes like pricing, trust, delivery, returns, form usability, and so on. (Take another look at the experimentation framework to see why it’s important to separate the lever from the execution.)

When you’re starting to experiment, it’s hard to prioritize your levers – you won’t know what will work and what won’t.

That’s why you can prioritize them on either your confidence that the lever will change behavior, or the win rate of previous experiments against that lever.

Of course, if you’re starting experimentation, you won’t have a win rate to rely on (so estimating the confidence is a fantastic start).

But if you’ve got a good history of experimentation – and you’ve run the experiments correctly, and focused them on a single lever – then you should use this data to inform your prioritization here.

Again, the more we experiment, the more accurate this gets – so don’t obsess over every detail. (After all, it’s possible that a valid lever may have a low win rate simply because of a couple of experiments with poor creative.)  
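As a hypothetical sketch of that idea (the experiment history and the smoothing constant are assumptions – smoothing simply stops a lever with only one or two poorly executed experiments from being written off):

    from collections import defaultdict

    # Hypothetical experiment history: (lever, did the variation win?)
    past_experiments = [
        ("trust", True), ("trust", True), ("trust", False),
        ("delivery", False), ("delivery", False),
        ("pricing", True),
    ]

    wins, total = defaultdict(int), defaultdict(int)
    for lever, won in past_experiments:
        total[lever] += 1
        wins[lever] += won  # True counts as 1

    for lever in total:
        # Smoothed win rate: an untested lever would default to 50%
        win_rate = (wins[lever] + 1) / (total[lever] + 2)
        print(f"{lever}: {win_rate:.0%} smoothed ({wins[lever]}/{total[lever]} raw)")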

Putting this all together, you can now start to prioritize the audiences, areas and levers that should be focused on – for example, by combining the individual scores, as in the sketch below.
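A minimal sketch of that combination step, reusing the scoring ideas above (all names and numbers are invented):

    from itertools import product

    # Scores produced by the audience, area and lever steps above (invented)
    audience_scores = {"new visitors": 20_000, "mobile users": 14_400}
    area_scores     = {"product page": 3.0, "checkout": 2.5}
    lever_scores    = {"trust": 0.67, "pricing": 0.60, "delivery": 0.25}

    def priority(combo):
        audience, area, lever = combo
        return audience_scores[audience] * area_scores[area] * lever_scores[lever]

    ranked = sorted(product(audience_scores, area_scores, lever_scores),
                    key=priority, reverse=True)

    for combo in ranked[:3]:  # the top combinations to focus on
        print(" / ".join(combo), f"-> {priority(combo):,.0f}")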

As you can see, we haven’t even started to think about concepts and execution – but we have a strong foundation for our prioritization.

Concepts: Getting the right ideas

After defining the strategy, you can now run structured ideation around the KPIs, audiences, areas and levers that you’ve defined – the ideal structure for generating ideas.

Rather than starting with, “What do we want to test?” or “How can we improve product pages?”, we’re instead focusing on the core hypotheses that we want to validate – for example: for first-time visitors (audience), will emphasizing delivery and returns information (lever) on the product page (area) increase order rate?

This structured ideation around a single hypothesis generates far better ideas – and means you’re less susceptible to the tendency to throw everything into a single experiment (and not knowing which part caused the positive/negative result afterwards).

Order: Prioritizing the concepts

When prioritizing the concepts – especially when a lever hasn’t been validated by prior experiments – you should look to start with the minimum viable experiment (MVE).

Just like a minimum viable product, we want to define the simplest experiment that allows us to validate the hypothesis. (Can we test a hypothesis with 5 hours of development time rather than 50?)


This is a hugely important concept – and one that’s easily overlooked. It’s natural that we want to create the “best” iteration for the content we’re working on – but that can limit the success of our experimentation program. It’s far better to run ten MVEs across multiple levers that take 5 hours each to build than one monster experiment that takes 50 hours. We’ll learn 10x as much, and drive significantly higher value.

In one AB test for a real estate client, we created a fully functional “map view”. It was based on a significant volume of user research – but the minimum viable experiment would have been simply to test adding a “Map view” button without the underlying functionality.


So at the end of this phase, we should have defined the MVE for each of the high priority levers that we’re going to start with.

Roadmap: Creating an effective roadmap

There are many factors that can affect your experimentation roadmap – factors that stop you from starting at the top of your prioritized list and working your way down: competing business goals, fire-fighting, resource constraints, product changes, marketing campaigns, seasonality, and dozens more. They can all block individual experiments – but they shouldn’t block experimentation altogether.

That’s why planning your roadmap is as important as prioritizing the experiments. Planning delivers the largest impact (and insight) in spite of external factors.


To plan effectively, sequence the backlog around these blockers rather than simply working down the prioritized list: when the top experiment is blocked, run the next-highest priority experiment that isn’t, as in the sketch below.
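For example (a hypothetical sketch – the backlog and blocking rules are invented), a simple scheduler can always pick the highest-priority experiment whose area isn’t currently blocked:

    # Hypothetical: keep the program moving when the top experiment is blocked.
    backlog = [
        # (priority score, area) - higher score = run sooner
        (90, "checkout"),
        (80, "product page"),
        (70, "homepage"),
    ]
    blocked_areas = {"checkout"}  # e.g. a product release is in flight there

    runnable = [exp for exp in sorted(backlog, reverse=True)
                if exp[1] not in blocked_areas]
    next_experiment = runnable[0] if runnable else None
    print(next_experiment)  # -> (80, 'product page')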

Experimentation: Running and analyzing the experiments

With each experiment, you’ll learn more about your users: what changes their behavior and what doesn’t.
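On the analysis side, here’s a minimal sketch of a two-proportion z-test – the counts are invented, and most testing tools report this for you:

    from math import sqrt
    from statistics import NormalDist

    # Invented results: conversions and visitors per variation
    control_conv, control_n = 500, 10_000
    variant_conv, variant_n = 570, 10_000

    p1, p2 = control_conv / control_n, variant_conv / variant_n
    pooled = (control_conv + variant_conv) / (control_n + variant_n)
    se = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / variant_n))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed

    print(f"uplift {(p2 - p1) / p1:+.1%}, z = {z:.2f}, p = {p_value:.3f}")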

You can scale successful concepts and challenge unsuccessful concepts.

For successful experiments, you can iterate by applying the winning lever to other areas and audiences, or by testing bolder executions of it.

Meanwhile, an experiment may be unsuccessful because the lever itself isn’t valid, because the execution was poor, or because external factors got in the way.

In experiment post-mortems, it’s crucial to investigate which of these is most likely, so we don’t reject a lever because of poor execution or external factors.


What’s good (and bad) about this approach

This approach works for Conversion.com – we’ve validated it with clients big and small for more than ten years, and have improved it significantly along the way.

It’s good because it prioritizes on value as well as traffic, it builds on the results of previous experiments, and it stays dynamic as the program develops.

On the flip side, its weakness is that it isn’t a simple checklist model – it takes more time and judgment to apply.

So, what now?

  1. If you haven’t already, print out or copy this Google slide for Conversion.com’s experimentation framework.
  2. Email marketing@conversion.com to join our mailing list. We like sharing how we approach experimentation.
  3. Share your feedback below. What do you like? What do you do differently?
