With experimentation and conversion optimisation, there is never a shortage of ideas to test.
In other industries, specialist knowledge is often a prerequisite. It’s hard to have an opinion on electrical engineering or pharmaceutical research without prior knowledge.
But with experimentation everyone can have an opinion: marketing, product, engineering, customer service – even our customers themselves. They can all suggest ideas to improve the website’s performance.
The challenge is how you prioritise the right experiments.
There’s a finite number of experiments that we can run – we’re limited both by the resource to create and analyse experiments and by the traffic to run them on.
Prioritisation is the method to maximise impact with an efficient use of resources.
Where most prioritisation frameworks fall down
There are multiple prioritisation frameworks – PIE (from WiderFunnel), PXL (from ConversionXL), and more recently the native functionality within Optimizely’s Program Management.
Each framework has a broadly consistent approach: prioritisation is based on a combination of (a) the value of the experiment, and (b) the ease of execution.
WiderFunnel’s PIE framework uses three factors, scored out of 10:
- potential (how much improvement can be made on the pages?)
- importance (how valuable is the traffic to the page?)
- ease (how complicated will the test be to implement?)
This is effective: it ensures that you consider the potential uplift from the experiment alongside the importance of the page. (A high-impact experiment on a low-value page should rightly be deprioritised.)
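As a minimal sketch of how PIE-style scoring could work, assuming the three factors are weighted equally and averaged (the class name, the example ideas and their scores are illustrative assumptions):

```python
from dataclasses import dataclass


@dataclass
class PieIdea:
    """One test idea scored on the three PIE factors, each out of 10."""
    name: str
    potential: int
    importance: int
    ease: int

    @property
    def score(self) -> float:
        # Assumption: equal weighting, so the score is a plain average.
        return (self.potential + self.importance + self.ease) / 3


# Hypothetical ideas with made-up scores.
ideas = [
    PieIdea("Homepage hero copy", potential=6, importance=9, ease=8),
    PieIdea("Blog sidebar banner", potential=9, importance=2, ease=7),
]

# Highest average first.
ranked = sorted(ideas, key=lambda idea: idea.score, reverse=True)
for idea in ranked:
    print(f"{idea.name}: {idea.score:.1f}")
```

Here the blog idea has the higher potential, but its low importance pulls it below the homepage idea.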
But it can be challenging to score these factors objectively – especially when considering an experiment’s potential.
ConversionXL’s PXL framework looks to address this. Rather than asking you to rate an experiment out of 10, it asks a series of yes/no questions to objectively assess its value and ease.
Experiments that are above the fold and based on quantitative and qualitative research will rightly score higher than a subtle experiment based on gut instinct alone.
This approach works well: it rewards the right behaviour (and can even help drive the right behaviour in the future, as users submit concepts that are more likely to score well).
But while it improves the objectivity in scoring, it lacks two fundamental elements:
- It accounts for page traffic, but not page value. So an above-the-fold research-backed experiment on a zero-value page could be prioritised above experiments that could have a much higher impact. (We used to work with a university in the US whose highest-traffic page was a blog post on ramen noodle recipes. It generated zero leads – but the PXL framework wouldn’t account for that automatically.)
- While it values qualitative and quantitative research, it doesn’t appear to include data from previous experiments in its prioritisation. We know that qualitative research can sometimes be misleading (customers may say one thing and do something completely different). That’s why we validate our research with experimentation. But this model focuses purely on research – whereas a conclusive experiment is the best indicator of a future iteration’s success.
Moreover, most frameworks struggle to adapt as an experimentation programme develops. They tend to work in isolation at the start – prioritising a long backlog of concepts – but over time, real life gets in the way.
Competing business goals, fire-fighting and resource challenges mean that the prioritisation becomes out-of-date – and you’re left with a backlog of experiments that is more static than a dynamic experimentation programme demands.
Introducing SCORE – Conversion.com’s prioritisation process
Our approach to prioritisation is based on more than 10 years’ experience running experimentation programmes for clients big and small.
We wanted to create an approach that:
- Prioritises the right experiments: So you can deliver impact (and insight) rapidly.
- Adapts based on insight + results: The more experiments you run, the stronger your prioritisation becomes.
- Removes subjectivity: As far as possible, data should be driving prioritisation – not opinion.
- Allows for the practicalities of running an experimentation programme: It adapts to the reality of working in a business where the wider priorities, goals and resources change.
But the downside is that it’s not a simple checklist model. In our experience, there’s no easy answer to prioritisation – it takes work. But it’s better to spend a little more time on prioritisation than waste a lot more effort building the wrong experiments.
With that in mind, we’re presenting SCORE – Conversion.com’s prioritisation process. It has five stages: Strategy, Concepts, Order, Roadmap and Experimentation.
As you’ll see, the prioritisation of concepts against each other happens in the middle of the process (“Order”) and is contingent on the programme’s strategy.
Strategy: Prioritising your experimentation framework
At Conversion.com, our experimentation framework is fundamental to our approach. Before we start on concepts, we first define the goal, KPIs, audiences, areas and levers (the factors that we believe affect user behaviour).
When your framework is complete (or, at least, started – it’s never really complete), we can prioritise at the macro level – before we even think about experiments.
Assuming we’ve defined and narrowed down the goal and KPIs, we then need to prioritise the audiences, areas and levers:
Prioritise your audiences on volume, value and potential:
- Volume – the monthly unique visitors of this audience. (That’s why it’s helpful to define identifiable audiences like “prospects”, “users on a free trial”, “new customers”, and so on.)
- Value – the revenue or profit per user. (Continuing the above example, new customers are of course worth more than prospects – but at a far lower volume.)
- Potential – the likelihood that you’ll be able to modify their behaviour. On a retail website, for example, there may be less potential to impact returning customers than potential customers – it may be harder to increase their motivation and ability to convert relative to a user who is new to the website.
You can, of course, change the criteria here to adapt the framework to better suit your requirements. But as a starting point, we suggest combining the profit per user and the potential improvement.
Don’t forget, we want to prioritise the biggest value audiences first – so that typically means targeting as many users as possible, rather than segmenting or personalising too soon.
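As a rough sketch of how this scoring could be done, assuming you simply multiply volume, value and potential together (the multiplication, the field names and every figure below are illustrative assumptions, not real data):

```python
from typing import NamedTuple


class Audience(NamedTuple):
    name: str
    monthly_visitors: int   # volume
    profit_per_user: float  # value
    potential: float        # estimated chance of changing behaviour, 0 to 1


def audience_score(audience: Audience) -> float:
    """Assumed scoring rule: volume x value x potential."""
    return (
        audience.monthly_visitors
        * audience.profit_per_user
        * audience.potential
    )


# Hypothetical audiences with made-up numbers.
audiences = [
    Audience("new customers", 5_000, 40.0, 0.2),
    Audience("prospects", 100_000, 2.0, 0.5),
    Audience("users on a free trial", 10_000, 15.0, 0.4),
]

# Biggest expected value first.
ranked_audiences = sorted(audiences, key=audience_score, reverse=True)
for audience in ranked_audiences:
    print(f"{audience.name}: {audience_score(audience):,.0f}")
```

In this made-up example, prospects come out on top despite their low value per user, because their volume is so much larger. The same shape could be reused for areas, which are scored on the same three criteria.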
In much the same way as audiences, we can prioritise the areas – the key content that the user interacts with.
For example, identify the key pages on the website (homepage, listings page, product page, etc) and score them on:
- Volume – the monthly unique visitors for the area.
- Value – the revenue or profit from the area.
- Potential – the likelihood that you’ll be able to improve the area’s performance. (Now’s a good time to use your quantitative and qualitative research to inform this scoring.)
(It might sound like we’re falling into the trap of other prioritisation models: asking you to estimate potential, which can be subjective. But, in our experience, people are more likely to score an area objectively, rather than an experiment that they created and are passionate about.)
Also, this approach doesn’t need to be limited to your website. You can apply it to any other touchpoint in the user journey too – including offline. Your cart abandonment email, customer calls and Facebook ads can (and should) be used in this framework.
As above, levers are defined as the key factors or themes that you think affect an audience’s motivation or ability to convert on a specific area.
These might be themes like pricing, trust, delivery, returns, form usability, and so on. (Take another look at the experimentation framework to see why it’s important to separate the lever from the execution.)
When you’re starting to experiment, it’s hard to prioritise your levers – you won’t know what will work and what won’t.
That’s why you can prioritise them on either:
- Confidence – a simple score to reflect the quantitative and qualitative research that supports the lever. If every research method shows trust as a major concern for your users, it should score higher than another lever that only appears occasionally.
- Win rate – If you have run experiments on this lever in the past, what was their win rate? It’s normally a good indicator of future success.
Of course, if you’re starting experimentation, you won’t have a win rate to rely on (so estimating the confidence is a fantastic start).
But if you’ve got a good history of experimentation – and you’ve run the experiments correctly, and focused them on a single lever – then you should use this data to inform your prioritisation here.
Again, the more we experiment, the more accurate this gets – so don’t obsess over every detail. (After all, it’s possible that a valid lever may have a low win rate simply because of a couple of experiments with poor creative.)
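One way to sketch this in code: use the win rate once a lever has enough experiments behind it, and fall back to the research-based confidence score until then. The threshold of three experiments, the 0 to 1 confidence scale and the example levers are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Lever:
    name: str
    confidence: float     # research-based score, 0 to 1 (assumed scale)
    wins: int = 0         # past winning experiments on this lever
    experiments: int = 0  # past conclusive experiments on this lever

    def priority(self, min_experiments: int = 3) -> float:
        # With enough history, trust the observed win rate.
        if self.experiments >= min_experiments:
            return self.wins / self.experiments
        # Otherwise rely on confidence, so one or two poor
        # executions don't sink a lever that may still be valid.
        return self.confidence


# Hypothetical levers with made-up history.
levers = [
    Lever("delivery", confidence=0.6, wins=0, experiments=1),
    Lever("pricing", confidence=0.5, wins=3, experiments=4),
    Lever("trust", confidence=0.8),
]

ranked_levers = sorted(levers, key=lambda lever: lever.priority(), reverse=True)
for lever in ranked_levers:
    print(f"{lever.name}: {lever.priority():.2f}")
```

Note that "delivery" keeps its confidence score of 0.6 here, even though its only experiment lost, because a single result is below the assumed threshold.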
Putting this all together, you can now start to prioritise the audiences, areas and levers that should be focused on:
As you can see, we haven’t even started to think about concepts and execution – but we have a strong foundation for our prioritisation.
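To make this concrete, here is a small sketch that ranks every audience/area/lever combination. It assumes each has already been given a normalised score between 0 and 1 and that the three scores are multiplied together; both the combination rule and the scores are illustrative assumptions.

```python
from itertools import product

# Hypothetical normalised scores (0 to 1) from the earlier steps.
audience_scores = {"all users": 0.9, "new customers": 0.6}
area_scores = {"product page": 0.8, "basket": 0.7}
lever_scores = {"pricing": 0.75, "delivery": 0.6}

# Score every combination by multiplying its three scores.
combinations = [
    (audience, area, lever, a_score * ar_score * l_score)
    for (audience, a_score), (area, ar_score), (lever, l_score) in product(
        audience_scores.items(), area_scores.items(), lever_scores.items()
    )
]

# Highest combined score first.
combinations.sort(key=lambda combo: combo[3], reverse=True)

for audience, area, lever, score in combinations[:3]:
    print(f"{lever} on {area} for {audience}: {score:.2f}")
```

The top few combinations are the natural starting points for ideation.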
Concepts: Getting the right ideas
After defining the strategy, you can now run structured ideation around the KPIs, audiences, areas and levers that you’ve defined.
This creates the ideal structure for ideation.
Rather than starting with, “What do we want to test?” or “How can we improve product pages?”, we’re instead focusing on the core hypotheses that we want to validate:
- How can we improve the perception of pricing on product pages for new customers?
- How can we overcome concerns around delivery in the basket for all users?
- And so on.
This structured ideation around a single hypothesis generates far better ideas – and means you’re less susceptible to the tendency to throw everything into a single experiment (and not knowing which part caused the positive/negative result afterwards).
Order: Prioritising the concepts
When prioritising the concepts – especially when a lever hasn’t been validated by prior experiments – you should look to start with the minimum viable experiment (MVE).
Just like a minimum viable product, we want to define the simplest experiment that allows us to validate the hypothesis. (Can we test a hypothesis with 5 hours of development time rather than 50?)
This is a hugely important concept – and one that’s easily overlooked. It’s natural that we want to create the “best” iteration for the content we’re working on – but that can limit the success of our experimentation programme. It’s far better to run ten MVEs across multiple levers that take 5 hours each to build, rather than one monster experiment that takes 50 hours to build. We’ll learn 10x as much, and drive significantly higher value.
So at the end of this phase, we should have defined the MVE for each of the high priority levers that we’re going to start with.
Roadmap: Creating an effective roadmap
There are many factors that can affect your experimentation roadmap – factors that stop you from starting at the top of your prioritised list and working your way down:
- You may have limited resource, meaning that the bigger experiments have to wait till later.
- There may be upcoming page changes or product promotions that will affect the experiment.
- Other teams may be running experiments too, which you’ll need to plan around.
And there are dozens more: resource, product changes, marketing and seasonality can all block experiments – but shouldn’t block experimentation altogether.
That’s why planning your roadmap is as important as prioritising the experiments. Planning delivers the largest impact (and insight) in spite of external factors.
To plan effectively:
- Identify your swimlanes: These are the audiences and areas from your framework that you’ll be experimenting on. (Again, make sure you focus on the high priority audiences and areas – don’t be tempted to segment or personalise too early.)
- Estimate experiment duration: Use an appropriate minimum detectable effect for the audience and area to calculate the duration, then block out this time in the roadmap.
- Experiment across multiple levers: Gather more insight (and spread your risk) by experimenting across multiple levers. If you focus heavily on a lever like “trust” with your first six experiments, you might have to start again if the first two or three experiments aren’t successful.
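To estimate duration, a common approach is the standard sample-size formula for comparing two conversion rates. The sketch below assumes a two-sided test at 5% significance and 80% power, an even traffic split, and a relative minimum detectable effect; the function names and the example figures are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(
    baseline_rate: float,
    relative_mde: float,
    alpha: float = 0.05,
    power: float = 0.8,
) -> int:
    """Visitors needed in each variant to detect the given relative uplift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def duration_weeks(
    weekly_visitors: int,
    baseline_rate: float,
    relative_mde: float,
    variants: int = 2,
) -> int:
    """Whole weeks to block out in the roadmap, assuming an even split."""
    needed = sample_size_per_variant(baseline_rate, relative_mde) * variants
    return math.ceil(needed / weekly_visitors)


# Hypothetical area: 20,000 visitors a week, 5% baseline conversion rate,
# and we want to detect a 10% relative uplift.
weeks = duration_weeks(20_000, baseline_rate=0.05, relative_mde=0.10)
print(f"Block out {weeks} weeks")
```

A smaller minimum detectable effect or a lower-traffic swimlane pushes the duration up quickly, which is another reason to start with high-volume audiences and areas.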
Experimentation: Running and analysing the experiments
With each experiment, you’ll learn more about your users: what changes their behaviour and what doesn’t.
You can scale successful concepts and challenge unsuccessful concepts.
For successful experiments, you can iterate by:
- Moving incrementally from minimum viable experiments to more impactful creative. (With one Conversion.com client, we started with a simple experiment that promoted the speed of delivery. After multiple successful experiments around delivery, we eventually worked with the client to test the commercial viability of same-day delivery.)
- Applying the same lever to other areas and potentially audiences. If amplifying trust messaging on the basket page works well, it’ll probably work well on listing and product pages too.
Meanwhile, an experiment may be unsuccessful because:
- The lever was invalidated – Qualitative research may have said customers care about the lever, but in practice it makes no difference.
- The execution was poor – It happens sometimes. Every audience/area/lever combination can have thousands of possible executions – you won’t get it right first time, every time, and you risk rejecting a valid lever because of a lousy experiment.
- There was an external factor – It’s also possible that other factors affected the test: there was a bug, the underlying page code changed, or a promotion or stock availability affected performance. It doesn’t happen often, but it needs to be checked.
In experiment post-mortems, it’s crucial to investigate which of these is most likely, so we don’t reject a lever because of poor execution or external factors.
What’s good (and bad) about this approach
This approach works for Conversion.com – we’ve validated it on clients big and small for more than ten years, and have improved it significantly along the way.
It’s good because:
- It’s a structured and effective prioritisation strategy.
- It doesn’t just reward data and insight – it actively adapts and improves over time.
- It works in the real world, allowing for the practicalities of running an experimentation programme.
On the flip side, its weaknesses are that:
- It takes time to do properly. (You should create and prioritise your framework first.)
- You can’t feed in 100 concepts and expect it to spit out a nicely ordered list. (But in our experience, you probably don’t want to.)