Everything we produce is the result of our choices. Which products and features do we roll out? Which do we roll back? And which ideas never even make it on the backlog?
The problem is – most of us suck at making choices.
Decisions are made by consensus, based on opinion not evidence. We’re riddled with subjectivity and bias, often masquerading as “experience”, “best practice” or “gut instinct”.
But there’s a better way – using experimentation as a way to define your product roadmap.
Experimentation as a product development framework
For many product organisations, experimentation serves two functions:
1. Safety check: Product and engineering run A/B tests (or feature flags) to measure the impact of new features.
2. Conversion optimisation: Marketing and growth run A/B tests, often on the sign-up flow, to optimise acquisition.
But this neglects experimentation’s most important function:
3. Product strategy: Product teams use experimentation to find out which features and ideas their customers will actually use and enjoy.
In doing so, you can use experimentation to inform product – not just validate it. You can test bolder ideas safely, creating better products for your customers. By putting experimentation at the heart of their business, organisations like Facebook, Amazon, Uber and Spotify have created and developed products used by billions worldwide.
But they’re in the minority. They represent the 1% of brands that have adopted experimentation as not just a safety check, but as a driving force for their product.
So how do the 99% of us better adopt experimentation?
Five principles of product experimentation
#1 Experiment to solve your biggest problems.
First, and most importantly, you should experiment on your biggest problems – not your smallest.
If experimentation is only used to “finesse the detail” by A/B testing minor changes, you’re wasting the opportunity.
To start, map out the products or features you’re planning. What are the assumptions you’re making, and what are the risks you’re taking? How can you validate these assumptions with experimentation?
Also, what are the risks you’re not taking – but would love to at least try with an A/B test?
#2 Be bold.
Experimentation lets you take risks with the confidence of a safety net.
Because experiments are – by their nature – measurable and reversible, they give us a huge opportunity to test ideas that are bolder than we’d otherwise ever dare.
Jeff Bezos famously splits decisions into two types. Type 1 decisions are irreversible – “one-way doors”:
“These decisions must be made methodically, carefully, slowly, with great deliberation and consultation. If you walk through and don’t like what you see on the other side, you can’t get back to where you were before.”
Type 2 decisions are reversible – “two-way doors”:
“But most decisions aren’t like [Type 1 decisions] – they are changeable, reversible – they’re two-way doors. If you’ve made a suboptimal Type 2 decision, you don’t have to live with the consequences for that long. You can reopen the door and go back through.”
And the scale of the bets should grow with the company:
“As a company grows, everything needs to scale, including the size of your failed experiments. If the size of your failures isn’t growing, you’re not going to be inventing at a size that can actually move the needle. Amazon will be experimenting at the right scale for a company of our size if we occasionally have multibillion-dollar failures.”
If we aren’t prepared to risk failure, then we don’t innovate. Instead, we stagnate and become Blockbuster in the Netflix era.
Experimentation, though, gives us a safety net to take risks. We can test our boldest concepts and ideas, which would otherwise be blocked or watered down by committee. After all, it’s only a test…
#3 Test early / test often.
Experimentation works best when you test early and often.
But most product teams test only once, at the end. They do this to measure the impact of a new feature before or just after it launches. (This is the “safety check” concept mentioned above.)
Their process normally looks like this:
Whether the experiment wins or loses – whether the impact is positive or negative – the feature is typically rolled out anyway.
Why? Because of the emotional and financial investment in it. If you’ve spent 6 or 12 months building something and then find out it doesn’t work, what do you do?
You could revert and write off the last 6 months’ investment. Or you could persevere and try to fix it as you go.
Most companies choose the second option – they invest time and money in making their product worse.
As Carson Forter, formerly of Twitch and now at Future Research, says of bigger feature releases:
“By the time something this big has been built, the launch is very, very unlikely to be permanently rolled back no matter what the metrics say.”
That’s why we should validate early concepts as well as ready-to-launch products. We start testing as early as possible – before we commit to the full investment – to get data on what works and what doesn’t.
After all, it’s easier to turn off a failed experiment than it is to write off a failed product launch. What’s more, gathering data from experiments will help us guide the direction of the product.
#4 Start small and scale.
To do that – to test early and often – you’ll frequently have to start with the “minimum viable experiment” (MVE).
Just like a minimum viable product, we’re looking to test a concept that is as simple and as impactful as possible.
So what does this look like in practice? Often “painted door tests” work well here. You don’t build the full product or feature and test that. After all, by that point, you’ve already committed to the majority of the investment. Instead, you create the illusion of the product or feature.
Suppose a retailer wanted to test a subscription product. They could build the full functionality and promotional material and then find out if it works. Or they could add a subscription option to their product details pages, and see if people select it.
Ideally, before they run the experiment, they’d plan what they’d do next based on the uptake. So if fewer than 5% of customers click that option, they may deprioritise it. If 10% choose it, they might add it to the backlog. And if 20% or more go for it, then it may become their #1 priority until it ships.
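Committing these thresholds to writing (or code) before the test runs means the decision isn’t re-litigated once results are in. A minimal sketch in Python – the function name and thresholds simply restate the example above:

```python
def next_action(uptake: float) -> str:
    """Map painted-door uptake (the fraction of visitors selecting
    the not-yet-built subscription option) to a roadmap decision.
    Thresholds restate the example above: <5% deprioritise,
    5-20% backlog, >=20% top priority."""
    if uptake < 0.05:
        return "deprioritise"
    if uptake < 0.20:
        return "add to backlog"
    return "make it the #1 priority"
```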
We’ve helped our clients apply this to every aspect of their business. Should a food delivery company have Uber-style surge pricing? Should they allow tipping? What product should they launch next?
#5 Measure what matters.
The measurement of the experiment is obviously crucial. If you can’t measure the behaviour that you’re looking to drive, there’s probably little point in running the experiment.
So it’s essential to define both:
the primary metric or “overall evaluation criterion” – essentially, the metric that shows whether the experiment wins or loses, and
any secondary or “guardrail” metrics – metrics you’re not necessarily trying to affect, but that shouldn’t perform any worse.
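One way to make these definitions concrete is an evaluation rule that only declares a winner when the primary metric improves significantly and no guardrail significantly declines. This is a minimal sketch using a standard two-proportion z-test; the function names, data shapes and 5% significance level are illustrative assumptions, not a prescribed methodology:

```python
from math import sqrt
from statistics import NormalDist

def z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test.
    Returns (absolute lift of B over A, p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return p_b - p_a, 2 * (1 - NormalDist().cdf(abs(z)))

def evaluate(primary, guardrails, alpha=0.05):
    """primary and each guardrail value: (conv_a, n_a, conv_b, n_b).
    A win requires a significant improvement on the primary metric
    and no significant decline on any guardrail."""
    lift, p = z_test(*primary)
    if not (lift > 0 and p < alpha):
        return "no win on primary metric"
    for name, data in guardrails.items():
        g_lift, g_p = z_test(*data)
        if g_lift < 0 and g_p < alpha:
            return f"blocked: guardrail {name!r} degraded"
    return "win"
```

Note the asymmetry: the primary metric decides the verdict, while guardrails can only veto it – mirroring the definitions above.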
You’d set these with any experiment – whether you’re optimising a user journey or creating a new product.
As far as possible – and as far as sample size/statistical significance allows – focus these metrics on commercial measures that affect business performance. So “engagement” may be acceptable when testing an MVE (like the fake subscription radio button above), but in future iterations you should build out the next step in the flow to ensure that the positive response is maintained throughout the funnel.
Why is this approach better?
1. You build products with the strongest form of evidence – not opinion. Casey Winters talks about the dichotomy between product visionaries and product leaders. A visionary relies more on opinion and self-belief, while a leader helps everyone to understand the vision, then builds the process and uses data to validate and iterate.
And the validation we get from experiments is stronger than any other form of evidence. Unlike traditional forms of product research – focus groups, customer interviews, etc – experimentation is both faster and more aligned with future customer behaviour.
The pyramid below shows the “hierarchy of evidence” – with the strongest forms of evidence at the top, and the weakest at the bottom.
You can see that randomised controlled trials (experiments or A/B tests) are second only to meta analyses of multiple experiments in terms of quality of evidence and minimal risk of bias:
2. Low investment – financially and emotionally. When we constantly test and iterate, we limit the financial and emotional fallout. Because we test early, we’ll quickly see if our product or feature resonates with users. If it does, we iterate and expand. If it doesn’t, we can modify the experiment or change direction. Either way, we’re limiting our exposure.
This applies emotionally as well as financially. There’s less attachment to a minimum viable experiment than there is a fully-built product. It’s easier to kill it and move on.
And because we’re reducing the financial investment, it means that…
3. You can test more ideas. In a standard product development process, you have to choose the products or features to launch without strong data to rely on. (Instead, you may have market research and focus groups, which are beneficial but don’t always translate to sales.)
In doing so, you narrow down your product roadmap unnecessarily – and you gamble everything on the product you launch.
But with experimentation, you can test all those initial ideas (and others that were maybe too risky to be included). Then you can iterate and develop the concept to a point where you’re launching with confidence.
It’s like cheating at product development – we can see what happens before we have to make our choice.
4. Test high-risk ideas in a low-risk way. Because of the safety net that experimentation gives us (we can just turn off the test), we can make our concepts 10x bolder.
We don’t have to water down our products to reach a consensus with every stakeholder. Instead, we can test radical ideas – and just see what happens.
Like Bill Murray in Groundhog Day, we get to try again and again to see what works and what doesn’t. So we don’t have to play it safe with our ideas – we can test whatever we want.
Don’t forget, if we challenge the status quo – if we test the concepts that others won’t – then we get a competitive advantage. Not by copying our competitors, but by innovating with our products.
And this approach is, of course, hugely empowering for teams…
5. Experiment with autonomy. Once you’ve set the KPIs for experimentation – ideally the North Star Metric that directs the product – then your team can experiment with autonomy.
There’s less need for continual approval, because the opinion you need is not from your colleagues and seniors within the business, but from your customers.
And this is a hugely liberating concept. Teams are free to experiment to create the best experience for their customers, rather than to win approval from their line manager.
6. Faster. Experimentation doesn’t just give you data you can’t get anywhere else, it’s almost always faster too.
Suppose Domino’s Pizza want to launch a new pizza. A typical approach to R&D might mean they commission a study in consumer trends and behaviour, then use this to shortlist potential products, then run focus groups and taste tests, then build the supply chain and roll out the new product to their franchisees, and then…
Well, then – 12+ months after starting this process – they see whether customers choose to buy the new pizza. And if they don’t…
But with experimentation, that can all change. Instead of the 12+ month process above, Domino’s can run a “painted door” experiment on the menu. Instead of completing the full product development, they can add potential pizzas that look just like any other product on the menu. Then they measure the add-to-basket rate for each.
This experiment-led approach might take just a couple of weeks, at a fraction of the cost of traditional product development. What’s more, the data gathered is, as above, likely to correlate more closely with future sales.
7. Better for customers. When people first hear about painted door tests like this Domino’s example, they worry about the impact on the customer.
“Isn’t that a bad customer experience – showing them a product they can’t order?”
And that’s fair – it’s obviously not a good experience for the customer. But the potential alternative is that you invest 12 months’ work in building a product nobody wants.
It’s far better to mildly frustrate a small sample of users in an experiment, than it is to launch products that people don’t love.
To find out more about our approach to product experimentation, please get in touch with Conversion.
Iterating on experiments is often reactive and conducted as an afterthought. A lot of time is spent producing a ‘perfect’ test, and if results are unexpected, iterations are run as a last hope to gain value from the time and effort spent. But why try to execute the perfect experiment first time, postponing the learnings you could uncover along the way, when you could run a minimum viable experiment and iterate on it?
Experimentation is run at varying levels of maturity (see our Maturity Model for more information). However, we see businesses getting stuck in the infant stages time and time again because they focus on individual experiments. We see teams wasting time and resource trying to run one ‘perfect’ experiment when the core concept has not been validated.
In order to validate levers quickly without over-investing resource, we should execute hypotheses in their simplest form – the minimum viable experiment (MVE). From there, the success of an MVE gives you the green light to test more complex implementations, while failure flags problems with the concept or execution early on.
A few years ago, we learnt the importance of this approach the hard way. On the back of a single hypothesis for an online real estate business – ‘Adding the ability to see properties on a map will help users find the right property and increase enquiries’ – we built a complete map view in Optimizely. A significant amount of resource was spent, only to find out from the experiment that the map had no impact on user behaviour. What should we have done? Run an MVE requiring the minimum resource to test the concept. What would this have looked like? Perhaps a fake door test to gauge user demand for the map functionality.
This blog aims to give:
An understanding of the minimum viable approach to experimentation
A view of potential challenges and tips to overcome them
A clear overview of the benefits of MVEs
The minimum viable approach
A minimum viable experiment looks for the simplest way to run an experiment that validates the concept. This type of testing isn’t about designing ‘small tests’; it is about running specific, focused experiments that give you the clearest signal of whether or not the hypothesis is valid. (Of course, it helps that MVEs are often small, so we can test quickly!) It is important to challenge yourself by assessing every component of the test and its likelihood of affecting how users respond. That way, you use resource efficiently while still proving – or disproving – the validity of the concept. Running the minimum viable experiment allows you to validate your hypothesis without over-investing in levers that turn out to be ineffective.
If the MVE wins, iterations can be run to find the optimal execution – gaining learnings along the way. If the test loses, look at the execution more thoroughly and determine whether bad execution affected the result. If so, re-run the MVE. If not, bin the hypothesis to avoid wasting resource on unfruitful concepts.
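The decision flow above can be sketched as a tiny function; the return strings are illustrative labels, not fixed terminology:

```python
def mve_next_step(won: bool, execution_flawed: bool = False) -> str:
    """Decision flow after a minimum viable experiment: a win earns
    iteration towards the optimal execution; a loss is re-run only
    if the execution was flawed, otherwise the hypothesis is binned."""
    if won:
        return "iterate on the execution"
    if execution_flawed:
        return "re-run the MVE"
    return "bin the hypothesis"
```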
All hypotheses can be reduced to an MVE; the diagram below shows a visual example of an MVE testing stream.
Potential challenges to MVEs and tips to overcome them
Although this approach is the most effective, it is not often fully understood, resulting in pushback from stakeholders. Stakeholders are invested in the website and, moreover, protective of their product. As a result, they expect experimentation to test a perfect execution that could be implemented immediately should the test win. What is not considered is the huge amount of resource this would require, without any validation that the hypothesis is correct or that the style of execution is optimal.
In order to overcome this challenge, we focus on working with experimentation, marketing and product teams to challenge assumptions around MVEs. This education piece is pivotal for stakeholder buy-in. Over the last 9 months, we have been running experimentation workshops with one of the largest online takeaway businesses in Europe, and a huge focus of these sessions has been the minimum viable experiment.
Overview of the benefits of MVEs
Minimum viable experiments have a multitude of benefits. Here, we aim to summarise a few of these:
The minimum viable experiment of a concept allows you to use the minimum resource required to see whether the concept is worth pursuing further.
Validity of the hypothesis is clear
Executing experiments in their simplest form ensures the impact of the changes is evident. As a result, concluding the validity of the experiment is uncomplicated.
Explore bigger solutions to achieve the best possible outcome
Once the MVE has been proven, this justifies investing further resource in exploring bigger solutions. Iterating on experiments allows you to refine solutions to achieve the best possible execution of the hypothesis.
A minimum viable experiment involves testing a hypothesis in its simplest form, allowing you to validate concepts early on and optimise the execution via iterations.
Pushback on MVEs is usually due to a lack of awareness of the process and the benefits it yields. Educate teams to show how effective this type of testing is, not only in reaching the best possible final execution but also in using resource efficiently.
The main benefit of the minimum viable approach is that you spend time and resource on levers that impact your KPIs.
With experimentation and conversion optimisation, there is never a shortage of ideas to test.
In other industries, specialist knowledge is often a prerequisite. It’s hard to have an opinion on electrical engineering or pharmaceutical research without prior knowledge.
But with experimentation everyone can have an opinion: marketing, product, engineering, customer service – even our customers themselves. They can all suggest ideas to improve the website’s performance.
The challenge is how you prioritise the right experiments.
There’s a finite number of experiments that we can run – we’re limited both by the resource to create and analyse experiments, and also the traffic to run experiments on.
Prioritisation is the method to maximise impact with an efficient use of resources.
Where most prioritisation frameworks fall down
There are multiple prioritisation frameworks – PIE (from WiderFunnel), PXL (from ConversionXL), and more recently the native functionality within Optimizely’s Program Management.
Each framework has a broadly consistent approach: prioritisation is based on a combination of (a) the value of the experiment, and (b) the ease of execution.
Take PIE, for example, which scores each experiment on:
potential (how much improvement can be made on the pages?)
importance (how valuable is the traffic to the pages?) and
ease (how complicated will the test be to implement?)
This is effective: it ensures you consider the potential uplift from the experiment alongside the importance of the page. (A high impact experiment on a low value page should rightfully be deprioritised.)
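In code, a PIE-style score is just an average of the three ratings. This sketch assumes equal weighting and invented experiment names, both of which are judgment calls:

```python
def pie_score(potential, importance, ease):
    """PIE-style priority: the mean of three 1-10 ratings
    (higher = run sooner). Equal weighting is the usual default,
    but the weights are ultimately a judgment call."""
    return (potential + importance + ease) / 3

# Hypothetical backlog, scored and ranked highest-first:
experiments = {
    "homepage hero": pie_score(8, 9, 6),
    "footer links": pie_score(3, 2, 9),
}
ranked = sorted(experiments, key=experiments.get, reverse=True)
```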
But it can be challenging to score these factors objectively – especially when considering an experiment’s potential.
ConversionXL’s PXL framework looks to address this. Rather than asking you to rate an experiment out of 10, it asks a series of yes/no questions to objectively assess its value and ease.
Experiments that are above the fold and based on quantitative and qualitative research will rightly score higher than a subtle experiment based on gut instinct alone.
This approach works well: it rewards the right behaviour (and can even help drive the right behaviour in the future, as users submit concepts that are more likely to score well).
But while it improves the objectivity in scoring, it lacks two fundamental elements:
It accounts for page traffic, but not page value. So an above-the-fold research-backed experiment on a zero-value page could be prioritised above experiments that could have a much higher impact. (We used to work with a university in the US whose highest-traffic page was a blog post on ramen noodle recipes. It generated zero leads – but the PXL framework wouldn’t account for that automatically.)
While it values qualitative and quantitative research, it doesn’t appear to include data from previous experiments in its prioritisation. We know that qualitative research can sometimes be misleading (customers may say one thing and do something completely different). That’s why we validate our research with experimentation. But this model focuses purely on research – whereas a conclusive experiment is the best indicator of a future iteration’s success.
Moreover, most frameworks struggle to adapt as an experimentation programme develops. They tend to work in isolation at the start – prioritising a long backlog of concepts – but over time, real life gets in the way.
Competing business goals, fire-fighting and resource challenges mean that the prioritisation becomes out-of-date – and you’re left with a backlog of experiments that is more static than a dynamic experimentation programme demands.
Introducing SCORE – Conversion.com’s prioritisation process
Our approach to prioritisation is based on more than 10 years’ experience running experimentation programmes for clients big and small.
We wanted to create an approach that:
Prioritises the right experiments: So you can deliver impact (and insight) rapidly.
Adapts based on insight + results: The more experiments you run, the stronger your prioritisation becomes.
Removes subjectivity: As far as possible, data should be driving prioritisation – not opinion.
Allows for the practicalities of running an experimentation programme: It adapts to the reality of working in a business where the wider priorities, goals and resources change.
But the downside is that it’s not a simple checklist model. In our experience, there’s no easy answer to prioritisation – it takes work. But it’s better to spend a little more time on prioritisation than waste a lot more effort building the wrong experiments.
With that in mind, we’re presenting SCORE – Conversion.com’s prioritisation process:
As you’ll see, the prioritisation of concepts against each other happens in the middle of the process (“Order”) and is contingent on the programme’s strategy.
Strategy: Prioritising your experimentation framework
At Conversion.com, our experimentation framework is fundamental to our approach. Before we start on concepts, we first define the goal, KPIs, audiences, areas and levers (the factors that we believe affect user behaviour).
When your framework is complete (or, at least, started – it’s never really complete), we can prioritise at the macro level – before we even think about experiments.
Assuming we’ve defined and narrowed down the goal and KPIs, we then need to prioritise the audiences, areas and levers:
Prioritise your audiences on volume, value and potential:
Volume – the monthly unique visitors of this audience. (That’s why it’s helpful to define identifiable audiences like “prospects”, “users on a free trial”, “new customers”, and so on.)
Value – the revenue or profit per user. (Continuing the above example, new customers are of course worth more than prospects – but at a far lower volume.)
Potential – the likelihood that you’ll be able to modify their behaviour. On a retail website, for example, there may be less potential to impact returning customers than potential customers – it may be harder to increase their motivation and ability to convert relative to a user who is new to the website.
You can, of course, change the criteria here to adapt the framework to better suit your requirements. But as a starting point, we suggest combining the profit per user and the potential improvement.
Don’t forget, we want to prioritise the biggest value audiences first – so that typically means targeting as many users as possible, rather than segmenting or personalising too soon.
In much the same way as audiences, we can prioritise the areas – the key content that the user interacts with.
For example, identify the key pages on the website (homepage, listings page, product page, etc) and score them on:
Volume – the monthly unique visitors for the area.
Value – the revenue or profit from the area.
Potential – the likelihood that you’ll be able to improve the area’s performance. (Now’s a good time to use your quantitative and qualitative research to inform this scoring.)
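One hedged way to combine volume, value and potential into a single area score is as a rough expected monthly impact. The figures and area names below are invented for illustration:

```python
def area_score(monthly_visitors, value_per_user, potential):
    """Rough expected monthly impact of improving an area:
    traffic x value per user x estimated relative uplift
    (potential expressed as a fraction, e.g. 0.05 for 5%)."""
    return monthly_visitors * value_per_user * potential

# Invented figures - note how a high-traffic, low-value page
# (the "ramen noodle blog" problem above) scores low:
areas = {
    "product page": area_score(200_000, 1.80, 0.05),
    "checkout": area_score(50_000, 1.80, 0.10),
    "blog": area_score(500_000, 0.01, 0.10),
}
top_area = max(areas, key=areas.get)
```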
(It might sound like we’re falling into the trap of other prioritisation models: asking you to estimate potential, which can be subjective. But, in our experience, people are more likely to score an area objectively, rather than an experiment that they created and are passionate about.)
Also, this approach doesn’t need to be limited to your website. You can apply it to any other touchpoint in the user journey too – including offline. Your cart abandonment email, customer calls and Facebook ads can (and should) be used in this framework.
As above, levers are defined as the key factors or themes that you think affect an audience’s motivation or ability to convert on a specific area.
These might be themes like pricing, trust, delivery, returns, form usability, and so on. (Take another look at the experimentation framework to see why it’s important to separate the lever from the execution.)
When you’re starting to experiment, it’s hard to prioritise your levers – you won’t know what will work and what won’t.
That’s why you can prioritise them on either:
Confidence – a simple score to reflect the quantitative and qualitative research that supports the lever. If every research method shows trust as a major concern for your users, it should score higher than another lever that only appears occasionally.
Win rate – If you have run experiments on this lever in the past, what was their win rate? It’s normally a good indicator of future success.
Of course, if you’re starting experimentation, you won’t have a win rate to rely on (so estimating the confidence is a fantastic start).
But if you’ve got a good history of experimentation – and you’ve run the experiments correctly, and focused them on a single lever – then you should use this data to inform your prioritisation here.
Again, the more we experiment, the more accurate this gets – so don’t obsess over every detail. (After all, it’s possible that a valid lever may have a low win rate simply because of a couple of experiments with poor creative.)
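That fallback logic – lean on the win rate once you have enough experiments, otherwise use research confidence – can be sketched as follows. The three-experiment threshold is an assumption, not a rule:

```python
def lever_priority(confidence, wins=0, losses=0, min_experiments=3):
    """Score a lever for prioritisation.
    Uses the historical win rate once enough experiments exist;
    before that, falls back on research confidence (0-1).
    min_experiments is an assumed threshold, not a fixed rule."""
    total = wins + losses
    if total >= min_experiments:
        return wins / total
    return confidence
```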
Putting this all together, you can now start to prioritise the audiences, areas and levers that should be focused on:
As you can see, we haven’t even started to think about concepts and execution – but we have a strong foundation for our prioritisation.
Concepts: Getting the right ideas
After defining the strategy, you can now run structured ideation around the KPIs, audiences, areas and levers that you’ve defined.
This creates the ideal structure for ideation.
Rather than starting with, “What do we want to test?” or “How can we improve product pages?”, we’re instead focusing on the core hypotheses that we want to validate:
How can we improve the perception of pricing on product pages for new customers?
How can we overcome concerns around delivery in the basket for all users?
And so on.
This structured ideation around a single hypothesis generates far better ideas – and means you’re less susceptible to the tendency to throw everything into a single experiment (and not knowing which part caused the positive/negative result afterwards).
Order: Prioritising the concepts
When prioritising the concepts – especially when a lever hasn’t been validated by prior experiments – you should look to start with the minimum viable experiment (MVE).
Just like a minimum viable product, we want to define the simplest experiment that allows us to validate the hypothesis. (Can we test a hypothesis with 5 hours of development time rather than 50?)
This is a hugely important concept – and one that’s easily overlooked. It’s natural that we want to create the “best” iteration for the content we’re working on – but that can limit the success of our experimentation programme. It’s far better to run ten MVEs across multiple levers that take 5 hours each to build, rather than one monster experiment that takes 50 hours to build. We’ll learn 10x as much, and drive significantly higher value.
So at the end of this phase, we should have defined the MVE for each of the high priority levers that we’re going to start with.
Roadmap: Creating an effective roadmap
There are many factors that can affect your experimentation roadmap – factors that stop you from starting at the top of your prioritised list and working your way down:
You may have limited resource, meaning that the bigger experiments have to wait till later.
There may be upcoming page changes or product promotions that will affect the experiment.
Other teams may be running experiments too, which you’ll need to plan around.
And there are dozens more: resource, product changes, marketing, seasonality can all block experiments – but shouldn’t block experimentation altogether.
That’s why planning your roadmap is as important as prioritising the experiments. Planning delivers the largest impact (and insight) in spite of external factors.
To plan effectively:
Identify your swimlanes: These are the audiences and areas from your framework that you’ll be experimenting on. (Again, make sure you focus on the high priority audiences and areas – don’t be tempted to segment or personalise too early.)
Estimate experiment duration: Use an appropriate minimum detectable effect for the audience and area to calculate the duration, then block out this time in the roadmap.
Experiment across multiple levers: Gather more insight (and spread your risk) by experimenting across multiple levers. If you focus heavily on a lever like “trust” with your first six experiments, you might have to start again if the first two or three experiments aren’t successful.
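The duration estimate above can be sketched using Lehr’s approximation for sample size (roughly 80% power at a two-sided 5% significance level). This is a back-of-envelope sketch for conversion-rate metrics, not a substitute for a proper sample size calculator:

```python
from math import ceil

def experiment_duration_days(baseline_rate, mde_relative,
                             daily_visitors, variants=2):
    """Rough A/B test duration via Lehr's approximation:
    n per variant ~ 16 * p * (1 - p) / delta^2
    (two-sided alpha = 0.05, 80% power), where delta is the
    absolute lift implied by the relative MDE."""
    delta = baseline_rate * mde_relative
    n_per_variant = 16 * baseline_rate * (1 - baseline_rate) / delta ** 2
    return ceil(variants * n_per_variant / daily_visitors)
```

For example, detecting a 10% relative lift on a 5% conversion rate with 5,000 daily visitors split across two variants needs roughly two weeks.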
Experimentation: Running and analysing the experiments
With each experiment, you’ll learn more about your users: what changes their behaviour and what doesn’t.
You can scale successful concepts and challenge unsuccessful concepts.
For successful experiments, you can iterate by:
Moving incrementally from minimum viable experiments to more impactful creative. (With one Conversion.com client, we started with a simple experiment that promoted the speed of delivery. After multiple successful experiments around delivery, we eventually worked with the client to test the commercial viability of same-day delivery.)
Applying the same lever to other areas and potentially audiences. If amplifying trust messaging on the basket page works well, it’ll probably work well on listing and product pages too.
Meanwhile, an experiment may be unsuccessful because:
The lever was invalidated – Qualitative research may have said customers care about the lever, but in practice it makes no difference.
The execution was poor – It happens sometimes. Every audience/area/lever combination can have thousands of possible executions – you won’t get it right first time, every time, and you risk rejecting a valid lever because of a lousy experiment.
There was an external factor – It’s also possible that other factors affected the test: there was a bug, the underlying page code changed, or a promotion or stock availability affected performance. It doesn’t happen often, but it needs to be checked.
In experiment post-mortems, it’s crucial to investigate which of these is most likely, so we don’t reject a lever because of poor execution or external factors.
What’s good (and bad) about this approach
This approach works for Conversion.com – we’ve validated it on clients big and small for more than ten years, and have improved it significantly along the way.
It’s good because:
It’s a structured and effective prioritisation strategy.
It doesn’t just reward data and insight – it actively adapts and improves over time.
It works in the real-world, allowing for the practicalities of running an experimentation programme.
On the flip side, its weaknesses are that:
It takes time to do properly. (You should create and prioritise your framework first.)
You can’t feed in 100 concepts and expect it to spit out a nicely ordered list. (But in our experience, you probably don’t want to.)
Everyone approaches experimentation differently. But there’s one thing companies that are successful at experimentation all have in common: a strategic framework that drives experimentation.
In the last ten years we’ve worked with start-ups through to global brands like Facebook, the Guardian and Domino’s Pizza, and the biggest factor we’ve seen impact success is having this strategic framework to inform every experiment.
In this post, you’ll learn
Why a framework is crucial if you want your experimentation to succeed
How to set a meaningful goal for your experimentation programme
How to build a framework around your goal and create your strategy for achieving it
We’ll be sharing the experimentation framework that we use day in, day out with our clients to deliver successful experimentation projects. We’ll also share some blank templates of the framework at the end, so after reading this you’ll be able to have a go at completing your own straight away.
Why use a framework? Going from tactical to strategic experimentation
Using this framework will help you mature your own approach to experimentation, make a bigger impact, get more insight and have more success.
Having a framework:
Establishes a consistent approach to experimentation across an entire organisation, enabling more people to run more experiments and deliver value
Allows you to spend more time on the strategy behind your experiments and less time on the “housekeeping” of trying to manage your experimentation programme.
Enables you to transition from testing tactically to testing strategically.
Let’s explore that last point in detail.
In tactical experimentation every experiment is an island – separate and unconnected to any others. Ideas generally take the form of solutions – “we should change this to be like that” and come from heuristics (aka guessing), best practice or from copying a competitor. There is very little guiding what experiments run where, when and why.
Strategic experimentation, on the other hand, is focused on achieving a defined goal and has a clear strategy for achieving it. The goal is the starting point – a problem with potential solutions explored through the testing of defined hypotheses. All experiments are connected and experimentation is iterative. Every completed experiment generates more insight that prompts further experiments as you build towards achieving the goal.
If strategic experimentation doesn’t already sound better to you then we should also mention the typical benefits you’ll see as a result of maturing your approach in this way.
You’ll increase your win rate – the % of experiments that are successful
You’ll increase the impact of each successful experiment – on top of any conversion rate uplifts, experiments will generate more actionable insight
You’ll never run out of ideas again – every conclusive experiment will spawn multiple new ideas
Introducing the Conversion.com experimentation framework
As we introduce our framework, you might be surprised by its simplicity. But all good frameworks are simple. There’s no secret sauce here. Just a logical, strategic approach to experimentation.
Just before we get into the detail of our framework, a quick note on the role of data. Everything we do should be backed by data. User-research and analytics are crucial sources of insight used to build the layers in our framework. But the experiments we run using the framework are often the best source of data and insight we have. An effective framework should therefore minimise the time it takes to start experimenting.
We cannot wait for perfect data to appear before we start, or try to get things right first time. The audiences, areas and levers that we’ll define in our framework come from our best assessment of all the data we have at a given time. They are not static or fixed. Every experiment we run helps us improve and refine them, and our framework and strategy are updated continuously as more data becomes available.
Part 1 – Establishing the goal of your experimentation project
The first part of the framework is the most important by far. If you only have time to do one thing after reading this post it should be revisiting the goal of your experimentation.
Most teams don’t set a clear goal for experimentation. It’s as simple as that. Any strategy needs to start with a goal, otherwise how can you differentiate success from wasted effort?
A simple test of whether your experimentation has a clear goal is to ask everyone in your team to explain it. Can they all give exactly the same answer? If not, you probably need to work on this.
Don’t be lazy and choose a goal like “increase sales” or “growth”. We’re all familiar with the importance of goals being “SMART” (specific, measurable, achievable, relevant, time-bound) when setting personal goals. Apply this when setting the goal for experimentation.
Add focus to your goal with targets, measures and deadlines, and wherever possible be specific rather than general. Does “growth” mean “increase profit” or “increase revenue”? By how much? By when? A stronger goal for experimentation would be something like “Add an additional £10m in profit within the next 12 months”. There will be no ambiguity as to whether you have achieved that or not in 12 months’ time.
Some other examples of strong goals for experimentation
“Increase the rate of customers buying add-ons from 10% to 15% in 6 months.”
“Find a plans and pricing model that can deliver 5% more new customer revenue before Q3.”
“Determine the best price point for [new product] before it launches in June.”
A clear goal ensures everyone knows what they’re working towards, and what other teams are working towards. This means you can coordinate work across multiple teams and spot any conflicts early on.
Part 2 – Defining the KPIs that you’ll use to measure success
When you’ve defined the goal, the next step is to decide how you’re going to measure it. We like to use a KPI tree here – working backwards from the goal to identify all the metrics that affect it.
For example, if our goal is “Add an additional £10m in profit within the next 12 months” we construct the KPI tree of the metrics that combine to calculate profit. In this simple example let’s say profit is determined by our profit per order times how many orders we get, minus the cost of processing any returns.
These 3 metrics then break down into smaller metrics and so on. You can then decide which of the metrics in the tree you can most influence through experimentation. These then become your KPIs for experimentation. In our example we’ve chosen average order value, order conversion rate and returns rate as these can be directly impacted in experiments. Cost per return on the other hand might be more outside our control.
When you’re choosing KPIs, remember what the K stands for. These are key performance indicators – the ones that matter most. We’d recommend choosing at most 2 or 3. Remember, the more you choose, the more fragmented your experimentation will be. You can track more granular metrics in each experiment, but the overall impact of your experiments will need to be measured in these KPIs.
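The KPI tree above can be made concrete as a small function that rolls the leaf metrics up to the profit goal, showing how a lift in one KPI flows through to the goal. The metric names and figures here are hypothetical:

```python
def profit(visitors, conversion_rate, avg_order_value, margin,
           returns_rate, cost_per_return):
    """Roll the KPI tree up from leaf metrics to the profit goal."""
    orders = visitors * conversion_rate
    profit_per_order = avg_order_value * margin             # profit-per-order branch
    returns_cost = orders * returns_rate * cost_per_return  # returns branch
    return orders * profit_per_order - returns_cost

# A 10% relative lift in order conversion rate flows straight up the tree:
baseline = profit(1_000_000, 0.030, 80, 0.30, 0.10, 5)
uplifted = profit(1_000_000, 0.033, 80, 0.30, 0.10, 5)
```

Comparing `baseline` and `uplifted` tells you how much a given KPI movement contributes towards the goal – useful for sense-checking whether your chosen KPIs can plausibly add up to the target.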
Putting that all together, you have the first parts of your new framework. This is our starting point – and it is worth the time to get this right as everything else hinges on this.
Part 3 – Understanding how your audience impacts your KPIs and goal
Now we can start to develop our strategy for impacting the KPIs and achieving the goal. The first step is to explore how the make-up of our audience should influence our approach.
In any experiment, we are looking to influence behaviour. This is extremely difficult to do. It’s even more difficult if we don’t know who we’re trying to influence – our audience.
We need to understand the motivations and concerns of our users – and specifically how these impact the goal and KPIs we’re trying to move. If we understand this, we can then focus our strategy on solving the right problems for the right users.
So how do we go about understanding our audience? For each of our KPIs the first question we should ask is “Which groups of users have the biggest influence on this KPI?” With this question in mind we can start to map out our audience.
Start by defining the most relevant dimensions – the attributes that identify certain groups of users. Device and Location are both dimensions, but these may not be the most insightful ways to split your audience for your specific goal and KPIs. If our goal is to “reduce returns by 10% in 6 months”, we might find that there isn’t much difference in returns rate for desktop users compared to mobile users. Instead we might find returns rate varies most dramatically when we split users by the Product Type that they buy.
For each dimension we can then define the smaller segments – the way users should be grouped under that dimension. For example, Desktop, Mobile and Tablet would be segments within the Device dimension.
You can have a good first attempt at this exercise in 5–10 minutes. At the start, accuracy isn’t your main concern. You want to generate an initial map that you can then start validating using data – refining your map as necessary. You might also find it useful to create 3 or 4 different audience maps, each splitting your audience in different ways, that are all potentially valid and insightful for your goal.
Once you have your potential audiences the next step would then be to use data to validate the size and value of these audiences. The aim here isn’t to limit our experiments to a specific audience – we’re not looking to do personalisation quite yet. But understanding our audiences means when we come to designing experiments we’ll know how to cater to the objections and concerns of as many users as possible.
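Checking which dimension best explains variation in a KPI can be sketched as a quick script over an analytics export. This is an illustrative approach, not a prescribed tool – the record fields and helper names are assumptions:

```python
def kpi_by_segment(records, dimension, kpi):
    """Average a KPI for each segment within one dimension."""
    totals = {}
    for r in records:
        s, n = totals.get(r[dimension], (0.0, 0))
        totals[r[dimension]] = (s + r[kpi], n + 1)
    return {seg: s / n for seg, (s, n) in totals.items()}

def dimension_spread(records, dimension, kpi):
    """How much the KPI varies across a dimension's segments."""
    rates = kpi_by_segment(records, dimension, kpi).values()
    return max(rates) - min(rates)

# Hypothetical orders tagged with device, product type and a returned flag
orders = [
    {"device": "Mobile",  "product_type": "Shoes", "returned": 1},
    {"device": "Desktop", "product_type": "Shoes", "returned": 1},
    {"device": "Mobile",  "product_type": "Books", "returned": 0},
    {"device": "Desktop", "product_type": "Books", "returned": 0},
]
# Here product type separates returns rates cleanly; device doesn't
```

A dimension with a large spread is a more insightful way to split your audience for that KPI than one where every segment behaves the same.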
Part 4 – Identifying the areas with the greatest opportunity to make an impact
Armed with a better understanding of our audience, we still need to choose when and where to act to be most effective. The Areas part of the framework is about understanding the user journey – and focusing our attention on where we can make the biggest impact.
For each audience, the best time and place to try and influence users will vary. And even within a single audience, the best way to influence user behaviour is going to depend on which stage of their purchase journey the users are at.
As with audiences, we need to map out the important areas. We start by mapping the onsite journeys and funnels. But we don’t limit ourselves to just onsite experience – we need to consider the whole user journey, especially if our goal is something influenced by behaviours that happen offsite. We then need to identify which steps directly impact each of our KPIs. This helps to limit our focus, but also highlights non-obvious areas where there could be value.
As with audiences, you can sketch out the initial map fairly quickly, then use analytics data to start adding more useful insights. Label conversion and drop-off rates to see where abandonment is high. Don’t just do this once for all traffic, do this repeatedly, once for each of the important audiences identified in the previous step. This will highlight where things are similar but crucially where things are different.
So with a good understanding of our audiences and areas we can add these to our framework. Completing these two parts of the framework is easier the more data you have. Start with your best guess at the key audiences and areas, then go out and do your user-research to inform your decisions here. Validate your audiences and areas with quant and qual data.
Part 5 – Identifying the potential levers that influence user behaviour
Levers are the factors we believe can influence user behaviour: the broad themes that we’ll explore in experimentation. At its simplest, they’re the reasons why people convert, and also the reasons why people don’t convert. For example, trust, pricing, urgency and understanding are all common levers.
To identify levers, first we look for any problems that are stopping users from converting on our KPI – we call these barriers to conversion. Some typical barriers are lack of trust, price, missing information and usability problems.
We then look for any factors that positively influence a user’s chances of converting – what we call conversion motivations. Some typical motivations are social proof (reviews), guarantees, USPs of the product/service and savings and discounts.
Together the barriers and motivations give us a set of potential levers that we can “pull” in an experiment to try and influence behaviour. Typically we’ll try to solve a barrier or make a motivation more prominent and compelling.
Your exact levers will be unique to your business. However there are some levers that come up very frequently across different industries that can make for good starting points.
Ecommerce – Price, social proof (reviews), size and fit, returns, delivery cost, delivery methods, product findability, payment methods, checkout usability
SaaS – Free trial, understanding product features, plan types, pricing, cancelling at the end of trial, monthly vs annual pricing, user onboarding
Where do levers come from? Data. We conduct user-research and gather quantitative and qualitative data to look for evidence of levers. You can read more about how we do that here.
When first building our framework it’s important to remember that we’re looking for evidence of levers, not conclusive proof. We want to assemble a set of candidate levers that we believe are worth exploring. Our experiments will then validate the levers and give us the “proof” that a specific lever can effectively be used to influence user behaviour.
You might start initially with a large set of potential levers – 8 or 10 even. We need a way to validate levers quickly and reduce this set down to the 3–4 most effective. Luckily we have the perfect tool for that in experiments.
Part 6 – Defining the experiments to test your hypotheses
The final step in our framework is where we define our experiments. This isn’t an exercise we do just once – we don’t define every experiment we could possibly run from the framework at the start – but using our framework we can start to build the hypotheses that our experiments will explore.
At this point, it’s important to make a distinction between a hypothesis for an experiment and the execution of an experiment. A hypothesis is a statement we are looking to prove true or false. A single hypothesis can then be tested through the execution of an experiment – normally a set of defined changes to certain areas for an audience.
We define our hypothesis first before thinking about the best execution of an experiment to test it, as there are many different executions that could test a single hypothesis. At the end of the experiment the first thing we do is use the results to evaluate whether our hypothesis has been proven or disproven. Depending on this, we then evaluate the execution separately to decide whether we can iterate on it – to get even stronger results – or whether we need to re-test the hypothesis using a different execution.
The framework makes it easy to identify the hypothesis statements that we will look to prove or disprove in our experiments. We can build a hypothesis statement from the framework using this simple template:
“We believe lever [for audience] [on area] will impact KPI.”
The audience and area are in square brackets to denote that it’s optional whether we specify a single audience and area in our hypothesis. Doing so gives us a much more specific hypothesis to explore, but in many cases we may be interested in testing the effectiveness of the lever across different audiences and areas – so we may not want to specify the audience and area until we define the execution of the experiment.
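The template and its optional parts can be captured in a few lines. A trivial sketch – the function name is ours, not part of the framework:

```python
def hypothesis(lever, kpi, audience=None, area=None):
    """Build a hypothesis statement; audience and area are optional."""
    parts = ["We believe", lever]
    if audience:
        parts.append(f"for {audience}")
    if area:
        parts.append(f"on {area}")
    parts.append(f"will impact {kpi}")
    return " ".join(parts) + "."
```

Leaving `audience` and `area` unset gives the broader, lever-level hypothesis; filling them in narrows it to a specific execution context.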
Using the framework
Your first draft of the completed framework will have a large number of audiences, areas and levers, and even multiple KPIs. You’re not going to be able to tackle everything at once. A good strategy should have focus. Therefore you need to do two things before you can define a strategy from the framework.
Prioritise KPIs, audiences and areas
We’re going to be publishing a detailed post on how this framework enables an alternative to typical experiment prioritisation.
The core idea is that you first prioritise the KPI from your framework that you most need to impact in order to achieve your goal. Then evaluate your audiences to identify the groups that are the highest priority to influence if you want to move that KPI. Then, for that audience, prioritise the areas of the user journey that offer the greatest opportunity to influence behaviour.
This then gives you a narrower initial focus. You can return to the other KPIs at a later date and do the same prioritisation exercise for them.
Validate and prioritise your levers
You need to quickly refine your set of levers and identify the ones with the greatest potential. If you have run experiments before, you should look back through each experiment and identify the key lever (or levers) that it tested. You can then give each lever a “win rate” based on how often experiments using that lever have been successful. If you haven’t yet started experimenting, you likely already have an idea of the priority order of your levers based on the volume of evidence for each that you found during your user-research.
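Scoring levers by past performance is simple to automate. A minimal sketch, assuming each past experiment is recorded as a (lever, was_successful) pair:

```python
from collections import defaultdict

def lever_win_rates(past_experiments):
    """Win rate per lever from (lever, was_successful) records."""
    wins, runs = defaultdict(int), defaultdict(int)
    for lever, success in past_experiments:
        runs[lever] += 1
        wins[lever] += int(success)
    return {lever: wins[lever] / runs[lever] for lever in runs}
```

Bear in mind that levers with only one or two runs will have very noisy win rates – a reason to validate them with quick, cheap experiments before trusting the number.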
However, the best way to validate a lever is to run an experiment to test the impact it can have on your KPI. You need a way to do this quickly: you don’t want to invest significant time and effort testing hypotheses around a lever that turns out never to have been valid. Therefore, for each lever, you should identify what we call the minimum viable experiment.
You’re probably familiar with the minimum viable product (MVP) concept. In a minimum viable experiment we look to design the simplest experiment we can that will give us a valid signal as to whether a lever works at influencing user behaviour.
If the results of the minimum viable experiment show a positive signal, we can then justify investing further resource on more experiments to validate hypotheses around this lever. If the minimum viable experiment doesn’t give a positive signal, we might then de-prioritise that lever, or remove it completely from our framework. We’ll also be sharing a post soon going into detail on designing minimum viable experiments.
Creating a strategy
How you create a strategy from the framework will depend on how much experimentation you have done before, and therefore how confident you are in your levers. If you’re confident in your levers, we’d recommend defining a strategy that lasts around 3 months and focuses on exploring the impact of 2-3 of your levers on your highest priority KPI. If you’re not confident in your levers, perhaps having not tested them before, we’d recommend an initial 3-6 month strategy that looks to run the minimum viable experiment on as many levers as possible. This will enable you to validate your levers quickly so that you can adopt a narrower strategy later.
Crucially at the end of each strategic period we can return to the overall framework, update and refine it from what we’ve learnt from our experiments, and then define our strategy for the next period.
You can have a first go at creating your framework in about 30 minutes. Then you can spend as much or as little time as you like refining it before you start experimenting. Remember your framework is a living thing that will change and adapt over time as you learn more and get more insight.
Establish the goal of your experimentation project
Define the KPIs that you’ll use to measure success
Understand how your audience impacts your KPIs and goal
Identify the areas with the greatest opportunity to make an impact
Identify the potential levers that influence user behaviour
Define the experiments to test your hypotheses
The most valuable benefit of the framework is that it connects all your experimentation together into a single strategic approach. Experiments are no longer islands, run separately and with little impact on the bigger picture. Using the framework to define your strategy ensures that every experiment is playing a role, no matter how small, in helping you impact those KPIs and achieve your goal.
Alongside this, using a framework also brings a large number of other practical advantages:
It’s clear – a single diagram can explain any aspect of your experimentation strategy to anyone who asks, or whenever you need to report on what you’re doing
It acts as a sense check – any experiment idea that gets put forward can be assessed on how it fits within the framework. If it doesn’t fit, it’s an easy rejection with a clear reason why
It’s easy to come back to – things have a nasty habit of getting in the way of experimentation, but with the framework even if you leave it for a couple of months, it’s easy to come back to it and pick up where you left off
It’s easier to show progress and insight – one of the biggest things teams struggle with is documenting the results of all their experiments and what was learnt. Because the framework updates and changes over time, you know that previous experiment results have all been factored in and that you’re doing what you’re doing for a reason
As we said at the start of this post, there is no special sauce in this framework. It’s just taking a logical approach, breaking down the key parts of an experimentation strategy. The framework we use is the result of over 10 years of experience running experimentation and CRO projects and it looks how it does because it’s what works for us. There’s nothing stopping you from creating your own framework from scratch, or taking ours and adapting it to suit your business or how your teams work. The important thing is to have one, and to use it to go from tactical to strategic experimentation.
You can find a blank Google Slide of our framework here that you can use to create your own.
Alternatively you can download printable versions of the framework if you prefer to work on paper. These templates also allow for a lot more audiences, areas, levers and experiments than we can fit in a slide.
At Conversion.com, our team and our clients know first-hand the impact experimentation can have. But we also see all too often the simple mistakes, misconceptions and misinterpretations organisations make that limit the impact, effectiveness and adoption of experimentation.
We wanted to put that right. But we didn’t just want to make another best-practice guide to getting started with CRO, or a top 10 tips for better experiments. Instead, inspired by the simple elegance of the UK government design principles, we set ourselves the challenge of defining a set of core experimentation principles.
Our ambition was to create a set of principles that, if followed, should enable anyone to establish experimentation as a problem-solving framework for tackling any and all problems their organisation faces – to distill over 10 years of experience in conversion optimisation and experimentation down to a handful of principles that address every common mistake, misconception and misinterpretation of what good experimentation looks like.
Many hours of discussion, debate and refinement later, we’re happy to be able to share the end product – the 9 principles of experimentation.
Here are the principles in their simplest form. You can also download a pdf of the experimentation principles that also includes quotes and stories we’ve gathered from experimentation experts at companies such as Just Eat, Booking.com, Microsoft and Facebook. A few snippets of those quotes are included below as a taster.
1 – Experiment to challenge assumptions
Experimentation should not be limited to optimising website landing pages, funnels and checkouts. Use experimentation as a tool to challenge the widely held assumptions, ingrained beliefs and doctrine of your organisation. It’s often by challenging these assumptions that you’ll see the biggest returns. Don’t accept “that’s the way it’s always been done” – to do so is to guarantee you’ll get the results you’ve always had. Experimentation provides a level playing field for evaluating competing ideas, scientifically, without the influence of authority or experience.
2 – Start with data
It sounds trite to say you should start with data. Yet most people still don’t. Gut-feel still dominates decision making, and experiments based on gut-feel rarely lead to meaningful impact or insight. Good experimentation starts with using data to identify and understand the problem you’re trying to solve. Gather data as evidence and build a case for the likely causes of those problems. Once you have gathered enough evidence, you can start to formulate hypotheses to be proven or disproven through experiments.
3 – Experiment early and often
In any project, look for the earliest opportunity to run an experiment. Don’t wait until you have already built the product/feature to run an experiment, or you’ll find yourself moulding the results to justify the investment or decisions you’ve already made. Experiment often to regularly sense-check your thinking, remove reliance on gut-feel and make better informed decisions.
4 – One, provable hypothesis per experiment
Every experiment needs a single hypothesis. That hypothesis statement should be clear, concise and provable – a cause-effect statement. A single hypothesis ensures the experiment results can be used to evaluate that hypothesis directly. Competing hypotheses introduce uncertainty. If you have multiple hypotheses, separate these into distinct experiments.
5 – Define the success metric and criteria in advance
Define the primary success metric and the success criteria for an experiment at the same time that you define the hypothesis. Doing so will focus your exploration of possible solutions around their ability to impact this metric. Failing to do so will also introduce errors and bias when analysing results—making the data fit your own preconceived ideas or hopes for the outcome.
6 – Start with the minimum viable experiment, then iterate
When tackling complex ideas the temptation can be to design a complex experiment. Instead, look for the simplest way to run an experiment that can validate just one part of the idea: the minimum viable experiment. Run this experiment to quickly get data or insight that either gives the green light to continue to more complex implementations, or flags problems early on. Then iterate and scale to larger experiments with confidence that you’re heading in the right direction.
7 – Evaluate the data, hypothesis, execution and externalities separately
When faced with a negative result, it can be tempting to declare an idea dead-in-the-water and abandon it completely. Instead, evaluate the four components of the experiment separately to understand the true cause:
The data – was it correctly interpreted?
The hypothesis – has it actually been proven or disproven?
The execution – was our chosen solution the most effective?
External factors – has something skewed the data?
An iteration with a slightly different hypothesis, or an alternative execution could end in very different results. Evaluating against these four areas separately, for both negative and positive results, gives four areas on which you can iterate and gain deeper insight.
8 – Measure the value of experimentation in impact and insight
The ultimate judges of the value of an experimentation programme are the impact it delivers and the insight it uncovers. Experimentation can only be judged a failure if it doesn’t give us any new insight that we didn’t have before. Negative results that give us new insight can often be more valuable than positive results that we don’t understand.
9 – Use statistical significance to minimise risk
Use measures of statistical significance when analysing experiments to manage the risk of making incorrect decisions. Achieving 95% statistical significance still leaves a 1 in 20 chance of a false positive – seeing a signal where there is no signal. That might not be acceptable for a very high-risk experiment on something like product or pricing strategy, so increase your significance requirements to suit your risk appetite. Beware of experimenting without statistical significance – that’s not much better than guessing.
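The significance check itself can be sketched as a two-proportion z-test (normal approximation). This is an illustrative implementation, not a substitute for your testing tool’s statistics engine:

```python
import math
from statistics import NormalDist

def ab_test_p_value(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided two-proportion z-test for an A/B experiment."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def is_significant(p_value, alpha=0.05):
    # Lower alpha (e.g. 0.01) for high-risk product or pricing experiments
    return p_value < alpha
```

With alpha at 0.05, one experiment in twenty with no real effect will still look like a winner; tightening alpha to 0.01 trades longer run times for a 1-in-100 false-positive risk.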
These are the 9 principles we felt most strongly define experimentation, but no doubt we could have added others and made a longer list. If you have experimentation principles that you use at your organisation that we haven’t included here we’d be interested to hear about them and why you feel they’re important.
We’re also looking for more stories and anecdotes of both good and bad examples of these principles in action from contributors outside Conversion to include in our further iterations of these principles. If you have something you feel epitomises one of these principles then please get in touch and you could feature in our future posts and content about these principles.
And finally, if you want to be notified when we publish more content about these experimentation principles, drop us an email with your contact details.
The ‘year of personalisation’ has been on the cards for a while now.
A quick Google search, and you’ll find plenty of articles touting 201X as the year that personalisation will take off. Midway through 2017, we’re still waiting for it to really take hold.
So, what’s holding personalisation back from becoming the norm? Why isn’t every website already perfectly tailored to my individual needs?
There are two main reasons we have yet to see personalisation live up to the great expectations.
The first reason is the expectation itself. The dream of personalisation as it’s sold – a website responsive to the user’s habits – will likely remain just that for all but a handful of organisations that meet very challenging criteria.
The rest of us have more realistic and practical expectations for personalisation where it must prove its worth against many other activities competing for resources.
The second reason is that implementing personalisation is a difficult process, and one where it makes sense to start small and build up. No doubt the majority of organisations are starting to explore personalisation, but the reason we feel it has yet to take off is because they are still in the early stages. Personalisation is hard. It’s not something that can be undertaken lightly and from a conversion optimisation perspective, is only possible if you have already reached the higher levels of experimentation maturity.
So, how do I know if my business is ready?
Before even thinking about technical capabilities, tools or technology, you should evaluate personalisation in three areas: suitability, profitability and maturity.
Suitability
Certain types of operating model are better suited to personalisation and offer more opportunity and potential. If you maintain an ongoing relationship with your customers and see a high frequency of engagement e.g. if you get a lot of repeat transactions as an e-commerce site, then personalisation is likely to be more suitable. In general, the greater the frequency of your customers’ visits, the more relevant any previous data about that customer is likely to be, and the experience you create for that customer can be more relevant as a consequence.
On the other hand, for websites that focus on a single engagement, where repeated engagement is unlikely or infrequent, personalisation is likely to be far less effective. Those organisations are likely to have limited data about the user and, consequently, it will be more difficult to create highly relevant experiences. Depending on what model your organisation operates, you might decide that your website is more or less suited to personalisation.
Profitability
Implementing and maintaining personalisation comes with considerable costs. You should only invest in personalisation if you can demonstrate that the benefits will outweigh the costs involved. The underlying hypothesis of personalisation is that delivering a more relevant experience to a user will increase the likelihood of them converting. As with all hypotheses, this should be tested and validated. Experimentation and testing will allow you to prove the value of personalisation for your business, so that is where you should start.
Maturity
Personalisation requires a deep understanding of user behaviour. More so than in A/B testing, we need to understand not just why users aren’t converting, but also how segments of users vary in their motivation, ability and trigger. If your organisation is still at the lower levels of experimentation and conversion optimisation maturity, then it will be difficult to implement personalisation experimentation in a way that is effective and manageable. A good way to think of it is as a higher level of experimentation maturity that you should explore once you have exhausted the gains that could be had from general experimentation and conversion rate optimisation.
What is a realistic expectation for personalisation for my business?
It doesn’t have to be the 1-to-1 highly granular customisation that people tend to think it is. There are many different ways to approach personalisation and the approach that is best for your business will depend on a number of factors.
In order to start your discussions about personalisation, here are a few different types that you may want to explore:
Behaviour-based personalisation – This is a great place to start as it has a low barrier to entry. Generally, this type of personalisation is based on the user’s behaviour on the site during their current session. For example, you might alter the content that you show the user when they return to the homepage based on the types of pages they have visited in this session (or across multiple sessions, using cookies).
Context-based personalisation – This is where the user’s experience is personalised based on the context of their visit to the site. A basic example of this is personalising landing pages based on the user’s PPC search term, the email they clicked through, or the display ad they’ve clicked. This is more commonly known as segmentation, but really this is just another type of personalisation. This can be a good step towards defining the important audiences/segments that would then feature in more advanced personalisation.
Attribute-based personalisation – This is what most people think of when they think about personalisation: using prior knowledge or attributes about a user to personalise their experience. This type generally requires more advanced technology to connect sources of data about a user together in a way that creates what’s known as a Dynamic Customer Profile for each user. This profile will contain all the possible attributes around which an experience can be personalised to that specific user.
User-led personalisation – Not all personalisation has to be invisible to the user. In fact, it could be argued that personalisation is more effective when the user can see it happening and is aware that the site is being customised to them. Netflix users know that movie recommendations are based on what they’ve already watched, just as Amazon’s product recommendations are based on what they’ve previously shown an interest in purchasing. This feels more compelling than if you were just shown recommended products without reason.
Personalisation via predictive modelling – This is the realm of AI and machine learning, where models can be used to assign a user to the best guess ‘lookalike’ audience based on their first few actions on the site. For example, users that visit the ‘Sale’ section of a site within the first three clicks could be assumed to fit in a ‘bargain hunter’ audience. Then any previous learnings about how to effectively convert bargain hunters could be applied to personalise the experience for this user.
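The ‘bargain hunter’ rule above can be sketched in a few lines. This is a hypothetical illustration, not a reference to any real personalisation tool: the audience labels, URL patterns and three-click window are assumptions taken from the example in the text, and a production system would typically learn these rules with a trained model rather than hard-code them.

```python
# Hypothetical sketch of rule-based 'lookalike' audience assignment.
# A real system would use a trained model; the rules here are hard-coded
# purely to illustrate the idea described above.

def assign_audience(first_clicks, window=3):
    """Best-guess audience from a visitor's first `window` page views."""
    early = first_clicks[:window]
    if any("/sale" in url for url in early):
        return "bargain-hunter"      # assumed audience label
    if any("/new-in" in url for url in early):
        return "novelty-seeker"      # assumed audience label
    return "unknown"                 # fall back to the default experience

print(assign_audience(["/home", "/sale/shoes", "/checkout"]))     # bargain-hunter
print(assign_audience(["/home", "/about", "/contact", "/sale"]))  # unknown
```

Once a visitor is assigned, any previously validated ‘bargain hunter’ treatments (emphasising discounts, say) could be applied – and the assignment itself should be A/B tested like any other hypothesis.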
So, will 2018 finally be the ‘year of personalisation’?
I expect we will see a lot more case studies emerging of personalisation proving successful as more organisations start seeing the rewards of their investment in this area. If nothing else, I’d expect 2018 to be the year that organisations individually decide whether or not to invest in personalisation in a serious way.
Personalisation isn’t going to be suitable for everyone. The dream of 1-to-1 personalisation that runs itself might remain a dream for the majority, but taking the first steps towards investigating its potential is an exercise that every organisation should undertake. As preparation, plotting your current position on our experimentation maturity model will help you to plan the steps you need to take to be ready when the time comes.
There are many ways you could attempt to measure conversion optimisation and experimentation maturity. At Conversion.com we work with businesses and teams at all levels of conversion maturity – from businesses just starting out with conversion optimisation that have never launched a test, to businesses with growth and optimisation teams of hundreds of people. From this experience we’ve built up a good picture of what defines maturity.
Our model for maturity focuses on measuring strategic maturity. We believe conversion optimisation maturity shouldn’t be limited by the size of your organisation, team or budget. Any organisation, armed with an understanding of what maturity looks like, where they are currently and what level they would like to reach, should be able to reach the higher stages of experimentation maturity.
For this reason our model does not include basic measures of scale such as the number of tests launched per month or the size of the experimentation team. Nor does it refer to any specific tools or pieces of technology as requirements. In defining this model we wanted to keep things simple: to create a model for maturity that helps start conversations, both with our clients and within any team serious about putting experimentation at the heart of their business.
Our model measures maturity against three key scales: experimentation goals, experimentation strategy and data and technology.
Experimentation goals
What are the goals of your optimisation programme? If you’re just starting to explore experimentation and conversion optimisation, your goal might simply be getting a test live. At the other end of the spectrum, more businesses are emerging where the goal of experimentation is to be a driving force in the overall strategy of the business. The goals that we set for experimentation in our organisations, and our ambition in this area, set the tone for how we approach and deliver experimentation. Organisations that have embraced experimentation set more ambitious goals, and these goals require a more mature approach to achieve. That’s why evaluating the goals for experimentation within your own organisation is the best place to start when assessing your place on the maturity scale.
Developing your maturity in this area involves shifting the scope of your goals and aligning the goals of experimentation with the overall goals of your business. It means moving from goals about short-term results and impact on KPIs towards goals about answering business questions and informing business decisions and strategy.
It’s important to make a distinction between reality and ambition when trying to plot your current position in this scale. Consider the role that experimentation currently plays in your organisation and how you are currently setting the goals, rather than what you’d like to be your goal for experimentation in an ideal world. The maturity model is most useful as a tool for assessing where you are now, where you want to be in the future, and what needs to change to close the gap between the two.
Experimentation strategy
Where does your strategy for experimentation come from? Experimentation goals and experimentation strategy are closely linked, with strategy being how you achieve the goals you’ve set. If you are just starting to explore experimentation, you may not have thought too much yet about an overall strategy. Early on, experimentation strategy tends to be largely tactical in nature, with ideas generated on an ad-hoc basis and experiment prioritisation based on most urgent priority or a simple impact/ease model. Each experiment is treated as an individual exercise.
Advanced optimisation teams plan their strategy for achieving their optimisation goals across both the short-term and long-term. Long-term strategic planning should focus on prioritisation at the high level of goals and priorities. Conversion optimisation is an ongoing process. It’s not possible to do everything at once, and mature teams plan and prioritise the areas that they will focus on right now and those that they’ll focus on later in the year. In this way they can keep their focus narrow and ensure there is a clear plan for achieving their goals.
Advanced optimisation teams view testing not as a tool for increasing conversion rates but as a tool for answering questions. Starting with the big picture, they identify the business questions that need to be answered. They then break these problems down to define the tests and research that they need to complete to validate their hypotheses and answer that question.
As we move up the maturity stages, optimisation strategy becomes more thematic. Experiments are considered now as one tool for exploring a specific theme or conversion lever. At this level, experimentation is organised as a series of projects, each made up of a combination of targeted user-research pieces and experiments. These projects align to business strategy, and experimentation starts to play a leading role in overall business strategy.
Data & technology strategy
How do you detect and measure the things that matter? The quality of insight gained from experimentation is directly correlated to the quality of data that you collect about what happened. If your goal is just to get some experiments live there is probably less emphasis on ensuring those experiments have a solid grounding in data. Ensuring the data the experiments produce when they do run is reliable and actionable can often be more of an afterthought. Advanced optimisation teams will be a lot more deliberate, with data and insight playing leading roles in generating test hypotheses, and experiment data being a valuable source of insight for the business and the people in it. Maturity here is being confident in your data so that you can challenge it, ask probing questions of experiment impact, and be able to confidently produce the answers.
Technology plays a key role in this, but is only as good as the strategy for using it. The specific tools you use aren’t as important, for example, as your ability to connect your tools and data sets together. A set of simple but connected tools can deliver greater quality of insight than one advanced but isolated tool. Start with your experimentation tool, and connect it to any other tools you have such as surveys, session recording and heatmaps. In particular, connect it to your back-end reporting systems so that the impact of experiments can be measured against the KPIs that really matter, and that people look at on a daily basis.
Maturity levels and where you place yourself
Now that we’ve explored the three scales that we use to measure maturity, we can define approximate levels of maturity to give us an overall scale and a tool for evaluating our own place. Really, though, maturity is a continuous scale rather than something discretely split into levels. When reviewing the levels below you may place yourself at different levels for each of the three scales. This is very common. There is often one part of our approach that we know is probably holding us back – a weak link in the chain. This model should help formalise and pinpoint that weakness and start the conversations about how to overcome it.
If you’re looking to develop the maturity of your experimentation and conversion optimisation strategy then we’d be happy to help. Just drop an email to firstname.lastname@example.org and we’ll organise a free maturity consultation with one of our team.
In 2009 Ruben started his own company, BidSketch. At its inception, BidSketch was proposal software primarily targeted at web designers and web developers. Starting as a one-man show, BidSketch has grown vastly; to date it has helped its customers make more than $1 billion in sales.
But Ruben’s journey was not an easy one. On the way to his first $1000 there were multiple times when he wanted to give up on the whole idea completely. His initial research showed that no one had any interest in the product, he wasted a whole month building a free tool that nobody used, and he even missed his launch date due to unreliable contractors (more on that here).
In fact, he even had to hit the $1000/month mark twice(!). One API call wiped out almost all the billing information he had about his customers. In a matter of seconds, his revenue dropped to zero. He had to email his customers, asking them to reset their paying accounts. Unsurprisingly, a fair number of customers did not return.
The upside is – throughout his journey Ruben has learnt a lot – and that’s why I am so excited to have had the opportunity to interview him. In 2012 he wrote a blog post, “What I learned from increasing my prices”, where he explains how research and testing allowed BidSketch to see one of the largest spikes in growth it has ever had. Since then his pricing page has evolved even further and that’s what we are about to dig into.
We cover pricing, small tests that he ran that resulted in substantial increases in conversion rate (and revenue), how Ruben used Jobs-to-be-Done interviews to decrease his customers’ churn rate, research and testing tools that helped him on his journey, and so much more.
I recommend you read his original article first (although it’s not required). I’ve learnt a lot and I am sure you will too.
Part 1: How to communicate the value of your product on the pricing page – while keeping things simple
A little bit of background history.
Here’s the first version of the BidSketch pricing page (2010)
Here’s the version that resulted in one of the largest spikes in revenue (the one he talks about in his article). This is 2012.
This is the version that we see today (in 2016)
Egor: First of all, I would like to understand the context behind your pricing page, how it evolved over the years, and then get into the nitty-gritty of what research questions did you find being most useful, any actionable tips you can share, things that delivered the most results.
The major difference that I can see is that you had freelancer, studio and agency plans. Then, today in 2016 it is split into Solo, Team and Business. How did that change happen? The first one seemed to be more tailored to customer personas (web designers in particular), and this one seems to be more generic – more applicable to everyone. Did you change it as you scaled or was there another reason for that?
Ruben: Initially, when we had the premium and basic plans (when BidSketch first launched), it was for designers. By the time I did this pricing change, it was no longer for designers, but it still was for… you know, creatives. I think at the time 80% or 90% were the categories of either web designers, marketing, freelancers, SEO, developers, people from companies in those categories. Persona-based pricing was a good fit for that.
Then, there was a point where we started getting more customers as we scaled, and that distribution started to change. We saw that it started to change through a few surveys, but beyond that, we also started to see it in cancellation feedback of people who were entering the trial period. More and more people were saying, ‘I don’t think this is for me. I don’t feel like it was made for my business. It seems as if it was made for designers or web developers’.
We started changing the product, for example, adding more templates for more businesses. That way we had a bunch of different signals in the app that spoke to those kinds of businesses. Then we started generalizing even more and adding more resources to appeal to them.
But we were still getting that feedback – the last piece was the pricing page. We looked at the businesses that were cancelling, at their websites, and we talked to them. With some of them we did Jobs-to-be-Done interviews. It was like, ok, the pricing might be unclear when somebody goes to the pricing page and they see freelancer, studio, agency, and they are not that type of company.
For example, they could be from a SaaS company doing enterprise sales, and they would think, ‘hmm, this is not quite right’. So, we did a test to see if there would be an impact on conversions. In the previous test [the change that was carried out in 2012, see images above] where we changed general names [Basic and Premium] to freelancer, agency and studio plan names, we got more customers. This time around when we tested Business, Team and Solo, we got fewer trials, which was interesting, but we got a little bit more customers at a bit of a higher price point.
Egor: That’s very interesting. First of all, you targeted specific segments or even identities, and achieved an uplift, and then you repositioned it… with an appeal to broader audiences. It seems like the opposite of the technique that originally worked for you ended up closing more customers.
Ruben: Right, you know. Business changes, market changes, competition, traffic you get, there are a lot of variables. It’s a good idea to retest, I do that sometimes – retest things that did not work before.
Egor: Another change that I can see is your plans are primarily limited by the number of users; and previously the limitations included proposals, clients and users [and storage].
The limitations you set for your plans are important, aren’t they?
They can act as an incentive for a client to upgrade.
The extent to which your product and its different features are used also affects your cost base. For example, if the number of proposals that a customer can set closely correlates with your costs, and you make it unlimited, then your cost base could skyrocket [if customers start creating loads of proposals].
Probably, this is not the case given the fact that you removed it, but I am just trying to understand what was your thinking behind setting some of these limitations, for example, users, and removing other ones (proposals and clients)? Is it primarily customer-research driven? Was it somehow affected by your consideration of costs and profit? What was your thinking process?
Ruben: It was based off of a couple of things. One was we looked at the data when we had very simple plans, either a plan with one user or a plan with unlimited users [the very first plans Bidsketch had in 2010]. Looking at the data, we could see very clear groupings or break-points. They were not getting charged for those extra users, so we could see naturally how many people on one account used the product.
We’ve seen a bunch of companies with two or three users. Then, we’d see, I think the next point was 5, and the next one 8. Just based off of that data, it felt like a really good test. We also looked at different types of companies that had these different numbers of users. That was one of the things that we looked at, and the other thing was features.
Basically, since we were just leveraging users [as a limitation], we mainly looked at customising domains and team management [for different plans]. Team management does not really mean anything for people who are on the 1 user plan, but it’s there to make it feel a lot more different, like you’re getting a lot more value on a higher priced plan where you have more users. We could probably eliminate that row, and it would still be clear what the differences are [between these plans]. The reason why it’s there is to make it feel more different, it’s something to make it stand out more.
Overall, we used a combination of metrics and qualitative data. One limitation, users, was based off of our quant data. Ability to customise your own domain is something that was highly valued based off of our conversations with customers. We tried to do both; we looked at the data that we have and we tried to have conversations with customers to get clarity on that data, to make sure that what we think we are seeing is actually what we are seeing. That was the thinking behind it.
The proposals… I am trying to remember why [we had it in the first place], I think the proposals was an attempt to have something else that we use to push people towards the $29/month plan. That’s why we did it and when we had this plan, most people signed up to the $29/month plan. Most people did not sign up to the $19/month plan although it was cheaper.
So, I don’t think I ever really tested that before [specifically testing the impact of proposals]. At one point I wanted to simplify pricing. So, we ran surveys and asked people what confused them. There were a few things that would come up in [the surveys], but one thing that I just wondered about was, ‘Is the number of proposals actually doing anything?’.
Hiten Shah from KISSmetrics, CrazyEgg, recommends sometimes doing what he calls sensitivity testing, which is just: remove something from the page, see if it’s actually working. Instead of adding something or changing it, just take it off, see if it actually has any impact. So, we did that and you know it did not get any worse, it did not get any better. So, I dropped it, just because I like simple, simple is better.
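[My note: A sensitivity test like the one Ruben describes is ultimately just a two-variant experiment where the ‘treatment’ is removal. One minimal way to read the result is a standard two-proportion z-score; the visitor and conversion counts below are made up for illustration, and the 1.96 threshold corresponds to roughly 95% confidence.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-score for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)       # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Control keeps the proposals limit on the page; the variant removes it.
# Counts are invented for the example.
z = two_proportion_z(conv_a=300, n_a=10_000, conv_b=305, n_b=10_000)
print(abs(z) < 1.96)   # True: no measurable difference, safe to simplify
```

If the variant neither wins nor loses, the simpler page can ship – which is exactly the outcome Ruben describes.]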
Egor: Was it the same for the ‘clients’ limitation?
Ruben: Well, we dropped the freelancer plan (the $19 plan) out of the main grid to add another plan. So, clients is a metric that we still limit on, but not on any of the plans that are on the grid. It’s limited on the link below the plan. Since the other plans on the grid are more expensive and we don’t limit clients on any of them, there is no need to have that.
Egor: Ah, I saw that. There is a link below that takes you to another plan. I read this case study where Joanna Wiebe from CopyHackers optimised CrazyEgg’s website, I think they did a similar sensitivity test. They removed the Johnson’s box on the left which is a navigation box, and I think they removed it in order to… basically, to make more space, so that they can put more content above-the-fold on that landing page.
You said that you tried to simplify the plans. As far as I understand, BidSketch has many more features than what is currently listed on the pricing page.
How did you… I am asking this because usually when I look at enterprise SaaS at least, they have a huge list of different features. It just falls on you and sometimes I start feeling overwhelmed.
You can sense that the one with a larger list is meant to be more attractive for larger businesses, but to really find something for yourself… it’s hard, sometimes I can’t even make it through.
So, my question is: How did you come to that list of features that is currently listed on your pricing page? Your plans look very simplified and easy-to-digest.
Ruben: There was mainly… I think when we were working on the second version of these plans, we tested just having a bunch of features on the left hand-side, having them listed all out, more detailed [like the SurveyGizmo example above], and the simpler version won; it did better. So, that’s what moved us in that direction.
We also did some Qualaroo surveys. We found that yes, there can be value in showing what the features are on each of these plans; even if a feature does not communicate the differences between the plans, it is still valuable for users to know about it.
Even if you pushed everyone to see your tour page before they see the pricing page, not everyone will actively engage with your tour page. They might just skip to pricing, this is why I think it’s important to show important features that are available on all the plans. But it does not have to be done in a way that a lot of people do it, which is a column on a left hand side, and on the right there is a pricing grid.
We are doing it at the bottom before the sign-up button where we say, “All plans include templates, branding, and PDF export”.
Also, we don’t [show] all of the features that we have, we only have those features that are most important to people. The ones that we know from interviews and surveys, they are the most important things because they asked for them or whatever. So, it’s sort of still limited, but it’s shown there. And when you have them on the left hand side, it’s just more energy, it adds more visual noise, it makes it harder to parse through the pricing grid.
Part 2: How to apply Jobs-to-be-Done interviews to SaaS and finally ‘get’ your customers, build a better product and cut down churn by over 30%
Egor: So, to simplify the plans you needed to limit the number of features you are showing. To do this, you did research and identified what customers found most valuable in your product. You said you used Qualaroo surveys and interviews – what exact questions did you find most useful when trying to understand what your customers find most valuable?
Ruben: It’s two things. It’s seeing what they are using when they pay, and it’s never been directly asking them, but finding out what they chose or why they chose it, for example, when they upgraded or decided to pay. This came up through Jobs-to-be-Done interviews where we did ‘switch interviews.’
In those you focus on what happened; the steps that they took when they stopped using whatever it is that they were using previously and started paying for our product. In that, there is a point where they are evaluating and they are deciding and it’s pretty clear…
You ask them, ‘What did you do next? What didn’t you do next? Why did you do that? Ok, what were you thinking at this point? Did you have any concerns?’ The thing that generally comes out is the decision that they were making, the trade-offs that they were making when they were buying, so then you get to see, ‘Aha!’
So, to them the branding part is not really that important because that did not stop their decision, that did not stop them from upgrading, but they were not sure about custom domains, so in their trial they did not upgrade or did not pay or did not start their plan early – even though they wanted to – until they set up DNS and set up their custom domain, etc.
So, there are a bunch of little stories like that, so that we can then see, ok, these were the themes that helped them to decide to pay and these are the ones that did not. So, again, we used a combination of that [qualitative data, specifically JTBD interviews] and quantitative data.
Egor: You mentioned switch interviews and Jobs-to-be-Done interviews. I have heard of Jobs-to-be-Done as a concept [when I read Clayton Christensen’s “How will you measure your life?”], but I have not heard of Jobs-to-be-Done interviews. Is it a standardised set of questions you use, do you prepare it yourself, is it some type of framework? Could you explain it to me?
Ruben: Yeah, sure. Generally we run switch interviews. It’s about capturing the story of the switching moment. So, instead of asking them, “Why did you sign up? How did you like it?”, or any things like that, you approach it in a different way.
Basically, people often don’t know on the surface why [they made a particular decision]. Or they would give you reasons that they think you want to hear, but instead with switch interviews you start by asking…
Well, you start in a lot of different ways, but the framework for asking these questions is to find out:
what they were using before
when they started to have problems or doubts with what they were using
why they started looking for something else
why they started to evaluate something else
why they started to evaluate it or sign up for it at that moment, on that day, instead of the day before or the day after – to really dig into it
You want to have them walk you through every step of what happened in order to understand their thinking, their process and ask, ‘What were you thinking here? Why did you do this? Why did you do that?’ instead of asking, ‘Why did you sign up?’. You go through that story, finding out – through their actions – what the moment was when they decided to buy, and what their thinking was.
[My note: Notice how the approach above is different from standard CRO questions such as, “What persuaded you to purchase from us today?”.
For those unfamiliar with the JTBD framework, think about what Ruben said before: often customers do not know the deep reasons behind why they signed up. So, often if you just ask, ‘Why did you sign up?’, you will get a lot of surface answers. E.g. ‘I just needed to create proposals for my business.’ This is not very actionable.
Instead, with JTBD interviews you go through their story and ask them why they made certain decisions in the past that ultimately led to the final purchase decision. When people go back in time and start recalling situations and context in which these decisions were made, more detailed memories start coming out on the surface and the real motives behind one’s purchase are revealed.
It didn’t click with me until I read Alan Klement’s book “When Coffee and Kale Compete” and tried conducting a JTBD interview myself, but the quickest way to get to your first “aha” moment with the JTBD framework is to listen to the JTBD Mattress interview].
Egor: So, when does it happen? Does it happen straight after someone converts into a purchase or can it happen at any time?
Ruben: Well, it’s a SaaS product, there are two things. There’s a ton of friction when we ask for a credit card upfront for someone to sign up for a trial. So, that’s one thing. There has to be enough… enough momentum and something pushing them towards entering their credit card information to do that right at that moment. That’s one point and the other more important point is when they actually decided to buy.
Since that’s a SaaS product where we just bill them automatically on day 14, it’s not on day 14 that they decided to buy. Maybe they forgot to cancel, so a month later they’re going to ask for a refund. Maybe they haven’t even set it up yet. It’s like, ‘yeah, in a few months we will’.
Usually, it’s at some point during the trial or some point after they started paying, after the trial. We often cover that with a question, ‘At what point did you know it was going to work for you?’. We walked through the whole story and ‘yeah, it’s working, we used it and it was really good’. It’s like, ok, good, at what point did you realise that it was ok, before that point you were trying it out, trying to see and then at some point something happened where you saw something and you thought, ‘Yes, this is gonna work’. That’s the buying moment.
Egor: So, you are trying to get them to narrate a story about themselves as opposed to trying to make them rationalise why they made that purchase. And then you try to understand why they made that purchase by listening to their story and analysing it yourself, rather than making them rationalise it for you. That’s very interesting. Who created switch interviews? Where did they originate?
Ruben: Yeah, two guys from the Re-Wired Group that work closely with Clayton Christensen on implementing Jobs-to-be-Done interviews. Bob Moesta and Chris Spiek. They do these interviews with really big clients. They put on these Switch Workshops where they teach this concept.
So, we have also done cancellation interviews where people are switching away from our product to something else. Our product is the thing that they were using and they had a problem with, and eventually people started using something else.
Egor: When you say interviews, do you mean calling and talking through their story? What is the set up like?
Ruben: Yeah, these are like 30-45 minute interviews.
Egor: Is it difficult to recruit people for these interviews?
Ruben: For people who are paying, we try to do the switch interviews with people who have paid at least once, or who have just made their payment for the next month. We do it there because we still want it to be fresh in their mind. We also want to make sure that they are paying [ie. they did not just forget to cancel].
Egor: Is it difficult to get people to agree to these interviews? Do you use some type of incentive? What kind of email do you send?
Ruben: We have not had too much luck recruiting through email. So, generally we do not do that. We previously recruited through Qualaroo surveys inside the app or using Intercom inside the app, taking them through a survey and getting them an incentive.
Recruiting for people who cancelled is much harder than people who just paid for your product, especially when you want to get them on the phone for that long. So, for people that cancelled we did a cancel confirmation page, it came up with a message, saying that they have been cancelled, sorry to see them go, feedback is very important to us, please, help us improve, asking them if they would be willing to participate in a 30-45 minute interview.
To show our appreciation, we’ll toss in a $100 Amazon gift card. It can work without the gift card, we have done that, but with the gift card it’s just so much faster. We have a really big incentive. You generally need around 10 to 15 of these interviews. As you don’t need a lot of interviews, it’s well worth it for us to give $100 per person for the data that we get.
Egor: That’s amazing! So, based on what I have heard so far, there are two types, switch interviews and cancellation interviews. What did you find most valuable? With switch interviews you are trying to understand what happened in someone’s life and led them to start paying for your product. With cancellation interviews, are you trying to understand why your value proposition suffers? What’s the main value of these interviews?
Ruben: We have exit surveys on the cancel form. It’s a required form where they tell us why they are cancelling. The vast majority of people are just saying, ‘Did not use it enough’, so there is a percentage of people who say, ‘Well, this did not work or that did not work’. People that cancel in the early months, first month after paying or the first 2 months after paying, it’s generally onboarding stuff. They just did not finish setting up their account, they did not fully implement it or they just did it once, things like that. It’s still a symptom, but the reason for that varies. And people that cancel that have been using it for a while, they tend to be in different categories.
So, the cancellation interviews were to get more insight into what we were seeing as far as the feedback that they were giving us. It felt kind of superficial. It was light, it was better than nothing in these cancel forms, but we wanted to see what the stories were behind that. In particular, the biggest category was ‘not using it enough’.
What do you mean by ‘I am not using it enough’? Why not? It’s not just not using it. There was a reason for it. In some cases, there was a big disconnect between what they expected and what they got.
Another thing that came up was about the term ‘proposal’. There was a disconnect between what they understood as proposals and what the app offered them. It’s a proposal app, and once they sign up, they have proposals in their mind to create and send. Then, they start using it and they think, ‘Ok, this thing is more thorough than what I currently use. It has a lot of features, but the proposals that I send are very simple.’
Well, taking a look at their “proposal”, it is not really a proposal, they are sending an estimate or they are sending a contract, but for a lot of people these are their sales proposals. These people were less likely to buy. So, as a result of these cancellation interviews we set up examples and help documentation around those other types of documents.
There are several categories and things like that. Sometimes it’s a setup thing, it’s just onboarding, if it’s onboarding, then you can fix it. But it’s much easier to uncover what those reasons are after doing interviews that way.
Egor: I see. Did you make any other product changes or marketing changes that came as a result of these interviews? And did you see any tangible results from these changes?
Ruben: Some of the pricing grid changes that we have already talked about. Through those interviews, that’s where we got the insight. Knowing what features we want to show on that page, on the left hand side or just at the bottom, and which features should we not even bother showing. A lot of that insight came from that. As far as pricing…
Egor: It does not necessarily have to be about pricing. Anything related to product or marketing…
Ruben: The ‘pause’ feature for their account, where they pay $5 a month is actually used and people come back and un-pause their account and start paying again.
Egor: Is it for people who are not using it actively, but want to stay?
Ruben: Right, with people who were cancelling it was kind of streaky. Especially if they are smaller, they would send out some proposals, then they would get something. They’d be busy with that project for several months and would not be using BidSketch. Then, we would bill them and we would bill them again. They would think, ‘I need to cancel, I am not using this’.
Then, in 2 more months they would start using it again. So, they would sign up for another trial and create another account and would not have the past history or anything like that. So, they would like to have had all their past history and not have to set everything up again. Just implementing that was a pretty good thing that came from that. It worked.
It’s used in the way that it was meant to be used. We monitored it, and we worried that people would just leave it there and not come back, but a lot of people did come back. So, that’s working well.
The other thing was yearly plans. Being more aggressive with yearly because of that cycle. This is another thing that came from Jobs-to-be-Done interviews. Being able to change the evaluation period in their mind.
When someone is on a yearly plan, it’s about, ‘How much did I use it this year?’ It’s a very different question from the one you ask yourself every month, ‘Did I use it? Oh no, this month I did not.’ So, maybe I used it 20 times in a year, but it was all in 3 months or 4 months cycles throughout the year. The rest of the months were not used at all.
For somebody who is evaluating on a yearly basis, that works. For somebody who is evaluating monthly, sometimes it’s worth it if they think about it in terms of their entire usage per year, but a lot of people don’t think that way. People literally think, ‘Oh, this is the second month I have not used it’.
Egor: So, what did you do? Did you just literally push more people on the yearly plans as opposed to offering monthly plans?
Ruben: Yeah, and pitching yearly plans through Intercom at day 45 or something like that, and adding a link to upgrade with a big discount to their invoices. Basically, pitching them everywhere.
Half of the traffic that we get to the pricing page gets defaulted to yearly plans, and the other half to monthly with the option to pay upfront yearly. It’s a little ghetto, but it gives us the right amount of yearly paid accounts without sacrificing too much of the monthly revenue.
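A 50/50 default split like this is typically implemented with deterministic bucketing, so a returning visitor always lands in the same variant. Here is a minimal sketch – the function and bucket names are hypothetical illustrations, not BidSketch’s actual implementation:

```python
import hashlib

def bucket(visitor_id: str) -> str:
    """Deterministically assign a visitor to a pricing-page default.

    Hashing the visitor id (rather than picking randomly per page view)
    guarantees the same visitor always sees the same default.
    """
    h = int(hashlib.sha256(visitor_id.encode()).hexdigest(), 16)
    return "yearly_default" if h % 2 == 0 else "monthly_default"

print(bucket("visitor-42"))
```

Because the assignment is a pure function of the id, the split stays stable across sessions, which keeps the yearly vs. monthly comparison clean.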
Egor: Is it easy to convince people? Do you convince them with just a discount or do you build a bigger business case around it?
Ruben: Well… The discount does most of the work. Just having a generous discount, then pitching it at the right time for the people that do not default to it or initially take it. Some people don’t even know if this is going to work. They don’t feel secure enough with going for something yearly. That’s why… I found that about a 45 day mark is a good time for us to do that.
Egor: How did you come to that 45 day mark? Was it through experimentation/trial-and-error?
Ruben: Through a lot of conversations that we had, we could tell that by then – not everyone, there are many people that are still unsure – but most people would know whether it’s going to work for them or not.
Egor: How did you come to your current discount? If I am correct, it’s 40%.
Ruben: 40% is for the middle plan, 26% for the other plans.
Egor: How did you come to this?
Ruben: We tested discounts. We started maybe at 10% or so, I don’t remember exactly what they were.
Egor: So, you started with discounts and then you looked at how many people would get into a yearly plan? Was that the main KPI for that one?
Egor: Ok, and then you just went up and up with your discount and looked at what the effect would be?
Ruben: That’s right.
Egor: I want to come back to the pause feature. There has been a number of times when I would have certainly paid a small fee. I think it’s very smart…
Ruben: It’s something that I think a lot of SaaS products could do. I’ve seen it done with a pre-pause, I don’t remember the exact products. I wanted to do a ‘pay’ one because we don’t want to pause just a bunch of accounts where people had no intention of coming back. So, it was just that if they are willing to pay at least $5/month, they see that there is real value in this for them. [In that case], they would be more likely to then un-pause it at some point.
Egor: And do a lot of people come back?
Egor: So, it works.
Ruben: It seems to be working for us.
Egor: How does it work? When someone cancels Bidsketch, does their account get deleted straightaway, so they have to create a new one? What is the process? Is it not being saved anyway? What is the incentive for people to pause?
Ruben: If they were not to pause, if they were to cancel, then all their data gets deleted. If they were to come back, they would have to create a new account and recreate everything.
Egor: Are they being notified of that in advance? If I am cancelling, am I being told that all the data will be deleted?
Ruben: Yes, when people cancel, we explain to them that their data is going to be deleted. We make them tick an extra check-box. We explicitly prompt them during the cancellation flow, so they can choose to pause instead of cancelling.
Egor: I want to clarify the thing about these interviews. You seem to have mentioned three types of interviews. There are Jobs-to-be-Done interviews, switch interviews and cancellation interviews. Are these all separate types?
Ruben: It’s the same type. I just cycle through, but I would say there are two types: ‘switching to’ and ‘switching away from’ interviews. We have also done a lot of regular customer development interviews.
Egor: And what exactly do you mean by that?
Ruben: Just interviews that are generally shorter and more direct. We are not capturing their story about why they switched or not. They’re for people who are already using the product, when we’re trying to get more insight around some data we have collected, or clarification around something.
We ask very specific questions about, for instance, the proposal thing, the term. What sort of documents, what are they sending through BidSketch, what are these documents, what do they contain, what do they have, are these documents to close a sale? Are they being sent through Bidsketch? Or are they being sent through email or other apps? This is an example of us trying to get more data through short custdev interviews, asking very direct questions.
Egor: So, with switch and cancellation interviews, you are trying to understand the Jobs-to-be-Done. With regular ones, you are just trying to clarify any questions you have about a certain aspect of your existing data.
Egor: And with Jobs-to-be-Done, what questions did you find the most useful?
Ruben: I actually have them in my blog post. There is a section in there, a cheat sheet with all the questions.
Part 3: What experiments did Ruben run on the pricing page? How did a quick copy change help him to increase the trial sign up rate? What tools does he use for tracking and testing?
Egor: Coming back to the original re-design of your pricing page, you said you looked at the data. How did you look it up? Did you use any tools or did you just have it in your back-end?
Ruben: Both our back-end and Kissmetrics.
Egor: And what did you use for experimentation, for A/B testing?
Ruben: It was a combination of Optimizely and Kissmetrics.
Ruben: Optimizely is good for re-directing traffic and seeing the results on that page that we are testing, and then we use Kissmetrics to see the impact throughout the funnel, on sign-ups, cancellations, etc.
Egor: So, tracking long-term effects.
Ruben: To make sure that ‘yes, it helped our conversions’, it also did not negatively impact cancellations or something else.
Egor: Now I want to dissect your current pricing page. As you can see, I numbered every element of your current pricing page.
A couple of things are going on here that I find interesting. The first thing you do is communicate your value proposition in the headline. Then, you seem to communicate not just the value of your product, but the value of the free trial itself. I looked at SumoMe’s pricing page today and they did not have any of those elements. How did you come to that?
Ruben: Number one used to be number two, based on some other page, I think it was Basecamp or something similar. At that time, I did not do a lot of testing around ‘get started in less than a minute’ or ‘get started quickly’. That seemed like a good idea, I had that on there, and I wanted to test something different than that. Basically, just to test the value proposition. So, we tested that and it did a little bit better. So, we kept that, and I did not have number two at all.
There were questions that we asked through Qualaroo surveys. Asking people what was stopping them from signing up is what made me want to test number 2 underneath. It helped a little bit.
We did not see really huge jumps in trials, but most of them were just a little bit better, and so we left number 2. I was actually kind of surprised with number 2, I tested it, but I remember thinking, ‘yeah, it probably won’t do anything, but I just can’t think of anything better’, but in the test it actually worked. I thought, ‘Ha! They are reading that and it actually makes a difference to them!’
Egor: Was impact just on the free trial sign-ups or did the impact translate into actual sales?
Ruben: Yeah, it did! And the order of the plans, we had the order differently, from small to big, and we tested that, the sign-ups mostly stayed the same, but the distribution was a little different. Our revenue per customer was a little higher, it got more people paying on the higher-tier plans.
Egor: It also seems to me that you are trying to communicate value through tooltips for your features. For example, the explanation for Analytics is not tied to some metric – the number of hits you have got or some other technical metric – it is more about how people would use it. If I were about to sign up for BidSketch, I would see immediate value in being able to track my clients. Was it a separate test or did you just think that this is a sensible thing to do?
Ruben: Yeah, I did not test that. It just made sense to do that in the way we write up our features on the features page, the tour page and anywhere else we explain them, trying to make it clear where the value is. Those have changed, and it’s been mostly about clarity because, through Qualaroo surveys on that page, I have seen from time to time questions that people have.
Also, in Crazy Egg I saw, ‘Yep, they are using them’, they are hovering over them, they are looking at them, but maybe I am not explaining it clearly enough or it does not make enough sense.
Egor: I can also see that further down you are using social proof and also have an FAQ in order to, in my understanding, close some of the main objections. Was it tested separately or was it just a sensible thing to add?
Ruben: You know, I have not tested the FAQ. FAQ was added based off the questions that we saw people asking. For example, when we asked them through Qualaroo, why didn’t you sign up or what is keeping you from taking on a plan… that’s what we used that area for.
And the social proof was based on the results that people who were signing up talked about – the ones that people want. Two of them are based on people talking about time saved. Any time we have tested what people want – close more deals, save more time or make more money – saving time on proposals always wins. That’s why those specific testimonials are there. That said, there is interest in closing more sales, so we have one focused on that.
Strategy will make or break your experimentation program.
With no strategy in place, you risk running the wrong tests, in the wrong order, on the wrong goals.
But get the strategy right, and you’ll have an impactful and scalable experimentation framework.
What’s more, this framework can help you apply testing not just to your website, but across your entire organisation. Testing then becomes the mindset for growth – not an occasional bolt-on to website marketing.
It enables you to test and optimise messaging, design, user experience – even your advertising, pricing and product.
There are key habits and indicators that suggest a testing program is more tactical than strategic. The table below compares the tactical vs. strategic approaches. Read the description for each to understand where your current approach lies on the spectrum between the two.
How to shift your approach from tactical to strategic?
#1. From each test existing in a vacuum to strategic evolution of tests.
With a tactical approach to testing, tests do not inform a well-integrated testing strategy, but exist in isolation. This means that when a test is over, you simply move onto your next planned test. As a result, your testing strategy looks like this:
The diagram key:
Tests marked as red = losers
Tests marked as yellow = did not make any difference
Tests marked as green = winners
It’s a random set of tests where some win, some lose, and none of them inform your subsequent steps.
In contrast, when you employ a strategic approach your testing looks like this:
In essence, levers are factors which the data shows may impact user behaviour on the site. For example, the lever “card comparison” was based on research findings which showed that people find it difficult to compare credit cards on finance websites. As a result, they did not apply for any because they couldn’t decide which was best.
Levers inform branches of tests. Some tests win, some lose, but the tests are integrated, i.e. each test can inform subsequent tests based on its result.
For example, if you’re a pharmaceutical retailer and you found that delivery is an important factor when deciding whether to purchase oral contraceptives, then here’s what your first test could look like:
If your test won, then you could iterate on that test idea. Was it the free delivery that mattered or was it the speed? Next step – two variations: “Free” vs. “Next Day”. If it was the speed, maybe we should introduce in-store delivery as well as next day delivery, and see if the extra expense is justified by extra demand. Then, we might test making it more prominent. Instead of showing it in the header, we could include it as part of the page’s sub-headline.
This is how a strategic approach to testing forces you to amplify impact from your original test and uncover granular insights about your customers.
#2. From having single winning tests to scaling your impact
Once you know the intricate details about what motivates/prevents your customers from taking a desired action – you’re still not done!
The next stages you can (and should) go through are:
Scale your impact, i.e. test the same concept on other pages. In the context of Lloyds Pharmacy this could mean reinforcing the same concept on other product pages (eg. if we tested it only on major brands, we could roll out the same test concept on the smaller brands, too) or we could test the same concept further down the funnel. For example, Lloyds Pharmacy could reinforce the same benefits when the visitor continues their order.
Share your impact, i.e. apply the same concept to other areas of your business. If this concept resonated so well with your audience, let’s test including it in your PPC campaigns, meta description, and email marketing promo offers. If these work, there is sufficient evidence to then test these in your offline marketing, too!
Here’s the essence of it: You find a winning test idea and then you hammer it. To do so, follow this protocol.
If the test wins:
Amplify (same concept, same page)
Scale (same concept, other pages)
Share (from acquisition to product)
To decide which levers are most powerful in changing your customers’ behaviour, you need a broad view of your optimisation program. For one of the clients we worked with we created the following (anonymised) table:
We tracked everything: the number of tests we ran around a certain lever, the win rate, the uplift that every test concept creates. We then segmented it based on different criteria: the step in the conversion funnel where it was executed, the acquisition channel, device type. This gives us a better idea of where we can scale the impact generated by our most successful tests. For example, we can see that “trust” had the highest win rate and a relatively large uplift, but we have not yet run many tests on our PPC traffic. Let’s scale it further!
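A table like this can be generated from a simple test log. As a rough sketch – the levers, results and uplift figures below are hypothetical, not the client’s data – each test records its lever, outcome and uplift, and a small aggregation produces the per-lever counts and win rates:

```python
from collections import defaultdict

# Hypothetical test log: (lever, won?, uplift %). Illustrative numbers only.
test_log = [
    ("trust", True, 4.2), ("trust", True, 2.1), ("trust", False, 0.0),
    ("urgency", False, 0.0), ("urgency", True, 1.3),
    ("card comparison", True, 3.0), ("card comparison", False, 0.0),
]

def summarise_by_lever(log):
    """Return {lever: {'tests': n, 'win_rate': %, 'avg_win_uplift': %}}."""
    grouped = defaultdict(list)
    for lever, won, uplift in log:
        grouped[lever].append((won, uplift))
    summary = {}
    for lever, results in grouped.items():
        wins = [u for won, u in results if won]
        summary[lever] = {
            "tests": len(results),
            "win_rate": round(100 * len(wins) / len(results), 1),
            "avg_win_uplift": round(sum(wins) / len(wins), 1) if wins else 0.0,
        }
    return summary

print(summarise_by_lever(test_log))
```

Segmenting the same log by funnel step, channel or device is just a matter of adding those fields to each record and grouping on them as well.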
#3. From lack of priorities to effective allocation of your team’s resources
It’s essential for a strategic testing program to maximise the value of its resources. The success of the program will be limited both by the volume of tests the website supports, as well as by internal resources like design and development time.
That’s why it’s essential to prioritise your tests effectively. It’s impossible to run every test we can think of – so we have to be selective and prioritise strategically.
That means planning the test roadmap by considering variables like:
The value of the area being tested (eg the flow, page or element)
The potential impact of the test
The ease (design and build time, as well as sign-off) required to launch the test
Ensuring that we’re learning about user behaviour (eg by testing across a range of levers, rather than focusing heavily on one or two)
Any risks associated with running the test
In short, we want to prioritise high-impact, high-ease tests on high-value areas:
By prioritising tests based on impact and ease, you make sure that you don’t invest your time in complex, low-impact tests.
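One way to operationalise this prioritisation is a simple scoring model over the variables listed above. The scoring formula and the candidate tests here are illustrative assumptions, not something the article prescribes:

```python
# Illustrative prioritisation sketch: score candidate tests on value of
# the area, potential impact and ease (1 = low, 5 = high), rank by product.
# The candidate tests and their scores are hypothetical.
candidates = [
    {"name": "Eligibility checker CTA", "value": 5, "impact": 4, "ease": 2},
    {"name": "Pricing page headline",   "value": 4, "impact": 2, "ease": 5},
    {"name": "Checkout trust badges",   "value": 5, "impact": 3, "ease": 4},
]

def priority(test):
    # High-value area, high potential impact and high ease all push a test up.
    return test["value"] * test["impact"] * test["ease"]

roadmap = sorted(candidates, key=priority, reverse=True)
for t in roadmap:
    print(t["name"], priority(t))
```

In practice you would also weigh in the learning and risk factors from the list above, but even a crude score like this forces the explicit trade-offs the article argues for.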
If a test is complex but has a high potential impact, you should (whenever you can) try to prove the concept first. That means simplifying the execution of the test to a point where it becomes feasible to run – the “minimum viable test” – before progressing to more complex (and potentially more impactful) iterations.
Let’s consider an example.
Minimum viable test: Credit card industry
The research we conducted when analysing the credit card industry showed that the fear of not being approved was the #1 reason preventing people from applying for credit cards.
Santander has a good example of a bad landing page. All the eligibility information is hidden under a huge block of text. Even if you find it, it’s generic, and there is no guidance on whether you, given your individual circumstances, would be approved.
To address this objection more effectively, Santander could build an eligibility checker similar to the one Barclays has:
However, it would require substantial time to build.
To understand if it is worth investing resources into this new tool, Santander could create a minimum viable test to first prove the concept. For example, they could add a new section at the top that would look similar to an eligibility checker, but upon clicking would still present the same generic information:
The visitors still would not find out the information specific to their needs, but the important point is that Santander would be able to measure the % of people who click on this button. If they do, it’s worth developing the concept further – if they don’t, their resources can be better deployed elsewhere.
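When analysing a minimum viable test like this, it helps to put a confidence interval around the observed click rate, so a small sample doesn’t mislead the build/don’t-build decision. A minimal sketch, with hypothetical traffic numbers and an assumed pre-agreed threshold:

```python
import math

def click_rate_ci(clicks, visitors, z=1.96):
    """Observed click rate with a ~95% normal-approximation interval."""
    p = clicks / visitors
    half = z * math.sqrt(p * (1 - p) / visitors)
    return p, max(0.0, p - half), min(1.0, p + half)

# Hypothetical fake-door result: 5,000 visitors saw the button, 400 clicked.
p, low, high = click_rate_ci(400, 5000)
print(f"click rate {p:.1%}, 95% CI [{low:.1%}, {high:.1%}]")

# Decision rule: invest in the real eligibility checker only if we are
# confident the true click rate clears the agreed threshold, e.g. 5%.
threshold = 0.05
print("build it" if low > threshold else "not yet conclusive")
```

The normal approximation is adequate at these volumes; for very small samples a Wilson interval would be safer.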
#4. From retesting similar concepts and dismissing good ones to keeping a test log and continually learning
Every successful test should inform the overall testing strategy. But that can be a challenge if people on your team change and the knowledge of what worked might fade away. Without an effective knowledge base of tests, you’re facing two risks that can undermine your testing program:
Repeating previous tests: You might run similar tests again. At best, you may validate the previous result. At worst, you’ll waste resources by repeating a test – and potentially one that had a negative result.
Dismissing new concepts: A bigger risk is saying, “We already tested that”, without being able to show exactly what was tested and what the outcome was. As above, a test’s success is primarily down to the lever and the concept’s implementation. Dismissing the lever because of an unsuccessful earlier test is a huge risk.
To manage those risks more effectively, at minimum you must track:
Creative execution (screenshots)
Areas of focus
Results (raw data)
But ideally you should also include external factors such as seasonality, competitors’ activity and market conditions. External factors can have an impact on your test results. For example, during December many ecommerce sites do not see their tests achieving statistical significance. This is due to the nature of demand. During peak periods, people care less about persuasive copy, layout and design – they just need to make a purchase. As a result, a well-crafted landing page may not perform any better or worse than the original, but once the peak period is over, clear differences start to emerge.
Here’s an example from Chris Goward’s book You Should Test That! None of the variations achieved statistical significance in December, but Variation C became a decent winner in January, and the conversion rate difference jumped from 12.7% to 30.6%.
When you approach your testing strategically, there are no such questions. You just go to your knowledge base of tests and analyse whether the test result was a result of the lever, the concept implementation, or potentially external factors (eg seasonality, a change in market conditions, or a change in the traffic profile).
This brings us to an important point. If you’re a strategist, here’s how you should approach these losers.
If the test loses due to:
Lever (the core theme of the test didn’t affect user behaviour) = abandon
Execution (the implementation – design, copy, functionality – didn’t affect user behaviour) = retest (and reprioritise)
(For a more in-depth discussion on why execution might fail your tests, read this article by Erin Weigel, Senior Designer at Booking.com)
#5. From driving minor website changes to transforming your organisation
Finally, at the heart of strategic testing is an alignment with the goals of your organisation.
That means the KPI for your tests may not be the conversion rate from visitors to customers, but a broader goal like revenue, profit or lifetime value.
For example, if your goal is to increase revenue, you might break it down as:
Revenue = Traffic x Conversion Rate x Lifetime Customer Value (LCV)
It may be the case that simply putting up your prices will increase the LCV significantly, even if it decreases the conversion rate marginally. It can be a risk to test, but it’s often a simple test to run – there’s very little design and development work involved. This is especially true in some SaaS markets where customers are less likely to have an expectation around price, giving greater elasticity.
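The trade-off can be sanity-checked with the decomposition above before running the test. The traffic, conversion and LCV numbers below are made up purely for illustration:

```python
def revenue(traffic, conversion_rate, lcv):
    # Revenue = Traffic x Conversion Rate x Lifetime Customer Value
    return traffic * conversion_rate * lcv

# Baseline: 100k visitors, 2% convert, $300 lifetime value per customer.
baseline = revenue(100_000, 0.02, 300)

# Price test: doubling prices roughly doubles LCV; suppose the
# conversion rate drops by a quarter as a result.
price_test = revenue(100_000, 0.015, 600)

print(baseline, price_test)  # revenue rises despite fewer customers
```

Under these assumed numbers, revenue grows by 50% even though conversion fell, which is exactly why a broader KPI than conversion rate matters.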
This is exactly what Jason Cohen, the CEO of WP Engine, recommended to one of the companies at Capital Factory (the largest start-up incubator and accelerator in Austin). According to him, they doubled their prices and the effect on signups was minimal. As a result, the profits almost doubled. There you are – price inelastic demand.
So, should you also double your prices? This is what strategic testing can give you an answer to.
Transforming your organisation means not only growing it, but also challenging its deep-seated assumptions.
For example, in SaaS this might mean re-thinking how you structure your pricing plans. Would customers be convinced to upgrade to higher-tier plans because they see more value in advanced features you offer (and should you thus structure your plans as in the image below)?
Alternatively, you could test giving all features to everyone, regardless of the plan they’re on – then limit the volume of their usage instead. That way, every customer is able to experience the full benefits of the platform, and is more likely to use and engage with it, increasing their usage and subscription level:
(Or could you try and strike a balance between the two, or abandon the whole idea completely and simply charge a flat $99 fee the same way Basecamp does?)
Ultimately you need to maintain a healthy risk profile that’s appropriate for your organisation and its testing maturity.
This means not only iterating your existing test ideas (= safer tests), but also testing completely new concepts and experimenting with radical changes. If you’re not nervous about even just a small percentage of your experiments – then you’re not being radical enough, and you risk not answering important strategic questions about your business and your customers.
Ultimately, in order to transform the organisation, the research/data science team needs to align everyone on making data-based decisions. This means no more sitting together as a closed group that simply sends reports to the C-suite once a month, but becoming the core link between the C-suite and the business’s customers. This comes back to the point raised above: the impact, in the form of new knowledge, needs to be shared with the organisation. Humans are hardwired for stories, not for processing long spreadsheets. This is why storytelling backed by data – what we call insight narratives – is the most effective way to keep the data pumping through the veins of your organisation and to align everyone on the same vision.
Avinash Kaushik put it brilliantly (when he was interviewed at SES conference):
We need to take some of the dryness and connect it to real life when we present data. So, when people ask me what the metric bounce rate is, I very rarely say that it’s the percent of sessions with single pageviews. That does not communicate what they are! What I say is, they represent – from a customer perspective – an experience that is, “I came, I puked, I left”. You are never gonna forget that! You are never gonna forget that definition because of the way it was articulated.
I found that after years of trying to convince people, I’ve tried to get data to connect to real life. When a newspaper company wrote an email campaign and I analysed it later, I basically said, “You had the 13 million one night stands with customers because you created no visitor loyalty”. Again, that was a way to make this data very REAL to them. Everyone knows what a one night stand is, and most of them were not great.
Digital Marketing Evangelist at Google
As you can see, there are clear differences between tactical and strategic optimisation programs.
It’s not to say that individual tactics won’t work – they can and do – but without a broader strategy to unite them, they’ll be limited in reach and impact. Sun Tzu, a Chinese military strategist, knew that the problem was not with the tactics themselves, but with the overall approach:
“Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat.”
An effective strategy won’t just provide a framework for testing – it’ll allow you to test deep-seated assumptions in your organisation.
And by doing that, you’ll be giving your organisation a significant competitive advantage. While most companies are stuck testing granular changes to their websites, you’ll be testing changes that can radically shift your ability to acquire, convert and monetise traffic.