“We are our choices.”
So says Jean-Paul Sartre (and Dumbledore).
The same is true of product.
Everything we produce is the result of our choices. Which products and features do we roll out? Which do we roll back? And which ideas never even make it on the backlog?
The problem is – most of us suck at making choices.
Decisions are made by consensus, based on opinion not evidence. We’re riddled with subjectivity and bias, often masquerading as “experience”, “best practice” or “gut instinct”.
But there’s a better way – using experimentation as a way to define your product roadmap.
Experimentation as a product development framework
For many product organisations, experimentation serves two functions:
1. Safety check: Product and engineering run A/B tests (or feature flags) to measure the impact of new features.
2. Conversion optimisation: Marketing and growth run frequent A/B tests – on the sign-up flow, for example – to optimise acquisition.
But this neglects experimentation’s most important function:
3. Product strategy: Product teams use experimentation to find out which features and ideas their customers will actually use and enjoy.
In doing so, you can use experimentation to inform product – not just validate it. You can test bolder ideas safely, creating better products for your customers. By putting experimentation at the heart of their business, organisations like Facebook, Amazon, Uber and Spotify have created and developed products used by billions worldwide.
But they’re in the minority. They represent the 1% of brands that have adopted experimentation as not just a safety check, but as a driving force for their product.
So how do the 99% of us better adopt experimentation?
Five principles of product experimentation
#1 Experiment to solve your biggest problems.
First, and most importantly, you should experiment on your biggest problems – not your smallest.
If experimentation is only used to “finesse the detail” by A/B testing minor changes, you’re wasting the opportunity.
To start, map out the products or features you’re planning. What are the assumptions you’re making, and what are the risks you’re taking? How can you validate these assumptions with experimentation?
Also, what are the risks you’re not taking – but would love to at least try with an A/B test?
#2 Be bold.
Experimentation lets you take risks with the confidence of a safety net.
Because experiments are – by their nature – measurable and reversible, they give us a huge opportunity to test ideas that are bolder than we’d otherwise dare.
In his 1997 letter to shareholders, Jeff Bezos talked about Type 1 and Type 2 decisions.
Type 1 decisions are irreversible – “one-way doors”:
“These decisions must be made methodically, carefully, slowly, with great deliberation and consultation. If you walk through and don’t like what you see on the other side, you can’t get back to where you were before.”
Type 2 decisions are reversible – “two-way doors”:
“But most decisions aren’t like [Type 1 decisions] – they are changeable, reversible – they’re two-way doors. If you’ve made a suboptimal Type 2 decision, you don’t have to live with the consequences for that long. You can reopen the door and go back through.”
Fast forward 22 years, and Jeff Bezos’s 2018 letter to shareholders doubles down on this approach:
“As a company grows, everything needs to scale, including the size of your failed experiments. If the size of your failures isn’t growing, you’re not going to be inventing at a size that can actually move the needle. Amazon will be experimenting at the right scale for a company of our size if we occasionally have multibillion-dollar failures.”
If we aren’t prepared to risk failure, then we don’t innovate. Instead, we stagnate and become Blockbuster in the Netflix era.
Experimentation gives us that safety net to take risks. We can test our boldest concepts and ideas – ideas that would otherwise be blocked or watered down by committee. After all, it’s only a test…
#3 Test early / test often.
Experimentation works best when you test early and often.
But most product teams test only once, at the end – to measure the impact of a new feature just before or just after it launches. (This is the “safety check” function mentioned above.)
Their process normally looks like this:
Whether the experiment wins or loses – whether the impact is positive or negative – the feature is typically rolled out anyway.
Why? Because of the emotional and financial investment in it. If you’ve spent 6 or 12 months building something and then find out it doesn’t work, what do you do?
You could revert and write off the last 6 months’ investment. Or you could persevere and try to fix it as you go.
Most companies choose the second option – they invest time and money in making their product worse.
As Carson Forter – formerly of Twitch, now at Future Research – says of bigger feature releases:
“By the time something this big has been built, the launch is very, very unlikely to be permanently rolled back no matter what the metrics say.”
That’s why we should validate early concepts as well as ready-to-launch products. We start testing as early as possible – before we commit to the full investment – to get data on what works and what doesn’t.
After all, it’s easier to turn off a failed experiment than it is to write off a failed product launch. What’s more, gathering data from experiments will help us guide the direction of the product.
#4 Start small and scale.
To test early and often, you’ll frequently need to start with the “minimum viable experiment” (MVE).
Just like a minimum viable product, we’re looking to test a concept that is as simple and as impactful as possible.
Henrik Kniberg’s drawing illustrates this well:
So what does this look like in practice? Often “painted door tests” work well here. You don’t build the full product or feature and test that. After all, by that point, you’ve already committed to the majority of the investment. Instead, you create the illusion of the product or feature.
Suppose a retailer wanted to test a subscription product. They could build the full functionality and promotional material and then find out if it works. Or they could add a subscription option to their product details pages, and see if people select it.
Ideally, before they run the experiment, they’d plan what they’d do next based on the uptake. If fewer than 5% of customers click that option, they may deprioritise it. If 10% choose it, they might add it to the backlog. And if 20% or more go for it, it may become their #1 priority until it ships.
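That decision rule can be sketched as a simple function. This is a minimal illustration using the hypothetical thresholds from the retailer example above – the band between 5% and 20% is assumed to mean “backlog”, a detail the plan leaves open:

```python
def prioritise_feature(clicks: int, visitors: int) -> str:
    """Map painted-door uptake to a roadmap decision.

    Thresholds mirror the hypothetical plan above: under 5% uptake
    deprioritises the idea, 20%+ makes it the top priority, and the
    band in between (an assumption here) lands it on the backlog.
    """
    if visitors <= 0:
        raise ValueError("no traffic recorded yet")
    uptake = clicks / visitors
    if uptake < 0.05:
        return "deprioritise"
    if uptake < 0.20:
        return "add to backlog"
    return "#1 priority"

# Example: 240 of 2,000 visitors clicked the subscription option (12%)
print(prioritise_feature(240, 2000))  # → add to backlog
```

The point isn’t the code – it’s that the decision is agreed before the data comes in, so the result can’t be argued away afterwards.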
We’ve helped our clients apply this to every aspect of their business. Should a food delivery company have Uber-style surge pricing? Should they allow tipping? What product should they launch next?
#5 Measure what matters.
The measurement of the experiment is obviously crucial. If you can’t measure the behaviour that you’re looking to drive, there’s probably little point in running the experiment.
So it’s essential to define both:
- the primary metric or “overall evaluation criterion” – essentially, the metric that shows whether the experiment wins or loses, and
- any secondary or “guardrail” metrics – metrics you’re not necessarily trying to affect, but that shouldn’t perform any worse.
You’d set these with any experiment – whether you’re optimising a user journey or creating a new product.
As far as possible – and as far as sample size and statistical significance allow – focus these metrics on commercial measures that affect business performance. “Engagement” may be acceptable when testing an MVE (like the fake subscription option above), but in future iterations you should build out the next step in the flow to ensure that the positive response is maintained throughout the funnel.
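To make the primary/guardrail split concrete, here is a sketch of how the two might be evaluated together, using a standard two-proportion z-test (stdlib only). The function names, the 5% significance level, and the decision wording are all illustrative – a real programme would pre-register sample sizes and metrics before launch:

```python
from math import sqrt, erf

def z_test_two_proportions(conv_a, n_a, conv_b, n_b):
    """Absolute lift and two-sided p-value for the difference
    between two conversion rates (pooled-variance z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_b - p_a, p_value

def evaluate(primary, guardrail, alpha=0.05):
    """Each argument is (conv_control, n_control, conv_variant, n_variant).

    The guardrail is checked first: a significant drop there vetoes
    a primary-metric win.
    """
    lift, p = z_test_two_proportions(*primary)
    g_lift, g_p = z_test_two_proportions(*guardrail)
    if g_lift < 0 and g_p < alpha:
        return "stop: guardrail metric significantly worse"
    if lift > 0 and p < alpha:
        return "win: roll out / iterate"
    return "inconclusive: keep testing"

# Primary metric up 5.0% → 6.0%; guardrail flat (9.0% → 8.8%)
print(evaluate((500, 10000, 600, 10000), (900, 10000, 880, 10000)))
```

The veto ordering is the design choice worth copying: a “winning” experiment that quietly damages a guardrail metric shouldn’t ship.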
Why is this approach better?
1. You build products with the strongest form of evidence – not opinion.
Casey Winters talks about the dichotomy between product visionaries and product leaders. A visionary relies more on opinion and self-belief, while a leader helps everyone to understand the vision, then builds the process and uses data to validate and iterate.
And the validation we get from experiments is stronger than any other form of evidence. Unlike traditional forms of product research – focus groups, customer interviews, etc – experimentation is both faster and more aligned with future customer behaviour.
The pyramid below shows the “hierarchy of evidence” – with the strongest forms of evidence at the top, and the weakest at the bottom.
You can see that randomised controlled trials (experiments or A/B tests) are second only to meta analyses of multiple experiments in terms of quality of evidence and minimal risk of bias:
2. Low investment – financially and emotionally.
When we constantly test and iterate, we limit the financial and emotional fallout. Because we test early, we’ll quickly see if our product or feature resonates with users. If it does, we iterate and expand. If it doesn’t, we can modify the experiment or change direction. Either way, we’re limiting our exposure.
This applies emotionally as well as financially. There’s less attachment to a minimum viable experiment than there is a fully-built product. It’s easier to kill it and move on.
And because we’re reducing the financial investment, it means that…
3. You can test more ideas.
In a standard product development process, you have to choose the products or features to launch, without strong data to rely on. (Instead, you may have market research and focus groups, which are beneficial but don’t always translate to sales).
In doing so, you narrow down your product roadmap unnecessarily – and you gamble everything on the product you launch.
But with experimentation, you can test all those initial ideas (and others that were maybe too risky to be included). Then you can iterate and develop the concept to a point where you’re launching with confidence.
It’s like cheating at product development – we can see what happens before we have to make our choice.
4. Test high-risk ideas in a low-risk way.
Because experimentation gives us a safety net (we can just turn off the test), we can make our concepts 10x bolder.
We don’t have to water down our products to reach a consensus with every stakeholder. Instead, we can test radical ideas – and just see what happens.
Like Bill Murray in Groundhog Day, we get to try again and again to see what works and what doesn’t. So we don’t have to play it safe with our ideas – we can test whatever we want.
Don’t forget, if we challenge the status quo – if we test the concepts that others won’t – then we get a competitive advantage. Not by copying our competitors, but by innovating with our products.
And this approach is, of course, hugely empowering for teams…
5. Experiment with autonomy.
Once you’ve set the KPIs for experimentation – ideally the North Star Metric that directs the product – then your team can experiment with autonomy.
There’s less need for continual approval, because the opinion you need is not from your colleagues and seniors within the business, but from your customers.
And this is a hugely liberating concept. Teams are free to experiment to create the best experience for their customers, rather than to win approval from their line manager.
6. It’s faster.
Experimentation doesn’t just give you data you can’t get anywhere else – it’s almost always faster too.
Suppose Domino’s Pizza want to launch a new pizza. A typical approach to R&D might mean they commission a study of consumer trends and behaviour, use it to shortlist potential products, run focus groups and taste tests, build the supply chain, roll out the new product to their franchisees, and then…
Well, then – 12+ months after starting this process – they see whether customers choose to buy the new pizza. And if they don’t…
But with experimentation, that can all change. Instead of the 12+ month process above, Domino’s can run a “painted door” experiment on the menu. Instead of completing the full product development, they can add potential pizzas that look just like any other product on the menu. Then they measure the add-to-basket rate for each.
This experiment-led approach might take just a couple of weeks – and a fraction of the cost – of traditional product development. What’s more, the data gathered is, as above, likely to correlate more closely with future sales.
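The measurement side of such a painted-door test can be as simple as counting views and add-to-basket clicks per candidate item. A minimal sketch – the pizza names, event format, and numbers here are all invented for illustration:

```python
from collections import Counter

# Hypothetical event log: one entry per menu view of a painted-door
# pizza, recording whether the customer tried to add it to basket
events = [
    ("BBQ Pulled Jackfruit", True), ("BBQ Pulled Jackfruit", False),
    ("Truffle Margherita", False), ("Truffle Margherita", False),
    ("BBQ Pulled Jackfruit", True), ("Truffle Margherita", True),
]

views, adds = Counter(), Counter()
for pizza, added in events:
    views[pizza] += 1
    if added:
        adds[pizza] += 1

# Rank candidate pizzas by add-to-basket rate
rates = {p: adds[p] / views[p] for p in views}
for pizza, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{pizza}: {rate:.0%} add-to-basket rate")
```

With real traffic volumes, the ranking (plus pre-agreed thresholds, as in the subscription example earlier) tells you which pizza – if any – is worth building the supply chain for.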
7. Better for customers.
When people first hear about painted door testing – like the Domino’s example above – they worry about the impact on the customer.
“Isn’t that a bad customer experience – showing them a product they can’t order?”
And that’s fair – it’s obviously not a good experience for the customer. But the potential alternative is that you invest 12 months’ work in building a product nobody wants.
It’s far better to mildly frustrate a small sample of users in an experiment than it is to launch products that people don’t love.
To find out more about our approach to product experimentation, please get in touch with Conversion.