
Press release: TRGT joins the Sideshow Group

We’re delighted to announce that TRGT, one of the UK and Europe’s fastest-growing and most successful performance marketing agencies, is joining the Sideshow Group.

TRGT has enjoyed rapid YOY growth of over 40% in recent years, and works internationally for clients such as Nestle and Unilever brands, Pepe Jeans and Hackett.

Founders Ben and Patrick Nancarrow are based in Barcelona, and the agency has an innovative working model that includes a global remote workforce and advanced ad tech tools. Its unique approach and investment in the best performance marketing people have delivered market-leading Paid Social and Paid Search campaigns that achieve outstanding results for their clients.

The business has a strong and positive culture with a people-first approach, and has grown to over 65 digital marketing experts in just seven years. They are established Facebook and Google Partners, and are one of the highest spending independent agencies in EMEA.

Tony Hill from Sideshow says: “This is a capability we’ve been wanting to add for some time now, and in TRGT we couldn’t have found a better partner. The success they achieve for clients is phenomenal and it will further support Sideshow’s evidence-led approach and commitment to delivering commercially impactful work. As soon as I met Ben and Patrick the fit between the two companies was clear, and we are extremely excited to be partnering with such an ambitious and respected company.”

This will be the sixth agency to join the Sideshow Group and the first since the group secured investment from Waterland. It brings total headcount for Sideshow Group to over 300 people.

Patrick says: “When we started out, I remember one of our clients referred to us as ‘two brothers working from their bedrooms’, a fact we could in no way refute. We are therefore immensely proud to have grown TRGT to a brilliant, friendly, international team of 65 and to have had the chance to do some industry leading work with so many exciting clients over the years. Sideshow shares our vision for TRGT, not just in terms of how we grow the business but also how we treat our clients and our people. This is a partnership that will enhance TRGT in every way and we have no doubt that the new journey we are starting on with Sideshow is something we will all be immensely proud of in the future.”

Ben says: “From our first conversation with Tony, it has always felt like an ideal partnership. We are not a typical agency and we thought it would be hard to find a company who would match our mentality, culture and growth ambitions so well. We are exceptionally excited to embark on our next stage of growth both as a company and a group, and we see this as the ideal route to delivering better service and above all better results for our clients.”

Sideshow was advised in this transaction by Lewis Silkin for Legal DD, and Deloitte for Financial and Tax DD. TRGT was advised by Osborne Clarke.

Why your experimentation programme needs a risk profile

Risk can be a competitive advantage in your experimentation programme.

If you’re taking more calculated risks than your competitors, you’re going to get better results.

But to do that, you need to understand your risk profile.

In this post, we’ll look at how to define risk in your experimentation programme – as well as three techniques to create better high-risk experiments.

Doing nothing is still a risk


“In a world that is changing really quickly, the only strategy that is guaranteed to fail is not taking risks.”

Mark Zuckerberg

Facebook have always been open to taking risks.

When he was 22, Mark Zuckerberg turned down billion-dollar offers for the company. Instead, over the next ten years, he spent billions of dollars himself: acquiring companies like Instagram and WhatsApp, and making other long-forgotten investments and product launches that didn’t pay off.

The priority is to manage risk – and experimentation can be an effective strategy to do so.

Now, some stakeholders will think that any experiment – no matter how small – is a huge risk. But for most A/B tests, your risk is limited: there’s the cost of building the test and, occasionally, a potential drop in performance while the test runs. But normally that’s it.

In fact, there’s actually much more risk in making changes to the website without testing them.

Low-risk and high-risk experiments

Your experimentation programme already has a risk profile.

Every experiment you run is low risk, high risk, or somewhere in between. And if you’re not consciously managing that balance, you’re probably not getting the full benefit of experimentation.

Experiments can be low risk. You might have run similar tests in the past, and be pretty confident that this one will work. (Or at least confident it won’t break anything.)

Low risk: If showing “most popular” sleeve lengths was successful (left), then expanding the sizing options out (right) is likely to be low risk.

They can be medium risk. You might be trying out a completely new untested hypothesis. It could work – or you might have wasted time building the test, and lost money running it.

Medium risk: What would happen if we added a donation target to Unicef’s landing page, and showed how close we were to achieving it?

Or they can be high risk. This is when you test disruptive ideas. Experiments like this are high risk because you’re risking both the cost of building them and the potential loss of money while the experiment is live.

But it goes further – there’s also risk outside of the experiment. It might affect the audience in the experiment long after you’ve stopped it – or it might have implications for the brand as a whole.

High risk: What would happen if we changed Wistia’s SaaS pricing model from feature-based to volume-based?

Analysing low- and high-risk experiments

To summarise:

Low-risk experiments are typically iterative – you’re building on an already-proven concept. Their role is to exploit: you’ve validated a lever and are now looking to maximise its impact across the customer journey. The only potential loss is the cost of building the experiment (just because it worked once, doesn’t mean it’ll work again).

Medium-risk experiments are typically innovative – you’re testing out new concepts (but not necessarily radical ones). Their role is to explore: you want to understand what drives customer behaviour, and an experiment will inform that understanding. As before, the potential loss is the cost of building the experiment – but you may also lose money running the experiment, if it lowers performance.

High-risk experiments are disruptive – not only are you testing out something new, there’s a chance that it could fail miserably. These are the concepts that your competitors are probably too nervous to test – but they could deliver you a significant competitive advantage if they work.

Their role is to expand – to widen your approach by testing radically different ideas. But the risk is greater too. There’s potential for non-controlled impact – essentially, where the damage doesn’t stop when the A/B test stops.

Take the screenshot above from Wistia’s pricing page. Testing a new pricing structure is a high-risk experiment: it could significantly increase revenue, or it could lower it. And potentially it could affect customers who aren’t in the experiment, it could be reported on social media or wider, and so on.

But often these high-risk experiments come with the highest reward. These are the ones that help you move beyond the local maximum.

High-risk experiments will help you jump from a local maximum to the global maximum.

Working out your experimentation risk profile

Look at the experiments you’ve run in the last 6 or 12 months, as well as your backlog of upcoming experiments. Then rate each as low, medium or high risk.

Of course, the definition of risk in your organisation will be different to mine. So come up with a simple format that works for you.

If you like, you can try a series of questions like this:

  • What type of change are you making? eg UI, functionality, pricing, product.
  • Have you tested a similar hypothesis before?
  • If you have, was it successful?
  • What’s the cost needed to build the experiment?
  • What percentage of online revenue does the experiment affect?
  • Might it change the behaviour of users in the experiment even after it’s stopped?
  • Might it change the behaviour of users not in the experiment?

We’ve put this in a simple spreadsheet. You can answer all the questions and get a risk score straight away. Of course, you’ll want to adapt the questions, variables and scoring before you start. This is just an example:

Download and adapt this Google Sheet
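If you’d rather compute the score in code than in a spreadsheet, here’s a minimal sketch of the same idea. The weights, thresholds and risk bands below are entirely illustrative – adapt them to your organisation, just as you would the sheet:

```python
# Hypothetical risk scorer based on the questions above.
# All weights and band thresholds are illustrative, not prescriptive.

CHANGE_TYPE_SCORES = {"ui": 1, "functionality": 2, "pricing": 3, "product": 3}

def risk_score(change_type, tested_before, was_successful,
               build_cost_days, revenue_share_pct,
               lasting_in_experiment_effect, affects_non_participants):
    score = CHANGE_TYPE_SCORES.get(change_type, 2)
    if not tested_before:
        score += 2          # a brand-new hypothesis is riskier
    elif not was_successful:
        score += 1          # tested before, but it didn't win
    score += min(build_cost_days // 5, 3)          # bigger builds, bigger risk
    score += min(int(revenue_share_pct // 25), 3)  # share of revenue exposed
    if lasting_in_experiment_effect:
        score += 2          # behaviour change may persist after the test stops
    if affects_non_participants:
        score += 3          # "non-controlled impact" outside the experiment
    return score

def risk_band(score):
    if score <= 4:
        return "low"
    if score <= 8:
        return "medium"
    return "high"
```

So a small iterative UI change on a low-revenue page bands as “low”, while an untested pricing change affecting all revenue, with possible effects on non-participants, bands as “high”.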

Or if you want an even simpler alternative, just ask yourself this question about each experiment:

“If I couldn’t run an A/B test, would I still make this change?”

If you’d still make the change, it’s almost certainly low or medium risk. If you wouldn’t, it’s probably high risk.

The importance of high-risk experiments

If we only test changes we’d make anyway, we’re wasting the opportunity of experimentation.

This is one of the most common mistakes people make in experimentation. They only run tests on changes that they’d make anyway.

It starts with an idea:

“This seems like a good idea. Let’s test it and see just how right I am.” 

Now there’s a good reason to test these changes. You might be wrong. Or some audience segments may respond differently. And if it is successful, it’s good to know the size of the impact – not just whether it’s positive or negative. This insight will help you come up with new hypotheses and prioritise your roadmap. 

But it’s just as important to test changes that make you nervous. Disruptive experiments allow you to make bigger bets.

“This experiment might crash and burn, but if it works…”

If you’re only testing best practice or patterns you see on competitor websites, you’re not going to get a competitive advantage. You’re going to be limiting yourself to the local maximum.

Experimentation allows us to test anything we want – and to limit the fallout. It derisks innovation.

Creating your risk strategy

If you used the Google Sheet above, it’ll show you what your risk profile looks like visually:

In this example, you’ll see that most experiments are blue (medium risk), with an equal balance of low- and high-risk experiments.

There’s no perfect answer for what your risk profile should be. Ideally you’d have a balance of all three – and it should change over time.

So right now, your risk profile might look like this:

You’ve got an even balance of innovative and iterative experiments, with occasional radical experiments included to allow for greater leaps forward.

But if you’re in peak season, it might look like this:

You increase the iterative experiments to reduce the risk. Because iterative experiments have a higher win rate, you’re going to have a safer programme during peak season. That means you increase revenue without risking revenue at peak. And you might hold back on radical experiments altogether.

But if you’re just starting your experimentation programme then it might look like this:

You have an even balance across all three. You don’t invest too heavily in iterations, since you haven’t tested too much yet. And you balance innovative and radical ideas to get quick feedback as you develop your product and marketing strategy. (Of course, having this many disruptive experiments is dependent on having the right culture.)

How COVID-19 changes your risk profile

Right now is the best time to be thinking about your risk profile. COVID-19 has changed everything.

Some companies – food delivery, e-learning, home retail – are seeing a surge in demand. They should adopt the peak risk profile above, unless they’re still relatively new to experimentation.

But other companies are seeing demand drop off a cliff. That means they could be more aggressive:

With demand dropping, doing nothing is the biggest risk of all.

Instead of doing nothing, or just iterating on the experiments that you’ve run previously, now’s the perfect time to try out the ideas that you were too nervous to do before. 

Coming up with disruptive ideas

By this stage, it’s probably clear that we need to be running more disruptive experiments. This will give us the most meaningful advantage over the competition. 

But how do we come up with the ideas? 

It can be challenging to think creatively. We have a tendency to think incrementally. To look at what we have already, and see how we can make it a little better. It’s hard to throw it out, start over and come up with something that may be better. 

So here are three exercises to try:

Invalidating lore

This is a good place to start. 

“Lore” means the anecdotal knowledge or opinions within a company, which have never been tested. The things you do because “that’s the way they’ve always been done”.

Chamath Palihapitiya started Facebook’s growth team. You might have seen him on the news recently saying that the US shouldn’t bail out hedge funds and billionaires during COVID-19.

He attributed the growth of Facebook to their constant focus on “invalidating lore”:

“One of the most important things that we did was just invalidate all of the lore… All we did was disprove all of the random anecdotal nonsense that filtered around the company.”

Chamath Palihapitiya

Remember, this was a company that was only a few years old. But that’s one of the reasons Facebook has been so successful – everything can be challenged, and data > opinion. 

That is perfect for us – we deal in data, so we should be able to challenge assumptions and show evidence.

Domino’s Pizza – never afraid to innovate

Take Domino’s Pizza. They’ve been making pizzas for 60 years – there’s a lot of knowledge and experience in the business, but that brings with it plenty of “lore”: a lot of assumptions and “that’s the way it’s always been done”. 

But luckily Domino’s is an innovative organisation.

So when companies like Deliveroo and Just Eat and Uber Eats started growing, Domino’s had to respond. These new competitors had deep pockets, and were growing aggressively. What’s more, they were charging customers for delivery – while with Domino’s, customers took free delivery for granted. 

This put Domino’s in an awkward position. They could carry on as before – offering free delivery, because that’s what they’d always done and that’s what customers expected.

Or they could test it.

So they chose one geographic market to test it in. Then they ran an A/B test that added surge pricing for visitors in the treatment. That way, they could see the effect of making this change. Would customers abandon their order? Would they be less likely to come back in the future? Or had paying for food delivery simply become accepted?
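In practice, a test like this is usually bucketed deterministically, so the same visitor always sees the same variant across sessions. A minimal sketch – the experiment name and 50/50 split are hypothetical:

```python
import hashlib

# Deterministic A/B assignment: hash the visitor ID so the same visitor
# always lands in the same variant. The experiment name is included in the
# hash so one test's buckets don't correlate with the next test's.

def assign_variant(visitor_id, experiment="delivery-fee-test", treatment_pct=50):
    key = f"{experiment}:{visitor_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return "treatment" if bucket < treatment_pct else "control"
```

Because assignment is a pure function of visitor and experiment, there is no state to store – any server or client can compute the same answer.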

What’s important here isn’t the result of the experiment – it’s that you seek out the lore in your business, and test it. Forget what you’ve done before or what you thought you knew, and focus on what it could be. 

Divergent thinking

Next is divergent thinking. This is where you answer a question again and again and again… until you start coming up with weird and wonderful answers. Then you go back, and build on these ideas. 

Name as many uses for a brick as you can.

You’ve probably heard of interview questions like, “Name as many uses for a brick as you can.” 

So you might start by talking about building a house or a wall. Then you might think about its attributes. Bricks are heavy, so you could use one as a paperweight or door-stop. Or you could use it to break a window, or to prop up a car while you take the wheels off, and so on.

This is the approach that Airbnb founder Brian Chesky used to design their “11-star framework”.

Chesky and the other founders wanted to create the perfect experience. So they started to brainstorm the equivalent of a hotel’s 5-star experience, and extrapolated from there. 

Do listen to Chesky talking about this himself – it’s hard to do it justice second-hand. [Masters of Scale 10:36–13:23]

“You almost have to design the extreme to come backwards.”

Brian Chesky

Like Chesky says, you design the extreme and then come backwards. That way you create something that’s significantly further ahead of where you’re at today. 

You don’t take one step forward from where you are today. You go to where the ideal becomes impossible and then take one step back.

2x not 2%

We need to shift our mindset from incremental growth to exponential growth. It sounds obvious, but the following is a mistake I make all the time… 

You’re building an experiment backlog, and you start by looking at what’s already there, what it looks like, how it works…

But often, you anchor your ideas to what’s there already, making small changes rather than throwing things out and starting again. 

And that’s totally fine most of the time – especially if you’re deliberately working on low-risk experiments. 

But if you want to come up with radical ideas, you need to think differently. So ask yourself:

What would increase the conversion rate 2x (not 2%)?

Let’s take an example…

You’re optimising a SaaS website and you want to increase sign-ups. You could… 

  • Improve the homepage based on user feedback (iterative)
  • Optimise the form based on best practice (iterative)
  • Emphasise Google or Facebook sign up options (innovative)
  • Change from a single-page form to a chatbot style Q&A approach (innovative)

Or could you remove the registration step altogether? 

This is what Posterous did, ten years ago. They wanted to get more people to host content on their blogging platform.

But rather than forcing people to sign up, they just let people upload the content by sending an email. Then they’d create an account automatically:

This is one of the hardest things to do in experimentation – to look at a problem differently and not just test changes you’d make anyway.

Good luck!

Press Release: Conversion joins the Sideshow Group

Sideshow Group signals ambitious growth plans with acquisition of Conversion

Independent marketing services group Sideshow has today acquired conversion rate optimisation specialist Conversion, founded by CEO Stephen Pavlovich. Sideshow is a digital communications group that has been quietly building scale in recent years through strong organic growth and by acquiring progressive and innovative marketing businesses without needing external investment. It entered the Sunday Times Tech Track 100 in September 2019.

The purchase of Conversion increases headcount to 250 and turnover to around £30M, positioning Sideshow alongside high growth marketing services groups including S4, Stagwell and You & Mr Jones that have technology at the heart of their offer and are transforming the industry.

Sideshow’s considered and client-centric approach to M&A means they offer a range of truly joined up specialist services covering digital marketing and transformation, research, data, and customer experience for brands including BT, Experian, KFC, HSBC, Tesco and EDF Energy. Companies in the group include Bunnyfoot, Vertical Leap, Thinking Juice and Strawberry Soup.

Conversion has a client list that includes Canon, Domino’s and Facebook. It positions itself as an ‘experimentation’ agency that “improves customer experiences with data-driven experimentation” and provides a range of conversion rate optimisation services covering personalisation and product and pricing experimentation.

Tony Hill, founder of Sideshow comments: “Covid has accelerated the need for genuine client/agency partnerships that can support clients strategically and commercially. That’s always been our approach and Conversion is a best in class optimisation business with effectiveness at its core. We are building a group that will serve client needs now and into the future and Conversion sits in a real sweet spot as even more commercial transactions move online.”

He adds: “The agency agenda has changed and a leading role has emerged that connects creativity with data, technology and conversion services that increasingly underpin the commercial decisions that clients make.”

Stephen Pavlovich, CEO at Conversion, comments: “Culture has been one of the things that’s been really key to us as we’ve grown the business. In Sideshow we’ve found a partner with very similar values. We are joining a vibrant group of entrepreneurial companies who are completely independent, with a strong growth agenda and that is a very exciting proposition.”

Waypoint Partners, a leading global growth and corporate finance advisory firm to the marketing services sector advised Conversion on their sale to Sideshow.

Matt Lacey, managing director at Waypoint explains: “The partnership of Conversion and Sideshow is a real meeting of minds, an example of two independents with shared values and a desire to be changemakers by bridging the worlds of transformation and marketing. The networks were already struggling and this difficult new world we find ourselves in is exposing their weaknesses even further. There is a significant opportunity for ambitious groups to make their mark and drive disruption; I believe the combination of Conversion and Sideshow is perfectly placed to do just that.”

Tony Hill concludes: “While we’ve been quietly validating our model and approach in recent years, we are now in a position to really focus on growth. We are ambitious and looking to make further acquisitions this year. Data and analytics, digital transformation, Amazon services and media are all in our sight, but ultimately it all comes down to the cultural fit. Only by having shared values and trust can you collaborate and share effectively.”

Creating a culture of experimentation that celebrates failure and drives performance: An evening with Conversion & Kameleoon

Brands increasingly understand the importance of experimentation to constantly improve and optimise their digital performance.

However, creating this culture of experimentation – where failure is seen as something to learn from – can be difficult. How brands can build this culture and drive improved performance was the focus of our recent joint event with Kameleoon at Sea Containers House.

Exploring the topic with a packed room were James Gray of Facebook, Marianne Stjernvall of TUI and Conversion’s Stephen Pavlovich. In a wide-ranging discussion that moved from the 18th to the 21st centuries, and from finding the cure for scurvy to the Amazon Echo, the speakers shared hints, tips and case studies for building a strong experimentation culture.

Experiments that lose build a better business case than the ones that win!

Starting the evening, Stephen Pavlovich explained his best practices for experimentation, starting with the importance of recognising and celebrating failure, rather than simply sweeping it under the carpet.

Too many organisations only talk about winning experiments, rather than learning from the losers. This is poor practice, as the learnings from losing experiments can often outweigh the value of the winning ones. Take Amazon’s disastrous Fire smartphone, which failed to sell and cost the company $170 million – yet much of its technology was repackaged into the Amazon Echo smart speaker, creating an incredibly successful product and channel for the company.

Next Stephen addressed the issue of getting buy-in for experimentation, either by scaling experiments up to address bigger issues (push), or scaling something big down to make it more manageable (pull). This helps spread experimentation across the business and directly supports a culture where more radical ideas are tested. As he concluded, “We shouldn’t just test what we were going to do anyway. We should experiment on our boldest ideas – with a safety net.”

Getting it wrong…

Facebook is often highlighted as a leader in experimentation. However, as James Gray, Growth Marketing Manager in Facebook’s Community Integrity division explained, you still need to follow a structured and sensible process in order to drive optimal results. He highlighted three areas to focus on:

  1. Focus on the basics: Make sure that experiments are run for long enough to deliver realistic results (two business cycles is a logical timeframe), that you are choosing (and sharing) the right metrics to measure the experiment BEFORE it starts and that you have a process to widely share the results.
  2. Adapt to your terrain: Every business is different, so make sure your experiments reflect the idiosyncrasies of your organisation. Cater for factors that might either impact your results or mean that results don’t always translate into the promised benefits down the line.
  3. Navigate your organisation: To get your results accepted and acted upon you need to bring the rest of the organisation with you. That means winning people over by taking the time to explain what you are doing and why your results are important. James shared the case of 18th-century doctor James Lind, who tested different potential treatments for scurvy amongst sailors. While his experiment successfully identified citrus fruit as the cure, the naval authorities took 42 years to put this finding into practice – primarily because it was not clearly communicated in ways that these non-specialists could understand.
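The “run for long enough” advice in point 1 can be sanity-checked with a back-of-the-envelope duration estimate. The sketch below uses the common rule of thumb n ≈ 16·p(1−p)/δ² per variant (roughly 80% power at 5% two-sided significance); the traffic and conversion figures are made up for illustration:

```python
import math

# Rule-of-thumb sample size per variant: n ~ 16 * p*(1-p) / delta^2,
# where p is the baseline conversion rate and delta the absolute lift
# you want to detect. Figures below are illustrative only.

def required_sample_per_variant(baseline_rate, min_relative_lift):
    delta = baseline_rate * min_relative_lift  # absolute effect size
    return math.ceil(16 * baseline_rate * (1 - baseline_rate) / delta ** 2)

def estimated_duration_days(daily_visitors, n_variants, baseline_rate, lift):
    n = required_sample_per_variant(baseline_rate, lift)
    return math.ceil(n * n_variants / daily_visitors)

# 3% baseline conversion, detecting a 10% relative lift,
# 5,000 visitors/day split across control and treatment:
days = estimated_duration_days(5000, 2, 0.03, 0.10)
```

With those assumptions the test needs roughly three weeks – which, for many businesses, is in the same ballpark as the “two business cycles” guideline.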

Creating an experimentation culture at TUI

Large, traditional businesses often find it hard to change to a culture of experimentation, held back by organisational structures and a more conservative way of operating. Demonstrating that this doesn’t need to be the case, Marianne Stjernvall, CRO Lead at TUI, outlined six ways to get your culture of experimentation off the ground, based on her experience at the international travel group.

  1. Be the hero: Have a clear plan on what you want to do, introduce the idea of testing, and listen to stakeholders to get them onside. Remember that humans are hard-wired to be suspicious of change, so not everyone will be positive from the start.
  2. Run masterclasses: Show people what you are passionate about with regular high-level, fun sessions that are open to all. Keep running them and you’ll draw in more people from across the business that you otherwise would never get to speak to.
  3. Share the results: Hold meetings with key stakeholders to drill down into your results. Be sure they understand the data and what it means before increasing the level of detail and transparency through tools such as dashboards.
  4. What I Learnt This Week: Weekly team meetings where everyone has the chance to share what they have done, and what has resulted from it. Make it user and people focused.
  5. CRO Showcase: Highlight every learning through regular calls with your team, particularly if you are spread geographically in different countries.
  6. Monthly dashboard: Introduce an element of (friendly) competition by not only reporting on all tests and their impact on business metrics, but highlighting the test that performed best of all. What can be learned and applied elsewhere?

The evening finished with an interactive panel Q&A, covering topics as diverse as “most embarrassing test” and what to do if your test proves inconclusive.

If you’d like to learn more about how Conversion can help you use experimentation to drive growth in your business, get in touch today. If you’d like to find out more about Kameleoon’s A/B testing and experimentation platform, then visit their website here.

Saving the high street: Using experimentation in-store to improve customer experience

Living in a world on fast forward is no easy thing. 

Dramatic transformation takes place in every aspect of our lives, every industry and every sector. Staying in the game is posing new challenges for businesses across the globe. And in-store retail is particularly vulnerable. 

This blog focuses on explaining the challenges retailers are facing and why experimentation could be the path for success in the new world of customer experience. 

The current state of retail

After the Great Recession in 2008-09, a new era started in the consumer market. A golden age of consumption driven by the dazzling evolution of technology. Anyone with a mobile phone and an internet connection got access to virtually any product or service. 

As a consequence, the previously almost non-existent online channel became absolutely crucial for most brands. E-commerce sales rocketed and expectations are that this trend will continue.  

However, in the offline world, things looked less rosy. Online took up more and more market share, and this – alongside fierce competition, polarising economics, high rents and rising taxes – led more and more retailers to cut their losses and pull down the shutters for the last time.

Despite the headlines painting a gloomy picture, more than 80% of sales still come from stores. A FirstInsight report shows that consumers spend more time and money when purchasing in-store compared to online.

So what is going to happen to bricks and mortar?

The transition from sales/sq foot to experience/sq foot

As the rise of online continues, there is only one way stores will be successful – brands must embrace the paradigm shift that is taking place. Simply distributing products is no longer enough, as consumers today no longer rely on stores as their sole means of access to goods. The reason consumers still go to stores is to engage with the brand, its products and its culture in an emotional way that cannot be replicated online. It’s the experience that drives them in.

With more and more sales being attributed to mobile, social and online channels, continuing to measure success only by conventional sales per square foot will show diminishing productivity for stores. This, in turn, will lead to more closures – and to missing out on a huge opportunity: capitalising on stores’ strategic importance by delivering powerful, unique experiences that create engagement and loyalty with a brand.

So, how do we measure this new goal of optimising the customer experience?

Alongside the traditional ROI metrics, brands need to define their ROX (return on experience) KPIs. The first step is mapping consumers’ purchase journey and isolating the touch points and factors that drive experience. Then, invest in the parts of the company that will move the needle on those interactions and yield measurable results. It’s all about the ‘magic moments’.

In their Global Consumer Insights Survey 2019, PwC offer a list of questions to help build the baseline for ROX KPIs.

But unfortunately, not much has been done so far to define them. The well-known Net Promoter Score, which as many as two-thirds of Fortune 1000 companies use, is based on a single question: whether the consumer would recommend a product, service or company. Many companies hire vendors to track customer experiences with call centres and web properties.
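For reference, NPS is derived from that single 0–10 question: the percentage of promoters (scores 9–10) minus the percentage of detractors (scores 0–6). A minimal sketch, with made-up responses:

```python
# NPS = % promoters (9-10) minus % detractors (0-6),
# computed from answers to the single "would you recommend us?" question.
# The sample responses below are made up for illustration.

def net_promoter_score(scores):
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round(100 * (promoters - detractors) / len(scores))

responses = [10, 9, 9, 8, 7, 7, 6, 5, 3, 10]
nps = net_promoter_score(responses)  # 4 promoters, 3 detractors -> NPS of 10
```

Note that the passive scores (7–8) dilute the result without counting either way – one reason NPS alone is a thin basis for ROX.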

Coming up with good KPIs to optimise for ROX will probably be one of the milestones for the new retail industry in the upcoming years.

Delivering in-store customer experience through experimentation

So, why is experimentation the answer?

In the online world, user experience has increasingly become the focus of brands over recent years. We now have UX design teams trying to create products that provide meaningful and relevant experiences to users, and experimentation programmes that are letting us know how a change would impact KPIs before implementing it. More and more online businesses rely on experimentation to inform their decisions and strategy and improve CX (customer experience).

Offline, however, things are quite different. Management usually introduces ‘innovations’ aimed at improving revenue without analysing data, understanding what difficulties customers are facing, or knowing where the opportunity lies. Sometimes they get it right; at other times, it results in catastrophic failure.

One example of things going wrong is J.C. Penney. In 2011 Ron Johnson, Apple’s senior VP of retail, became J.C. Penney’s CEO. Without any testing, he immediately introduced changes that decimated Penney’s revenue. Technology took over the cashiers’ duties, making the checkout experience too formal. All forms of clearance sales and discounts stopped, and high-priced branded products became prominent in the stores. Just a year and five months after these changes, J.C. Penney’s sales revenue crashed to an all-time low. Management reacted, relieved Johnson of his position as CEO and reversed the changes.

Blockbuster, the multibillion-dollar video rental giant, is another example. The company faced decreasing popularity in 2000 because of the unreasonable ‘late’ fees customers had to pay if they didn’t return their rented movies on time. One proposed solution was to run a simple and cheap (only $12,000) email-reminder experiment. But this was rejected on the basis that it wasn’t a ‘grand strategy’. Too simple. Instead, they came up with a transformative, big-budget effort to eliminate fees, backed by a huge advertising campaign. The plan proved to be both unpopular and, ultimately, illegal. The estimated legal and business costs of this ‘extended viewing fee’ fiasco ran to tens of millions of dollars.

And there are plenty of similar examples. The conclusion – this is not a game of chance. As the era of ‘bricks and clicks’ and omnichannel is upon us, it’s time for the online techniques to be reflected offline and experimentation put to good use.

Surely everyone should be experimenting in-store then?

Testing in-store does pose significantly more challenges than online, and the main reasons are organisational and technical complexity.

Having multiple store branches in several locations adds to the difficulty and complexity of experimentation. While online tests give access to a large pool of consumers from which companies can do a random sampling of any number they want, this type of sampling is not possible with brick and mortar. Carrying out experiments on thousands of store locations is simply not an option.

This challenge usually causes a company to carry out tests on only a small number of its customers, which is not representative of the majority.

To make the most of in-store experimentation programmes, strategy, as well as its execution, becomes crucial. Strategy identifies the goal of the programme, defines how success should be measured and uses data to inform hypotheses. Commitment from all stakeholders is also essential, as are thorough feasibility and reliability investigations when it comes to experiment design and results analysis.

Where do we stand at the moment?

In-store experimentation, also referred to as business experimentation, is scarcely used nowadays, although its potential for driving customer experience and informing business decisions and strategy is massive. There are, however, quite a few examples that build a strong case in its favour.

One example is Kohl’s – one of the largest department store retail chains in America, with 1,158 locations across the country. Back in 2013, someone suggested a one-hour delay in the opening of stores from Mondays to Saturdays to reduce operational costs. This threw up a strong debate within the company’s management, and they decided the only way to know what was right was to subject the idea to a comprehensive business experiment. After conducting a test across 100 stores, the result showed that a delay in store opening time would not have any serious negative impact on sales.
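The analysis behind a test like Kohl’s can be sketched simply: compare weekly sales across test and control store groups and check whether the difference is distinguishable from zero. A minimal illustration with made-up numbers (none of these figures are Kohl’s actual data):

```python
import math
import statistics as st

def diff_in_means_ci(test, control, z=1.96):
    """Approximate 95% CI for the difference in mean weekly sales
    between test stores (delayed opening) and control stores."""
    diff = st.mean(test) - st.mean(control)
    se = math.sqrt(st.variance(test) / len(test) + st.variance(control) / len(control))
    return diff - z * se, diff + z * se

# Hypothetical weekly sales (in $000s) for a handful of matched stores
test = [98, 102, 95, 101, 99, 97]      # delayed opening
control = [100, 103, 96, 102, 98, 99]  # usual hours
low, high = diff_in_means_ci(test, control)
# A CI that comfortably straddles 0 suggests no serious impact on sales
print(low < 0 < high)
```

With real data you would also control for seasonality and store-level covariates, but the core question is the same: is zero inside the interval?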

Another example is Wawa, the convenience store chain in the mid-Atlantic United States. They wanted to introduce a flatbread breakfast item that had done well in spot tests. But the initiative was killed before the launch when a rigorous experiment—complete with test and control groups followed by regression analyses—showed that the new product would likely cannibalise other more profitable items.  

While ROI is everyone’s focus, ROX as a goal is relatively new, and it is mainly the leading brands that are starting to understand it and make it part of their strategy. There are a few factors (we call them levers) that have proven to drive in-store customer experience as well as ROI:

  • Technology is the one that stands out. In an era of fast technological advancements, there is a plethora of options for retailers to choose from to take their customer experience to the next level. 

The first example that comes to mind is Amazon and their Amazon Go shops, where the world’s most advanced shopping technology turns queues and checkouts into history. Computer vision, sensor fusion and deep learning automatically detect when products are taken from or returned to the shelves and keep track of them in a virtual cart. When customers are done shopping, they can just leave the store. A little later, a receipt is sent and the customer’s Amazon account is charged.

Similarly, Sainsbury’s experimented with improving their customers’ experience by opening a till-free store in London back in April. Customers didn’t have to wait in long lines; they could check out simply by scanning products with their phone after installing an app. You can read more about the experiment here. The results of this experiment are yet to be announced.

  • Employees are a powerful factor to drive ROX. However, research suggests that retailers tend to view store associates as an expense to be controlled rather than as a medium to provide better service for customers.

A randomised controlled experiment was run in 28 Gap stores in the San Francisco Bay Area and Chicago in 2015 by an interdisciplinary team led by principal investigator Joan C. Williams. For the experiment, retail associates were shifted to more stable schedules to see how that would impact sales and productivity.

The results were striking. 

Sales in stores with more stable scheduling increased by 7%, an impressive number in an industry in which companies work hard to achieve increases of 1–2%. Labour productivity increased by 5% in an industry where productivity grew by only 2.5% per year between 1987 and 2014. The estimate is that Gap earned $2.9 million as a result of more-stable scheduling during the 35 weeks the experiment was in the field. All details about the experiment can be found here.

Nevertheless, the best example of the impact employees can have on ROX is the iconic Apple stores. They rely on a very effective communication technique adapted from The Ritz-Carlton: Steps of Service. Every employee is trained to walk a customer through five steps that spell out the acronym A-P-P-L-E:

A – Approach customers with a personalized, warm welcome.

P – Probe politely to understand the customer’s needs.

P – Present a solution for the customer to take home today.

L – Listen for and resolve issues or concerns.

E – End with a fond farewell and an invitation to return.

  • In-store design, fixtures, and facilities also play a significant role in customer experience. Proving that they understand how brick and mortar retail is changing in the age of e-commerce, Nike opened a new five-story, 55,000 square foot store in New York City. There is a mini indoor basketball court, a treadmill, a system that simulates runs in different locations, a small soccer enclosure, a shoe bar where shoppers can personalise a pair of Nike Air Force and coaches who put customers through drills to test out different pairs of shoes. It is as much a place to play as it is a place to shop.

To conclude: in a market where customers’ expectations are higher and more dynamic than ever, businesses have a powerful instrument in their toolkit to help them understand and meet those expectations – experimentation.

Putting experimentation at the heart of a business not only leads to better and more innovative ways of doing things – it gives companies the confidence to overturn wrongheaded conventional wisdom and the faulty business intuition that even seasoned executives still exhibit.

To find out more about our approach to experimentation, get in touch today!

‘Don’t launch the wrong product’: An evening with and Amplitude

Last week, we partnered up with Amplitude to host a compelling evening focused on product experimentation at the beautiful Century Club in London.

We were delighted to be accompanied by some brilliant speakers. We welcomed on stage Veronica Hulea (Head of Analytics at Zoopla), Rob Beattie (Head of Digital Product at Domino’s Pizza) and our very own Stephen Pavlovich (CEO and Founder).

Bringing together product practitioners and leaders from a range of different brands and industries, we wanted to share just how businesses should be using experimentation to not only inform product, but to actually define their roadmap.

We kicked off the evening with Stephen Pavlovich introducing the audience to experimentation as an engine to a successful product roadmap. He talked about how our choices are too often defined by ‘position, authority and experience’ or even ‘gut feeling.’ And, when we operate as teams, the products tend to be even worse. Or, to put it in Stephen’s terms…

“Decisions by committee will always be shitty.” 

Instead, Stephen suggested that the most successful companies use experimentation as a product development framework. Using experimentation not just to validate your ideas but to define them means you can test bolder ideas safely, creating better products for your customers. This is exactly how the likes of Facebook, Amazon and Uber work – with experimentation at the heart of their businesses.

Jeff Bezos, Founder and CEO of Amazon

Finally, Stephen shared his five principles of product experimentation:

  1. Experiment to solve your biggest problems.
  2. Be bold.
  3. Test early and often.
  4. Start small and scale.
  5. Measure what matters.

To learn more about Stephen’s five principles, read his blog post on Product Experimentation.

Next up on the stage, we welcomed our first guest speaker, Veronica Hulea, Head of Analytics at Zoopla. Veronica has years of experience in market analytics as well as product optimisation. She shared her insights on how you can evolve your product without killing your conversion rates.

She began with examples from her own experience with Zoopla when they attempted to re-platform whilst maintaining a stable conversion rate. 

“Use AB testing to ‘bake’ the new design with a small percentage of users, until it’s ready to replace the old one.”

Veronica also explained why AB test uplifts are not always reflected in business metrics. She provided actionable insight on how to unlock potential based on the user’s level of intent – from browsing and researching all the way through to final conversion.

Last but definitely not least, we introduced our guests to Rob Beattie, Head of Digital Product at Domino’s Pizza. Rob has been at the company for a year and a half, and has many years of experience heading up digital product and transformation across different businesses.

Rob took us on a journey through the years of growth and innovation at Domino’s Pizza, and showed us how experimentation has been used to inform the successes so far.

He continued by sharing the role of experimentation in the business as being not only a way to sell more products and develop new features online, but to actually define their physical products as well.

Rob provided actionable insights on ‘what makes a good experiment’, and equally as important, ‘how to run an experiment well’. Finally, our audience got to hear what the future holds for Domino’s Pizza, and just how ambitious their roadmap is! 

Following the brilliant lightning talks, we held a panel Q&A where our guests took the opportunity to ask a myriad of questions about experimentation in general and specifically within their businesses.

If you’d like to hear more about how you can use experimentation to inform your product roadmap and drive growth in your business, then get in touch today.

Keep an eye on our events page to make sure you don’t miss out on future events, or sign up to our mailing list by emailing

People are aware of cognitive biases but do we know what to do about them?

Decision making is part of our everyday lives. We ask ourselves, “Should I have a coffee or a tea? Should I take the bus or the tube today? How should I respond to this email?”

But are we really aware of just how many decisions the average human makes in a single day? Go on, have a guess…

On average, we make a staggering 35,000 decisions per day! Taking into account the 8 or so hours we spend asleep, that works out to be over 2,100 decisions per hour. If we thought consciously about each decision, we would be faced with a debilitating challenge that would prevent us from living out our normal lives. Thankfully our brains have developed shortcuts, called heuristics, which allow us to make judgements quickly and efficiently, simplifying the decision making process.

Heuristics are extremely helpful in many situations, but they can result in errors in judgement when processing information – this is referred to as a cognitive bias.

How can cognitive biases impact our decisions?

Cognitive biases can lead us to false conclusions and as a consequence influence our future behaviour.

In order to illustrate this I am going to take you through a famous study conducted by Daniel Kahneman showing the impact of the anchoring bias. In Kahneman’s experiment, a group of judges with over 15 years’ experience each were asked to look at a case in which a woman had been caught shoplifting multiple times.

In between reviewing the case and suggesting a possible sentence, the judges were asked to roll a pair of dice. Unbeknown to the judges this was the “anchor”. The dice were rigged, and would either give a total of 3 or 9.

Astonishingly, the number rolled anchored the judges when making their sentencing recommendations. Those who rolled 3 sentenced the woman to an average of 5 months in prison; those who threw 9 sentenced her to 8 months.

If judges with 15 years’ experience can be influenced so easily by something so arbitrary about something so important – then what hope do the rest of us have?

Another example of biases impacting important decisions can be found in the Brexit campaigns. We can all remember the “£350 million a week” bus, which suggested that instead of sending that money to the EU we could use it to fund the NHS instead.

There were many other examples of false stories published in the British media. These shocking statements are influential because humans have a tendency to think that statements that come readily to mind are more concrete and valid. This is an example of the availability bias.

But how is this relevant for experimentation?

With experimentation, we are tasked with changing the behaviours of users to achieve business goals. The user is presented with a situation and stimuli that impact their emotional responses and dictate which cognitive biases affect the user’s decision making.

When we run experiments without taking this into account we are superficially covering up problems and not looking at the root causes. In order to truly change behaviour we must change the thought process of the user. This is where our behavioural bias framework comes into play…

Step 1. Ensure you have established your goal. Without a goal you will not be able to determine the success of your experiments.

Step 2. Identify the target behaviours that need to occur in order to achieve your goal. At this point it is important to analyse the environment you have created for your users. What stimulus is there to engage them? What action does the user need to take to achieve the goal? Is there a loyal customer base that return and carry out the desired actions again and again?

Step 3. Identify how current customers behave. Is there a gap between current behaviours and target behaviours?

Step 4. Now start pairing current negative biases with counteracting biases. At this point research is imperative. Your customers will behave differently depending on their environmental, social and individual contexts. Research methods you can use include surveys, moderated and unmoderated user testing, evidence from previous tests and scientific research. Both Google Scholar and DeepDyve are excellent scientific research resources.

Step 5. Which is the best solution to test? 

There are three important things to consider at this point. 

  • Value – What is the return for the business?
  • Volume – How many visitors will you be testing?
  • Evidence – Have you proven value in this area in previous tests?
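These three considerations can be combined into a crude prioritisation score for candidate experiments. A hypothetical sketch – the 1–5 scales and the multiplicative weighting are ours, not a standard:

```python
def prioritise(candidates):
    """Rank candidate experiments by a simple Value x Volume x Evidence
    score (each rated 1-5). Purely illustrative scoring scheme."""
    return sorted(
        candidates,
        key=lambda c: c["value"] * c["volume"] * c["evidence"],
        reverse=True,
    )

# Hypothetical candidates for our luxury food brand example
candidates = [
    {"name": "product page reviews", "value": 5, "volume": 4, "evidence": 3},
    {"name": "footer links",         "value": 1, "volume": 5, "evidence": 1},
    {"name": "checkout trust badge", "value": 4, "volume": 3, "evidence": 4},
]
ranked = prioritise(candidates)
print([c["name"] for c in ranked])
```

The exact weighting matters less than scoring every candidate against the same three questions before choosing what to test.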

Joining the dots.

To bring this framework to life I’m going to run through an example…

Let’s pretend I work for a luxury food brand. I have identified my target goal which is purchases and mapped out how my current users behave on the site. I find that users are exiting the site when they are browsing product pages. Product pages are one of our highest priority areas.

I have conducted a website review, which flagged some negative customer reviews. This is not a big issue for us – after all, taste is individual and we have an abundance of positive reviews. Nevertheless, it seems to be a sticking point for users.

A potential bias at play causing users to exit is the negativity bias. This bias tells us that things of a negative nature have a greater impact than neutral or positive things.

Instead of removing the negative reviews, we are going to maintain the brand’s openness to feedback and leave them onsite. Nevertheless, we still want to reduce the exit rate, so we are going to test a counteracting bias: the visual depiction effect.

The visual depiction bias states that people are more inclined to want to buy a product when it is shown in a way which helps them to visualise themselves using it. So in our product images we will now add in a fork (this study was actually conducted! Check it out).

The results from the experiment will determine whether our counteracting bias (visual depiction effect) overcame the current one (negativity bias).

So, to conclude…the behavioural bias framework should be used to understand the gap between your customers’ current behaviours and your intended goal. This will allow you to hypothesise potential biases at play and run experiments that bridge the gap between existing and aspirational behaviours.

To find out more about our approach to experimentation, get in touch today!

The Big Debate: What should your primary metric be?

One of the biggest myths in testing is that your primary metric shouldn’t be the purchase or conversion at the end of the user journey.

In fact, one of the biggest names in the game, Optimizely, states:

“Your primary metric (and the main metric in your hypothesis) should always be the behaviour closest to the change you are making in the variation you are employing.”


We disagree – and want to show how this approach can actually limit the effectiveness of your experimentation programme.

But first… what is a primary metric?

Your primary metric is the metric you will use to decide whether the experiment is a winner or not.

We also recommend tracking:

  • Secondary metrics – to gain more insight into your users’ behaviour 
  • Guardrail metrics – to ensure your test isn’t causing harm to other important business KPIs.
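One way to keep these three tiers explicit is to write them into the experiment plan itself, so no one can quietly swap the primary metric after the fact. A minimal sketch (all names hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentPlan:
    """Hypothetical plan object pairing a single primary metric with
    supporting secondary and guardrail metrics, fixed before launch."""
    name: str
    primary_metric: str
    secondary_metrics: list = field(default_factory=list)
    guardrail_metrics: list = field(default_factory=list)

plan = ExperimentPlan(
    name="landing-page-forewarning",
    primary_metric="free_trial_signups",
    secondary_metrics=["landing_page_clickthrough"],
    guardrail_metrics=["full_subscription_purchases"],
)
print(plan.primary_metric)
```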

So what’s the big debate? 

Some argue that your primary metric should be the next action you want the user to take, not final conversion.

Diagram: Next action vs final action

For example, on a travel website selling holidays, the ‘final conversion’ is a holiday booking – this is the ultimate action you want the user to take. However, if you have a test on a landing page, the next action you want the user to take is to click forward into the booking funnel.

The main motive for using the next action as your primary metric is that it will be quicker to reach statistical significance. Moreover, it is less likely to give an inconclusive result. This is because:

  • Inevitably more users will click forward (as opposed to making a final booking) so you’ll have a higher baseline conversion rate, meaning a shorter experiment duration.
  • The test has a direct impact on click forward as it is the next action you are persuading the user to take. Meanwhile there may be multiple steps between the landing page and the final conversion. This means many other things could influence the user’s behaviour, creating a lot of noise.  
  • There could even be a time lag. For example, if a customer is looking for a holiday online, they are unlikely to book in their first session. Instead they may have a think about it and have a couple more sessions on the site before taking the final step and converting. 

Why is the myth wrong?

Because it can lead you to make the wrong decisions.

Example 1: The Trojan horse

Take this B2B landing page below: LinkedIn promotes their ‘Sales Navigator’ product with an appealing free trial. What’s not to like? You get to try out the product for free so it is bound to get a high click through rate.

But wait…when you click forward you get a nasty shock as the site asks you to enter your payment details. You can expect a high drop-off rate at this point in the funnel.

On this landing page LinkedIn doesn’t tell the user about the credit card form waiting two steps away
LinkedIn requires users to enter their payment details to access the free trial, but this was not made clear on the landing page

A good idea would be to test the impact of giving the user forewarning that payment details will be required. This is what Norton Security have under the “Try Now” CTA on their landing page.

Norton Security lets their users know that a credit card is required, so there are no nasty surprises

In an experiment like this, it is likely that you would see a fall in click through (the ‘next action’ from the landing page). However, you might well see an uplift in final conversion – because the user receives clear, honest, upfront communication.

In this LinkedIn Sales Navigator example:

  • If you were to use clicks forward as your primary metric, you would declare the test a loser, despite the fact that it increases conversion.
  • If you were to use free trial sign ups as your primary metric, you would declare the test a winner – a correct interpretation of the results.
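The divergence between those two verdicts is easy to see in numbers. A sketch using a standard pooled two-proportion z-test on hypothetical results for a forewarning test like this one:

```python
import math

def z_two_proportions(x1, n1, x2, n2):
    """Pooled two-proportion z statistic; positive favours the variant."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

# Hypothetical results, 10,000 visitors per arm:
# control: 3,000 clicks forward, 300 free trial sign-ups
# variant: 2,400 clicks forward, 380 free trial sign-ups
z_clicks = z_two_proportions(3000, 10000, 2400, 10000)
z_signups = z_two_proportions(300, 10000, 380, 10000)
print(z_clicks < 0 and z_signups > 0)  # clicks fall, sign-ups rise
```

With numbers like these, the click-forward metric calls the test a significant loser while the sign-up metric calls it a significant winner – which is exactly why the choice of primary metric matters.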

Example 2: The irresistible big red button

The ‘big red button’ phenomenon is another scenario that will help to bust this troublesome myth:

When you see a big red button, all you want to do is push it – it’s human nature.

The big red button phenomenon

This concept is often taken advantage of by marketers:

Imagine you have a site selling experience gifts (e.g. ‘fine dining experience for two’ or ‘one-day acrobatics course’). You decide to test increasing the prominence of the main CTA on the product page. You do this by increasing the CTA size and removing informational content (or moving it below the fold) to remove distractions. Users might be more inclined to click the CTA and arrive in the checkout funnel. However, this could damage conversion. Users may click forward but then find they are lacking information and are not ready to be in the funnel – so actual experience bookings may fall.

Again, in this scenario using click forward as your primary metric will lead you to the wrong conclusions. Using final conversion as your primary metric aligns with your objective and will lead you to the correct conclusions.

There are plenty more examples like these. And this isn’t a made-up situation or a rare case. We frequently see an inverse relationship between clickthrough and conversion in experimentation.

This is why PPC agencies and teams always report on final conversion, not just click through to the site. It is commonly known that a PPC advert has not done its job simply by getting lots of users to the site. If this was the case you would find your website inundated with unqualified traffic that bounces immediately. No – the PPC team is responsible for getting qualified traffic to your site, which they measure by final conversion rate.

But is it really a big deal?

Some people say, ‘Does it really matter? As long as you are measuring both the ‘next action’ and the final conversion then you can interpret the results depending on the context of the test.’

That’s true to some extent, but the problem is that practitioners often interpret results incorrectly. Time and time again we see tests being declared as winners when they’ve made no impact on the final conversion – or may have even damaged it.

Why would people do this? Well, there is a crude underlying motive for some practitioners. It makes them look more successful at their job – with higher win rates and quicker results.

And there are numerous knock on effects from this choice:

1. Wasting resources

When an individual declares a test as a winner incorrectly, the test will need to get coded into the website. This will be added to the development team’s vast pile of work. A huge waste of valuable resources when the change is not truly improving the user experience and may well be harming it.

2. Reducing learnings

Using the next action as your primary metric often leads to incorrect interpretation of results. In turn, vital information about the test’s true impact gets missed out of communications. Miscommunication of results means businesses miss out on valuable insights about their users.

Always question your results to increase your understanding of your users. If you are seeing an uplift in the next action, ask yourself, ‘Does this really indicate an improvement for users? What else could it indicate?’ If you are not asking these questions, then you are testing for the sake of it rather than testing to improve and learn.

3. Sacrificing ROI

With misinterpreted results, you may sacrifice the opportunity to iterate and find a better solution that will work. Instead of implementing a fake winner, iterate, find a true winner and implement that!

Moreover, you may cut an experiment short having seen a significant fall in next-step conversion, whereas if you had let the experiment run for longer, it could have shown a significant uplift in final conversion. Declaring a test a loser when it is in fact a winner will of course sacrifice your ROI.

4. Harming stakeholder buy-in

On the surface, using click-through as your primary metric may look great when reporting on your programme metrics. It will give your testing velocity and win rate a nice boost. But it doesn’t take long, once someone looks beneath the surface, to see that your “winners” are not actually impacting the bottom line. This can damage stakeholder buy-in, as your work is all assumptive rather than factual and data-driven.

But it’s so noisy!

A common complaint we hear from believers of the myth is that there is too much noise we can’t account for. For example, there might be four steps in the funnel between the test page and the final conversion, so many other things may influence the user between step 1 and step 4 and lead them to drop off.

That’s true. But the world is a noisy place. Does that mean we shouldn’t test at all? Of course not.

For instance, I might search “blue jacket” and Google links me through to an ASOS product page for their latest denim item. Between this page and the final conversion we have 3 steps: basket, sign in, checkout.

Look at all the noise that could sway my decision to purchase along each step of the journey:

As you can see there is a lot of unavoidable noise on the website and a lot of unavoidable noise external to the site. Imagine ASOS were to run a test on the product page and were only measuring the next action (“add to basket” clicks). Their users are still exposed to a lot of website noise and external noise during this first step.

However, one thing is for sure: all users will face this noise, regardless of whether they are in the control or the variant. As the test runs, the sample size will get larger and larger, and the likelihood of seeing a false uplift due to this noise gets smaller and smaller. This is exactly why we ensure we don’t make conclusions before the test has gathered enough data.
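You can watch noise shrink with sample size by simulating an A/A test, where both arms share the same true conversion rate, so any observed ‘uplift’ is noise by construction. A sketch with hypothetical rates and thresholds:

```python
import random

def false_uplift_rate(n, threshold=0.01, trials=300, p=0.05, seed=0):
    """A/A simulation: both arms convert at the same true rate p, so any
    observed difference beyond the threshold is pure noise."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = sum(rng.random() < p for _ in range(n)) / n  # control arm
        b = sum(rng.random() < p for _ in range(n)) / n  # variant arm
        if abs(b - a) > threshold:
            hits += 1
    return hits / trials

# Noise-driven 'uplifts' become far rarer as the sample grows
low_n = false_uplift_rate(500)
high_n = false_uplift_rate(10000)
print(low_n > high_n)
```

At 500 users per arm, a one-point ‘uplift’ from pure noise is common; at 10,000 it is vanishingly rare – the same mechanism that makes a fully powered test trustworthy.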

The same goes when we use final conversion as our primary metric rather than ‘next action’. Sure, there is more noise, which is one of the reasons why it takes longer to reach statistical significance. But once you reach statistical significance, your results are just as valid, and are more aligned with your ultimate objective.

But where do you draw the line?

Back to our LinkedIn Sales Navigator example: as discussed above, the primary metric should be free trial sign-ups. But this isn’t actually the ultimate conversion you want from the user. The ultimate conversion is for them to become a full-time subscriber to your product, beyond the free trial.

You should think of it like a relay race.

The objective of the landing page is to generate free trials. → The objective of the free trial is to generate full-time subscriptions. → The objective of the full-time subscription is to retain the customer (or even upsell other product options).

Each part of the relay race is responsible for getting the customer to the next touch point. The landing page has a lot of power to influence how many users end up starting the free trial. It has less power to influence how successful the free trial is and whether the user will continue beyond the trial.

Nonetheless, we’ve seen experiments whereby the change does have a positive impact beyond the first leg of the relay race, as it were. In one experiment we explained the product more clearly on the landing page. This increased the user’s understanding of it, making them more likely to actually use their free trial (and be successful in doing so). This led to an uplift in full subscription purchases 30 days later.

For this kind of experiment that could have an ongoing influence, you may wish to keep the experiment running for longer to get a read on this. It is sensible to define a decision policy up-front in this instance. In this example, where the impact on full purchases is likely to be flat or positive, your decision policy might be:

  • If we see a flat result or a fall in free trial sign ups (primary KPI) we will do the following:
    • Stop the test and iterate with a new execution based on our learnings from the test.
  • If we see a significant uplift in free trial sign ups (primary KPI), we will do the following:
    • Serve the test to 95% and keep a 5% hold back to continue measuring the impact on full subscription purchases (secondary KPI).

This way, you will be able to make the right decisions and move on to your next experiments while still learning the full value of your experiment.
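A decision policy like this is worth writing down unambiguously before the test starts, so the outcome dictates the action rather than the other way round. A minimal sketch of the policy above (names hypothetical):

```python
def decide(signup_change, significant):
    """Hypothetical encoding of the decision policy above: iterate on a
    flat or negative primary-KPI result; on a significant uplift, serve
    to 95% with a 5% hold-back to keep measuring the guardrail metric."""
    if not significant or signup_change <= 0:
        return "stop and iterate"
    return "serve to 95%, hold back 5%"

# e.g. an 8% significant uplift in free trial sign-ups
print(decide(signup_change=0.08, significant=True))
```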

For a test where there is a higher risk of a negative impact on full subscription purchases, you may do the following things:

  1. Define the full subscription metric as your guardrail metric.
  2. Design a stricter decision policy whereby you gather enough data to confirm there is no negative impact on full subscription purchases.

But what if you are struggling to reach significance?

For many, using the next action as the primary metric allows them to experiment faster. So does low traffic justify testing to the next action instead of the final sale? Sometimes, but only if you’ve considered these options first:

1. Don’t run experiments

That’s not to say you shouldn’t be improving your website. Experiments are the truest form of evidence for understanding your audience, but if you don’t have enough traffic, the next best thing to inform and validate your optimisation is other forms of evidence: usability testing, analytics data and user research are all extremely powerful. This is something we continually do alongside experimentation, for all our clients.

2. Be more patient

For a particularly risky change, you might be willing to be patient and choose to run an experiment that will take longer to reach significance. Before you do this, plug the numbers into a test duration calculator so that you have a good idea of exactly how patient you are going to need to be. Pick one that is independent of any particular testing tool.

3. Run tests on higher traffic areas & audiences

If you are trying to run tests to a very specific audience or a low traffic page, you aren’t going to have much luck in reaching statistical significance. Make sure you look at your site analytics data and prioritise your audiences and areas by their relative size.

With all that being said, you do have a fourth option…

If you are really struggling to reach statistical significance then you might want to use the next action as your primary metric. This isn’t always a disaster – so long as you interpret your results correctly. The problem is that so often people don’t.

For a site with low traffic, it may make sense to take this approach if you are experienced in interpreting experiment results.

However, for sites with lots of traffic, there’s really no excuse. So start making the switch today. Your win rates might fall slightly, but when you get a win, you can feel confident that you are making a true difference to the bottom line.

To find out more about our approach to experimentation, get in touch today!

How to Build Meaningful User Segments

Understanding your customers is critical to a successful optimisation strategy. Knowing what motivates some users to purchase, and what prevents others from checking out, is a fundamental requirement to strategic experimentation.

But not all customers are the same: some are impulsive, while others are more considered!

This blog sets out to help you find the different audiences that browse your website so you can optimise for them accordingly.

What is a user segment?

A user segment is a distinct set of users that act differently when compared to other users. “Act differently” is important: there is no point identifying audiences for your website if you can’t act on that information because they all behave the same. Segmenting your users into two groups and targeting them with different experiments, only to find out they exhibit the same behaviour all of the time, needlessly increases the length of time it takes to run an experiment.

You also need to ensure that you can identify these user segments online for them to be useful for experimentation. Common personas include data points around income, personality or lifestyle, which are useful for shaping content and understanding a user’s motivations. But you can’t target a personality trait like introversion, or segment your experiment results by it.

Finally, sample sizing is important for experimentation, and your user segments need to be large enough to support analysis at a confident level. You might spot a really interesting trend for users in Bristol that use the first version of Internet Explorer, but if that segment is only 0.1% of your traffic, and a test that increases revenue by £50 would take 500 days to reach an adequate sample size, is it worth your time?

Why do they matter?

By identifying the different types of users that browse your website you understand the different motivations behind conversions and behaviours that these users exhibit. This can then help you enhance the user experience and remove the barriers to conversion for each audience. 

For example, if you know that you have a large segment of browsing users that cycle between listing and product pages over and over again without ever purchasing, you might come to the conclusion that there is key information missing on the listing page. This is then an actionable insight that you can use to gather more information through experimentation.

Experimentation will help you to understand the key information your visitors require in order to commit to a transaction. These learnings impact not only sales, but can also increase the efficiency of your marketing efforts, as you know which information to include and bring to your customers’ attention.

Additional insight can be unlocked from existing experiments when breaking down results by previously identified segments too. At its simplest level, splitting results by device can give insights into how user journeys differ from mobile to desktop and give you data on how to improve device specific experiences. When results differ consistently this can also be a clear indication that your on-site experimentation strategy is ready for personalisation.

Conversely, when you’re not seeing differing results across your user segments, this can be a sign that there are still gains to be made from traditional A/B testing to a large audience. A common error for marketers is to ‘over-personalise’ customer experiences without any data to show that their user base is ready for a customised experience. This usually results in a higher frequency of inconclusive experiments, and in winning tests having a smaller revenue impact than they would if the benefit was served to the entire audience.

How do I find them?

1. What would you do?

The starting point for building user segments should be to think about your own personal experiences when browsing your website and that of your competitors. It’s likely that those experiences aren’t unique to you and are common amongst your user base. At the first level, think about when you visit the website, what devices you use and what your state of mind tends to be.

It’s difficult to template this approach, as your user segments should be unique to your website or industry. Every website has new and returning users, but the characteristics of these user types can differ wildly across websites within the same industry. A new user to Google Maps can have very different intentions to a new user visiting Citymapper, despite there being an argument that the core products are very similar.

2. Do they exist in the data?

The next step is within analytics, where we can check for our three audience criteria: identifiable, impactful and showing distinct behaviour. Anything that is identifiable in analytics should be identifiable on the website (unless you are merging third-party data after user sessions – but this should be an edge case).

The most common way to find user segments within analytics is to see how user journeys differ by different visitor properties – i.e. is there a significant difference between how users transition through the website when they come via branded search terms compared to unbranded? How does their journey differ depending on where they are in the customer lifecycle?

It’s important not to overcomplicate your segments to make them look groundbreaking – most aren’t! It may be as simple as new visitors and those on mobile having similar characteristics compared to returning desktop users. Providing you’ve found distinct behaviour and the audiences are large enough to have a real impact on your KPIs, you can begin to align your strategy towards their needs.
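As a sketch, the “distinct behaviour” check can be as simple as a two-proportion z-test on conversion rates between two candidate segments. The segment names and numbers below are illustrative assumptions:

```python
from statistics import NormalDist

def segments_differ(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-proportion z-test: do two segments convert at different rates?

    conv_a/conv_b: converters in each segment; n_a/n_b: segment sizes.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

# e.g. new mobile visitors (2% CR) vs returning desktop visitors (6% CR)
print(segments_differ(240, 12000, 540, 9000))
```

If the test can’t distinguish the two groups, they probably aren’t worth treating as separate segments yet.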

3. Give them some life.

Once you’ve found your segments, I find it useful to name them and give them a relevant story. This can help tailor your thinking towards what the customer needs are and align your strategy to their goal. The best experimentation programmes tend to be customer-centric – so your user segments should be too.

Those new visitors and mobile users seen in the graphs above may have a large proportion of traffic but low conversion rates – and looking at the customer lifecycle shows that almost ¾ of revenue is generated after the first session anyway. It’s reasonable to assume that these visitors are researching at this stage, whilst returning users on desktop are much more likely to convert. Labelling these two segments as “researchers” and “buyers” can stop you wasting time trying to make new users convert when they aren’t likely to; instead you can find out what information is important to them and enhance their user experience so they are more likely to return and convert at a later stage.

There you have it! A couple of actionable user segments that bring to life the different ways visitors browse your website.

From this you can stop bombarding researchers with intimidating urgency tactics that frustrate this type of user and instead look to provide them with the core information they need. When they come back, and they will if they’ve had a positive first impression, you know they’re significantly more likely to purchase. That is when conversion tactics can help give the user the nudge they need to get the conversion over the line and turn what may have been another abandoned basket into a loyal customer.

If you’d like to find out more about how you begin to build meaningful user segments for your business, get in touch today!

It shouldn’t take a year to launch the wrong product: How to make better products with experimentation


We are our choices.

So says JP Sartre (and Dumbledore).

The same is true of product.

Everything we produce is the result of our choices. Which products and features do we roll out? Which do we roll back? And which ideas never even make it on the backlog?

The problem is – most of us suck at making choices.

Decisions are made by consensus, based on opinion not evidence. We’re riddled with subjectivity and bias, often masquerading as “experience”, “best practice” or “gut instinct”.

But there’s a better way – using experimentation as a way to define your product roadmap.

Experimentation as a product development framework

For many product organisations, experimentation serves two functions:

1. Safety check: Product and engineering run A/B tests (or feature flags) to measure the impact of new features.

2. Conversion optimisation: Marketing and growth run A/B tests often, for example, on the sign-up flow to optimise acquisition.

But this neglects experimentation’s most important function:

3. Product strategy: Product teams use experimentation to find out which features and ideas their customers will actually use and enjoy.

In doing so, you can use experimentation to inform product – not just validate it. You can test bolder ideas safely, creating better products for your customers. By putting experimentation at the heart of their business, organisations like Facebook, Amazon, Uber and Spotify have created and developed products used by billions worldwide.

But they’re in the minority. They represent the 1% of brands that have adopted experimentation as not just a safety check, but as a driving force for their product.

So how do the 99% of us better adopt experimentation?

Five principles of product experimentation

#1 Experiment to solve your biggest problems.

First, and most importantly, you should experiment on your biggest problems – not your smallest.

If experimentation is only used to “finesse the detail” by A/B testing minor changes, you’re wasting the opportunity.

To start, map out the products or features you’re planning. What are the assumptions you’re making, and what are the risks you’re taking? How can you validate these assumptions with experimentation? 

Also, what are the risks you’re not taking – but would love to at least try with an A/B test?

At Domino’s, we’re evangelising the role of experimentation for both
customer experience optimisation and product development.

#2 Be bold.

Experimentation lets you experiment with the confidence of a safety net. 

Because experiments are – by their nature – measurable and reversible, they give us a huge opportunity to test ideas that are bolder than we’d otherwise ever dare.

In his 1997 letter to investors, Jeff Bezos talked about type 1 and type 2 decisions.

Type 1 decisions are irreversible – “one-way doors”:

“These decisions must be made methodically, carefully, slowly, with great deliberation and consultation. If you walk through and don’t like what you see on the other side, you can’t get back to where you were before.”

Type 2 decisions are reversible – “two-way doors”:

“But most decisions aren’t like [Type 1 decisions] – they are changeable, reversible – they’re two-way doors. If you’ve made a suboptimal Type 2 decision, you don’t have to live with the consequences for that long. You can reopen the door and go back through.”

Fast forward 22 years and Jeff Bezos’s latest letter to investors doubles down on this approach:

“As a company grows, everything needs to scale, including the size of your failed experiments. If the size of your failures isn’t growing, you’re not going to be inventing at a size that can actually move the needle. Amazon will be experimenting at the right scale for a company of our size if we occasionally have multibillion-dollar failures.”

If we aren’t prepared to risk failure, then we don’t innovate. Instead, we stagnate and become Blockbuster in the Netflix era.

Instead, experimentation gives us a safety net to take risks. We can test our boldest concepts and ideas, which would otherwise be blocked or watered down by committee. After all, it’s only a test… 

#3 Test early / test often.

Experimentation works best when you test early and often.

But most product teams test once, at the end. They do this to measure the impact of a new feature before or just after it launches. (This is the “safety check” concept, mentioned above.)

Their process normally looks like this: 

Whether the experiment wins or loses – whether the impact is positive or negative – the feature is typically rolled out anyway.

Why? Because of the emotional and financial investment in it. If you’ve spent 6 or 12 months building something and then find out it doesn’t work, what do you do? 

You could revert back and write off the last 6 months’ investment. Or you could persevere and try to fix it as you go. 

Most companies choose the second option – they invest time and money in making their product worse. 

As Carson Forter, formerly of Twitch and now at Future Research, says of bigger feature releases:

“By the time something this big has been built, the launch is very, very unlikely to be permanently rolled back no matter what the metrics say.” 

That’s why we should validate early concepts as well as ready-to-launch products. We start testing as early as possible – before we commit to the full investment – to get data on what works and what doesn’t.

After all, it’s easier to turn off a failed experiment than it is to write off a failed product launch. What’s more, gathering data from experiments will help us guide the direction of the product.

#4 Start small and scale.

To do that – to test early and often – you’ll frequently have to start with the “minimum viable experiment” (MVE).

Just like a minimum viable product, we’re looking to test a concept that is as simple and as impactful as possible.

Henrik Kniberg’s drawing illustrates this well:

So what does this look like in practice? Often “painted door tests” work well here. You don’t build the full product or feature and test that. After all, by that point, you’ve already committed to the majority of the investment. Instead, you create the illusion of the product or feature.


Suppose a retailer wanted to test a subscription product. They could build the full functionality and promotional material and then find out if it works. Or they could add a subscription option to their product details pages, and see if people select it.

A retailer could add a “Subscribe and Save” option similar to Amazon’s product page –
even without building the underlying functionality. Then, they track the percentage of customers who try to select this option.

Ideally, before they run the experiment, they’d plan what they’d do next based on the uptake. So if fewer than 5% of customers click that option, they may deprioritise it. If 10% choose it, they might add it to the backlog. And if 20% or more go for it, then it may become their #1 priority until it ships.
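A pre-agreed rule like that can be written down so nobody re-litigates it after the results arrive. A minimal sketch, using the illustrative 5%/10%/20% thresholds above (the middle band is an assumption a real team would define for themselves):

```python
def painted_door_decision(clicks, visitors):
    """Map painted-door uptake to a pre-agreed product decision."""
    uptake = clicks / visitors
    if uptake >= 0.20:
        return "make it the #1 priority"
    if uptake >= 0.10:
        return "add it to the backlog"
    if uptake < 0.05:
        return "deprioritise it"
    # 5-10% sits between the agreed thresholds
    return "inconclusive - agree a follow-up up front"
```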

We’ve helped our clients apply this to every aspect of their business. Should a food delivery company have Uber-style surge pricing? Should they allow tipping? What product should they launch next?

#5 Measure what matters.

The measurement of the experiment is obviously crucial. If you can’t measure the behaviour that you’re looking to drive, there’s probably little point in running the experiment. 

So it’s essential to define both:

  • the primary metric or “overall evaluation criterion” – essentially, the metric that shows whether the experiment wins or loses, and 
  • any secondary or “guardrail” metrics – metrics you’re not necessarily trying to affect, but don’t want to perform any worse. 

You’d set these with any experiment – whether you’re optimising a user journey or creating a new product. 

As far as possible – and as far as sample size/statistical significance allows – focus these metrics on commercial measures that affect business performance. So “engagement” may be acceptable when testing an MVE (like the fake subscription radio button above), but in future iterations you should build out the next step in the flow to ensure that the positive response is maintained throughout the funnel.
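To make the primary/guardrail distinction concrete, here is one hedged sketch of how the two might be evaluated together. The metric names and the 1% tolerance are illustrative assumptions, not a standard:

```python
def experiment_wins(results, tolerance=-0.01):
    """results: {metric_name: relative_change}, including a 'primary' key.

    A win requires the primary metric to improve while no guardrail
    metric degrades beyond the agreed tolerance (here, a 1% drop).
    """
    primary_improved = results["primary"] > 0
    guardrails_ok = all(change >= tolerance
                        for name, change in results.items()
                        if name != "primary")
    return primary_improved and guardrails_ok

# Uplift in trial sign-ups, full subscriptions essentially flat:
print(experiment_wins({"primary": 0.08, "full_subscriptions": -0.002}))
```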

Why is this approach better?

1. You build products with the strongest form of evidence – not opinion.
Casey Winters talks about the dichotomy between product visionaries and product leaders. A visionary relies more on opinion and self-belief, while a leader helps everyone to understand the vision, then builds the process and uses data to validate and iterate.

And the validation we get from experiments is stronger than any other form of evidence. Unlike traditional forms of product research – focus groups, customer interviews, etc – experimentation is both faster and more aligned with future customer behaviour.

The pyramid below shows the “hierarchy of evidence” – with the strongest forms of evidence at the top, and the weakest at the bottom.

You can see that randomised controlled trials (experiments or A/B tests) are second only to meta-analyses of multiple experiments in terms of quality of evidence and minimal risk of bias:

2. Low investment – financially and emotionally.
When we constantly test and iterate, we limit the financial and emotional fallout. Because we test early, we’ll quickly see if our product or feature resonates with users. If it does, we iterate and expand. If it doesn’t, we can modify the experiment or change direction. Either way, we’re limiting our exposure.

This applies emotionally as well as financially. There’s less attachment to a minimum viable experiment than there is a fully-built product. It’s easier to kill it and move on.

And because we’re reducing the financial investment, it means that…

3. You can test more ideas.
In a standard product development process, you have to choose the products or features to launch, without strong data to rely on. (Instead, you may have market research and focus groups, which are beneficial but don’t always translate to sales). 

In doing so, you narrow down your product roadmap unnecessarily – and you gamble everything on the product you launch.

But with experimentation, you can test all those initial ideas (and others that were maybe too risky to be included). Then you can iterate and develop the concept to a point where you’re launching with confidence.

It’s like cheating at product development – we can see what happens before we have to make our choice.


Right now it’s only a notion, but I think I can get money to make it into a concept,
and later turn it into an idea.

4. Test high risk ideas in a low risk way.
Because of the safety net that experimentation gives us (we can just turn off the test), it means we can make our concepts 10x bolder.

We don’t have to water down our products to reach a consensus with every stakeholder. Instead, we can test radical ideas – and just see what happens.

Like Bill Murray in Groundhog Day, we get to try again and again to see what works and what doesn’t. So we don’t have to play it safe with our ideas – we can test whatever we want.

Why, run A/B tests of course.

Don’t forget, if we challenge the status quo – if we test the concepts that others won’t – then we get a competitive advantage. Not by copying our competitors, but by innovating with our products.

And this approach is, of course, hugely empowering for teams…

5. Experiment with autonomy.
Once you’ve set the KPIs for experimentation – ideally the North Star Metric that directs the product – then your team can experiment with autonomy.

There’s less need for continual approval, because the opinion you need is not from your colleagues and seniors within the business, but from your customers.

And this is a hugely liberating concept. Teams are free to experiment to create the best experience for their customers, rather than seeking approval from their line manager.

6. Faster.
Experimentation doesn’t just give you data you can’t get anywhere else – it’s almost always faster too.

Suppose Domino’s Pizza want to launch a new pizza. A typical approach to R&D might mean they commission a study of consumer trends and behaviour, then use this to shortlist potential products, then run focus groups and taste tests, then build the supply chain and roll out the new product to their franchisees, and then…

Well, then – 12+ months after starting this process – they see whether customers choose to buy the new pizza. And if they don’t…

But with experimentation, that can all change. Instead of the 12+ month process above, Domino’s can run a “painted door” experiment on the menu. Instead of completing the full product development, they can add potential pizzas to the menu that look just like any other product. Then, they measure the add-to-basket rate for each.

This experiment-led approach might take just a couple of weeks – and a fraction of the cost – compared with traditional product development. What’s more, the data gathered is, as above, likely to correlate more closely with future sales.

7. Better for customers.
When people first hear about painted door testing – like this Domino’s example – they worry about the impact on the customer.

“Isn’t that a bad customer experience – showing them a product they can’t order?”

And that’s fair – it’s obviously not a good experience for the customer. But the potential alternative is that you invest 12 months’ work in building a product nobody wants.

It’s far better to mildly frustrate a small sample of users in an experiment, than it is to launch products that people don’t love.

To find out more about our approach to product experimentation, please get in touch with Conversion.

Reactive or proactive: The best approach to iteration

Iterating on experiments is often reactive and conducted as an afterthought. A lot of time is spent producing a ‘perfect’ test and, if results are unexpected, iterations are run as a last hope to gain value from the time and effort spent on the test. But why try to execute the ‘perfect’ experiment in the first instance, postponing the opportunity to uncover learnings along the way, when you could run a minimum viable experiment and iterate on it?

Experimentation is run at varying levels of maturity (see our Maturity Model for more information on this). However, we see businesses time and time again getting stuck in the infant stages due to their focus on individual experiments. We see teams wasting time and resource trying to run one ‘perfect’ experiment when the core concept has not been validated.

In order to validate levers quickly without over-investing in resource, we should ensure hypotheses are executed in their simplest form – the minimum viable experiment (MVE). From here, the success of an MVE gives you the green light to test more complex implementations, while failure flags problems with the concept or execution early on.

A few years ago, we learnt the importance of this approach the hard way. Off the back of a single hypothesis for an online real estate business – ‘Adding the ability to see properties on a map will help users find the right property and increase enquiries’ – we built a complete map view in Optimizely. A heavy amount of resource was used, only to find out within the experiment that the map had no impact on user behaviour. What should we have done? Run an MVE requiring the minimum resource to test the concept. What would this have looked like? Perhaps a fake door test to gauge user demand for the map functionality.

This blog aims to give:

  • An understanding of the minimum viable approach to experimentation
  • A view of potential challenges and tips to overcome them
  • A clear overview of the benefits of MVEs

The minimum viable approach

A minimum viable experiment looks for the simplest way to run an experiment that validates the concept. This type of testing isn’t about designing ‘small tests’, it is about doing specific, focused experiments that give you the clearest signal of whether or not the hypothesis is valid. Of course, it helps that MVEs are often small so we can test quickly! It is important to challenge yourself by assessing every component of the test and its likelihood of impacting the way the user responds to an experiment. That way, you will be efficient with your resource and yield the same effect on proving the validity of the concept. Running the minimum viable experiment allows you to validate your hypothesis without over investing in levers that turn out to be ineffective.

If the MVE wins, then iterations can be run to find the optimal execution – gaining learnings along the way. If the test loses, you can look at the execution more thoroughly and determine whether bad execution impacted the test. If so, re-run the MVE. If not, bin the hypothesis to avoid wasting resource on unfruitful concepts.
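That decision flow is small enough to sketch as code – a reminder that the MVE process is mechanical once the experiment has run (the return strings are illustrative):

```python
def mve_next_step(mve_won, execution_was_flawed=False):
    """The iterate / re-run / bin decision flow for an MVE."""
    if mve_won:
        return "iterate to find the optimal execution"
    if execution_was_flawed:
        return "fix the execution and re-run the MVE"
    return "bin the hypothesis"
```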

All hypotheses can be reduced to an MVE – see below for a visual example of an MVE testing stream.

Potential challenges to MVEs and tips to overcome them

Although this approach is the most effective, it is often not fully understood, resulting in pushback from stakeholders. Stakeholders are invested in the website and, moreover, protective of their product. As a result, the expectation from experimentation is that a perfect solution to a problem will be tested, which could be implemented immediately should the test win. What is not considered, however, is the huge amount of resource this would require without any validation that the hypothesis was correct or that the style of execution was optimal.

To overcome this challenge, we work with experimentation, marketing and product teams to challenge assumptions around MVEs. This education piece is pivotal for stakeholder buy-in. Over the last nine months, we have been running experimentation workshops with one of the largest online takeaway businesses in Europe, and a huge focus of these sessions has been on the minimum viable experiment.

Overview of the benefits of MVEs

Minimum viable experiments have a multitude of benefits. Here, we aim to summarise a few of these:

Efficient experiments

The minimum viable experiment of a concept allows you to utilise the minimum amount of resource required to see if a concept is worth pursuing further or not.

Validity of the hypothesis is clear

Executing experiments in their simplest form ensures the impact of the changes is evident. As a result, concluding the validity of the experiment is uncomplicated.

Explore bigger solutions to achieve the best possible outcome

Once the MVE has been proven, this justifies investing further resource in exploring bigger solutions. Iterating on experiments allows you to refine solutions to achieve the best possible execution of the hypothesis.

Key takeaways

  • A minimum viable experiment involves testing a hypothesis in its simplest form, allowing you to validate concepts early on and optimise the execution via iterations.
  • Pushback on MVEs is usually due to a lack of awareness of the process and the benefits it yields. Educate in order to show teams how effective this type of testing is, not only in reaching the best possible final execution for tests but also in utilising resource efficiently.
  • The main benefit of the minimum viable approach is that you spend time and resource on levers that impact your KPIs.

SCORE: A dynamic prioritisation framework for A/B tests from Conversion

Why prioritise?

With experimentation and conversion optimisation, there is never a shortage of ideas to test.

In other industries, specialist knowledge is often a prerequisite. It’s hard to have an opinion on electrical engineering or pharmaceutical research without prior knowledge.

But with experimentation everyone can have an opinion: marketing, product, engineering, customer service – even our customers themselves. They can all suggest ideas to improve the website’s performance.

The challenge is how you prioritise the right experiments.

There’s a finite number of experiments that we can run – we’re limited both by the resource to create and analyse experiments, and also the traffic to run experiments on.

Prioritisation is the method to maximise impact with an efficient use of resources.


Where most prioritisation frameworks fall down

There are multiple prioritisation frameworks – PIE (from WiderFunnel), PXL (from ConversionXL), and more recently the native functionality within Optimizely’s Program Management.

Each framework has a broadly consistent approach: prioritisation is based on a combination of (a) the value of the experiment, and (b) the ease of execution.

WiderFunnel’s PIE framework uses three factors, scored out of 10:

  • potential (how much improvement can be made on the pages?)
  • importance (how valuable is the traffic to the page?) and
  • ease (how complicated will the test be to implement?)

This is effective: it ensures that you consider both the potential uplift from the experiment alongside the importance of the page. (A high impact experiment on a low value page should rightfully be deprioritised.)
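As a minimal sketch (the scoring convention is assumed from WiderFunnel’s published description), a PIE score is simply the average of the three factors, which makes ranking a backlog trivial. The backlog entries below are illustrative:

```python
def pie_score(potential, importance, ease):
    """Average of the three PIE factors, each scored out of 10."""
    for factor in (potential, importance, ease):
        if not 0 <= factor <= 10:
            raise ValueError("each PIE factor is scored out of 10")
    return (potential + importance + ease) / 3

# Illustrative backlog, ranked highest score first
backlog = {
    "checkout trust messaging": pie_score(8, 9, 7),
    "footer link reorder": pie_score(3, 2, 9),
}
ranked = sorted(backlog, key=backlog.get, reverse=True)
```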

But it can be challenging to score these factors objectively – especially when considering an experiment’s potential.

Conversion XL’s PXL framework looks to address this. Rather than asking you to rate an experiment out of 10, it asks a series of yes/no questions to objectively assess its value and ease.

Experiments that are above the fold and based on quantitative and qualitative research will rightly score higher than a subtle experiment based on gut instinct alone.

This approach works well: it rewards the right behaviour (and can even help drive the right behaviour in the future, as users submit concepts that are more likely to score well).

But while it improves the objectivity in scoring, it lacks two fundamental elements:

  1. It accounts for page traffic, but not page value. So an above-the-fold research-backed experiment on a zero-value page could be prioritised above experiments that could have a much higher impact. (We used to work with a university in the US whose highest-traffic page was a blog post on ramen noodle recipes. It generated zero leads – but the PXL framework wouldn’t account for that automatically.)
  2. While it values qualitative and quantitative research, it doesn’t appear to include data from the previous experiments in its prioritisation. We know that qualitative research can sometimes be misleading (customers may say one thing and do something completely different). That’s why we validate our research with experimentation. But in this model, its focus is purely on research – whereas a conclusive experiment is the best indicator of a future iteration’s success.

Moreover, most frameworks struggle to adapt as an experimentation programme develops. They tend to work in isolation at the start – prioritising a long backlog of concepts – but over time, real life gets in the way.

Competing business goals, fire-fighting and resource challenges mean that the prioritisation becomes out-of-date – and you’re left with a backlog of experiments that is more static than a dynamic experimentation programme demands.

Introducing SCORE – our prioritisation process

Our approach to prioritisation is based on more than 10 years’ experience running experimentation programmes for clients big and small.

We wanted to create an approach that:

  • Prioritises the right experiments: So you can deliver impact (and insight) rapidly.
  • Adapts based on insight + results: The more experiments you run, the stronger your prioritisation becomes.
  • Removes subjectivity: As far as possible, data should be driving prioritisation – not opinion.
  • Allows for the practicalities of running an experimentation programme: It adapts to the reality of working in a business where the wider priorities, goals and resources change.

But the downside is that it’s not a simple checklist model. In our experience, there’s no easy answer to prioritisation – it takes work. But it’s better to spend a little more time on prioritisation than waste a lot more effort building the wrong experiments.

It’s better to spend a little more time on prioritisation than waste a lot more effort building the wrong experiments.

With that in mind, we’re presenting SCORE – our prioritisation process:

  • Strategy
  • Concepts
  • Order
  • Roadmap
  • Experimentation

As you’ll see, the prioritisation of concepts against each other happens in the middle of the process (“Order”) and is contingent on the programme’s strategy.

Strategy: Prioritising your experimentation framework

Our experimentation framework is fundamental to our approach. Before we start on concepts, we first define the goal, KPIs, audiences, areas and levers (the factors that we believe affect user behaviour).

You can read more about our framework here and you can create your own with the templates here.

When your framework is complete (or, at least, started – it’s never really complete), we can prioritise at the macro level – before we even think about experiments.

Assuming we’ve defined and narrowed down the goal and KPIs, we then need to prioritise the audiences, areas and levers:


Prioritise your audiences on volume, value and potential:

  • Volume – the monthly unique visitors of this audience. (That’s why it’s helpful to define identifiable audiences like “prospects”, “users on a free trial”, “new customers”, and so on.)
  • Value – the revenue or profit per user. (Continuing the above example, new customers are of course worth more than prospects – but at a far lower volume.)
  • Potential – the likelihood that you’ll be able to modify their behaviour. On a retail website, for example, there may be less potential to impact returning customers than potential customers – it may be harder to increase their motivation and ability to convert relative to a user who is new to the website.

You can, of course, change the criteria here to adapt the framework to better suit your requirements. But as a starting point, we suggest combining the profit per user and the potential improvement.

Don’t forget, we want to prioritise the biggest value audiences first – so that typically means targeting as many users as possible, rather than segmenting or personalising too soon.
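Combining profit per user with potential, as suggested above, could look something like the following sketch. The audience figures are entirely hypothetical, and the scoring function is just one possible weighting.

```python
# Hypothetical audience data: monthly unique visitors, profit per
# user, and an estimated potential (0-1) for shifting behaviour.
audiences = [
    {"name": "prospects",     "volume": 120_000, "value": 2.50,  "potential": 0.8},
    {"name": "free trial",    "volume": 15_000,  "value": 12.00, "potential": 0.6},
    {"name": "new customers", "volume": 8_000,   "value": 40.00, "potential": 0.3},
]

def audience_score(a):
    # Total monthly profit at stake, weighted by how likely
    # we are to move the needle for this audience.
    return a["volume"] * a["value"] * a["potential"]

ranked = sorted(audiences, key=audience_score, reverse=True)
```

With these (made-up) numbers, the high-volume prospects audience outranks the high-value new customers – which is exactly the trade-off the volume/value/potential criteria are there to surface.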


In much the same way as audiences, we can prioritise the areas – the key content that the user interacts with.

For example, identify the key pages on the website (homepage, listings page, product page, etc) and score them on:

  • Volume – the monthly unique visitors for the area.
  • Value – the revenue or profit from the area.
  • Potential – the likelihood that you’ll be able to improve the area’s performance. (Now’s a good time to use your quantitative and qualitative research to inform this scoring.)

(It might sound like we’re falling into the trap of other prioritisation models: asking you to estimate potential, which can be subjective. But, in our experience, people are more likely to score an area objectively, rather than an experiment that they created and are passionate about.)

Also, this approach doesn’t need to be limited to your website. You can apply it to any other touchpoint in the user journey too – including offline. Your cart abandonment email, customer calls and Facebook ads can (and should) be used in this framework.

If your KPI is profit, you may want to include offline content like returns labels in your prioritisation model.


As above, levers are defined as the key factors or themes that you think affect an audience’s motivation or ability to convert on a specific area.

These might be themes like pricing, trust, delivery, returns, form usability, and so on. (Take another look at the experimentation framework to see why it’s important to separate the lever from the execution.)

When you’re starting to experiment, it’s hard to prioritise your levers – you won’t know what will work and what won’t.

That’s why you can prioritise them on either:

  • Confidence – a simple score to reflect the quantitative and qualitative research that supports the lever. If every research method shows trust as a major concern for your users, it should score higher than another lever that only appears occasionally.
  • Win rate – If you have run experiments on this lever in the past, what was their win rate? It’s normally a good indicator of future success.

Of course, if you’re starting experimentation, you won’t have a win rate to rely on (so estimating the confidence is a fantastic start).

But if you’ve got a good history of experimentation – and you’ve run the experiments correctly, and focused them on a single lever – then you should use this data to inform your prioritisation here.

Again, the more we experiment, the more accurate this gets – so don’t obsess over every detail. (After all, it’s possible that a valid lever may have a low win rate simply because of a couple of experiments with poor creative.)  
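As a minimal sketch, prioritising levers by win rate – falling back to an estimated confidence score when there’s no experiment history yet – might look like this. The levers and results below are hypothetical.

```python
# Hypothetical experiment history per lever:
# True = winning experiment, False = losing/flat.
history = {
    "trust":    [True, True, False, True],
    "delivery": [False, False, True],
    "pricing":  [],  # no experiments run on this lever yet
}

def win_rate(results, fallback_confidence=0.5):
    # With no history, fall back to an estimated confidence
    # score based on quantitative/qualitative research.
    if not results:
        return fallback_confidence
    return sum(results) / len(results)

rates = {lever: win_rate(r) for lever, r in history.items()}
```

As the programme matures, the fallback confidence scores are gradually replaced by real win rates – which is how the prioritisation adapts over time.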

Putting this all together, you can now start to prioritise the audiences, areas and levers that should be focused on:

As you can see, we haven’t even started to think about concepts and execution – but we have a strong foundation for our prioritisation.

Concepts: Getting the right ideas

After defining the strategy, you can now run structured ideation around the KPIs, audiences, areas and levers that you’ve defined.

This creates the ideal structure for ideation.

Rather than starting with, “What do we want to test?” or “How can we improve product pages?”, we’re instead focusing on the core hypotheses that we want to validate:

  • How can we improve the perception of pricing on product pages for new customers?
  • How can we overcome concerns around delivery in the basket for all users?
  • And so on.

This structured ideation around a single hypothesis generates far better ideas – and means you’re less susceptible to the tendency to throw everything into a single experiment (and not knowing which part caused the positive/negative result afterwards).

Order: Prioritising the concepts

When prioritising the concepts – especially when a lever hasn’t been validated by prior experiments – you should look to start with the minimum viable experiment (MVE).

Just like a minimum viable product, we want to define the simplest experiment that allows us to validate the hypothesis. (Can we test a hypothesis with 5 hours of development time rather than 50?)

Just like a minimum viable product, we want to define the simplest experiment that allows us to validate the hypothesis.

This is a hugely important concept – and one that’s easily overlooked. It’s natural that we want to create the “best” iteration for the content we’re working on – but that can limit the success of our experimentation programme. It’s far better to run ten MVEs across multiple levers that take 5 hours each to build, rather than one monster experiment that takes 50 hours to build. We’ll learn 10x as much, and drive significantly higher value.

In one AB test for a real estate client, we created a fully functional “map view”. It was based on a significant volume of user research – but the minimum viable experiment would have been simply to test adding a “Map view” button without the underlying functionality.

So at the end of this phase, we should have defined the MVE for each of the high priority levers that we’re going to start with.

Roadmap: Creating an effective roadmap

There are many factors that can affect your experimentation roadmap – factors that stop you from starting at the top of your prioritised list and working your way down:

  • You may have limited resource, meaning that the bigger experiments have to wait till later.
  • There may be upcoming page changes or product promotions that will affect the experiment.
  • Other teams may be running experiments too, which you’ll need to plan around.

And there are dozens more: resource, product changes, marketing and seasonality can all block individual experiments – but they shouldn’t block experimentation altogether.

That’s why planning your roadmap is as important as prioritising the experiments. Planning delivers the largest impact (and insight) in spite of these competing factors.

Planning your roadmap is as important as prioritising the experiments. Planning delivers the largest impact (and insight) in spite of these competing factors.

To plan effectively:

  • Identify your swimlanes: These are the audiences and areas from your framework that you’ll be experimenting on. (Again, make sure you focus on the high priority audiences and areas – don’t be tempted to segment or personalise too early.)
  • Estimate experiment duration: Use an appropriate minimum detectable effect for the audience and area to calculate the duration, then block out this time in the roadmap.
  • Experiment across multiple levers: Gather more insight (and spread your risk) by experimenting across multiple levers. If you focus heavily on a lever like “trust” with your first six experiments, you might have to start again if the first two or three experiments aren’t successful.
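As a rough illustration of the duration estimate (not part of the SCORE process itself), sample size per variant can be approximated with Lehr’s rule of thumb (roughly 80% power at 5% two-sided significance). The conversion rate and traffic figures below are hypothetical.

```python
# Lehr's rule of thumb: n per variant ~= 16 * p * (1 - p) / mde^2,
# where p is the baseline conversion rate and mde is the absolute
# minimum detectable effect (e.g. 0.005 = half a percentage point).
def sample_size_per_variant(baseline_rate, mde_abs):
    return 16 * baseline_rate * (1 - baseline_rate) / mde_abs ** 2

def duration_weeks(baseline_rate, mde_abs, weekly_visitors, variants=2):
    # Total sample across all variants, divided by weekly traffic.
    n = sample_size_per_variant(baseline_rate, mde_abs)
    return variants * n / weekly_visitors

# Hypothetical: 3% baseline conversion, detect a +0.5pp absolute
# uplift, 20,000 visitors per week entering the test.
weeks = duration_weeks(0.03, 0.005, 20_000)
```

A proper power calculation (or your testing tool’s calculator) will give tighter numbers, but this is enough to block out realistic slots in the roadmap.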

Experimentation: Running and analysing the experiments

With each experiment, you’ll learn more about your users: what changes their behaviour and what doesn’t.

You can scale successful concepts and challenge unsuccessful concepts.

For successful experiments, you can iterate by:

  • Moving incrementally from minimum viable experiments to more impactful creative. (With one client, we started with a simple experiment that promoted the speed of delivery. After multiple successful experiments around delivery, we eventually worked with the client to test the commercial viability of same-day delivery.)
  • Applying the same lever to other areas and potentially audiences. If amplifying trust messaging on the basket page works well, it’ll probably work well on listing and product pages too.

Meanwhile, an experiment may be unsuccessful because:

  • The lever was invalidated – Qualitative research may have said customers care about the lever, but in practice it makes no difference.
  • The execution was poor – It happens sometimes. Every audience/area/lever combination can have thousands of possible executions – you won’t get it right first time, every time, and you risk rejecting a valid lever because of a lousy experiment.
  • There was an external factor – It’s also possible that other factors affected the test: there was a bug, the underlying page code changed, or a promotion or stock availability affected performance. It doesn’t happen often, but it needs to be checked.

In experiment post-mortems, it’s crucial to investigate which of these is most likely, so we don’t reject a lever because of poor execution or external factors.

Conduct experiment post-mortems so you don’t reject a lever because of poor execution or external factors.

What’s good (and bad) about this approach

This approach works for us – we’ve validated it with clients big and small for more than ten years, and have improved it significantly along the way.

It’s good because:

  • It’s a structured and effective prioritisation strategy.
  • It doesn’t just reward data and insight – it actively adapts and improves over time.
  • It works in the real world, allowing for the practicalities of running an experimentation programme.

On the flip side, its weaknesses are that:

  • It takes time to do properly. (You should create and prioritise your framework first.)
  • You can’t feed in 100 concepts and expect it to spit out a nicely ordered list. (But in our experience, you probably don’t want to.)

So, what now?

  1. If you haven’t already, print out or copy this Google slide of our experimentation framework.
  2. Email us to join our mailing list. We like sharing how we approach experimentation.
  3. Share your feedback below. What do you like? What do you do differently?