Brands increasingly understand the importance of experimentation to constantly improve and optimise their digital performance.
However, creating this culture of experimentation, where failure is seen as something to learn from, can be difficult. How brands can build this culture and drive improved performance was the focus of our recent joint event with Kameleoon at Sea Containers House.
Exploring the topic with a packed room were James Gray of Facebook, Marianne Stjernvall of TUI and Conversion’s Stephen Pavlovich. In a wide-ranging discussion that moved from the 18th to the 21st centuries, and from finding the cure for scurvy to the Amazon Echo, the speakers shared hints, tips and case studies for building a strong experimentation culture.
Experiments that lose build a better business case than the ones that win!
Starting the evening, Stephen Pavlovich explained his best practices for experimentation, starting with the importance of recognising and celebrating failure, rather than simply sweeping it under the carpet.
Too many organisations only talk about winning experiments, rather than learning from the losers. This is poor practice, as what you learn from losing experiments can often outweigh the value of the winners. Take Amazon’s disastrous Fire smartphone, which failed to sell and cost the company $170 million. Yet much of its technology was repackaged into the Amazon Echo smart speaker, creating an incredibly successful product and channel for the company.
Next Stephen addressed the issue of getting buy-in for experimentation, either by scaling experiments up to address bigger issues (push), or scaling something big down to make it more manageable (pull). This helps spread experimentation across the business and directly supports a culture where more radical ideas are tested. As he concluded, “We shouldn’t just test what we were going to do anyway. We should experiment on our boldest ideas – with a safety net.”
Getting it wrong…
Facebook is often highlighted as a leader in experimentation. However, as James Gray, Growth Marketing Manager in Facebook’s Community Integrity division explained, you still need to follow a structured and sensible process in order to drive optimal results. He highlighted three areas to focus on:
Focus on the basics: Make sure that experiments run for long enough to deliver realistic results (two business cycles is a logical timeframe), that you choose (and share) the right metrics to measure the experiment before it starts, and that you have a process to share the results widely.
Adapt to your terrain: Every business is different, so make sure your experiments reflect the idiosyncrasies of your organisation. Cater for factors that might either impact your results or mean that results don’t always translate into the promised benefits down the line.
Navigate your organisation: To get your results accepted and acted upon you need to bring the rest of the organisation with you. That means winning people over by taking the time to explain what you are doing and why your results are important. James shared the case of 18th-century naval surgeon James Lind, who tested different potential treatments for scurvy amongst sailors. While his experiment successfully identified citrus fruit as the cure, the naval authorities took 42 years to put this finding into practice – primarily because it was not clearly communicated in ways that non-specialists could understand.
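James’s first point – fixing the experiment’s runtime and success metric before launch – can be made concrete with a quick pre-test calculation. The sketch below is ours rather than from the talk, and the baseline rate, target lift and traffic figures are illustrative assumptions:

```python
import math

def sample_size_per_variant(baseline, relative_lift, z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed per variant for a two-proportion z-test
    (two-sided alpha = 0.05, power = 0.80)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p2 - p1) ** 2)

# Illustrative assumptions: 3% baseline conversion, hoping to detect a 10%
# relative lift, with 20,000 visitors per week split across two variants.
n = sample_size_per_variant(0.03, 0.10)
weeks = math.ceil(2 * n / 20_000)
print(f"{n} visitors per variant, roughly {weeks} weeks of traffic")
```

Under these assumptions a test needs roughly 53,000 visitors per variant – about six weeks of traffic – which is why calling a result after a few days is usually premature.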
Creating an experimentation culture at TUI
Large, traditional businesses often find it hard to change to a culture of experimentation, held back by organisational structures and a more conservative way of operating. Demonstrating that this doesn’t need to be the case, Marianne Stjernvall, CRO Lead at TUI, outlined six ways to get your culture of experimentation off the ground, based on her experience at the international travel group.
Be the hero: Have a clear plan on what you want to do, introduce the idea of testing, and listen to stakeholders to get them onside. Remember that humans are hard-wired to be suspicious of change, so not everyone will be positive from the start.
Run masterclasses: Show people what you are passionate about with regular, high-level, fun sessions that are open to all. Keep running them and you’ll draw in more people from across the business whom you would otherwise never get to speak to.
Share the results: Hold meetings with key stakeholders to drill down into your results. Be sure they understand the data and what it means before increasing the level of your detail and transparency through tools such as dashboards.
What I Learnt This Week: Weekly team meetings where everyone has the chance to share what they have done, and what has resulted from it. Make it user and people focused.
CRO Showcase: Highlight every learning through regular calls with your team, particularly if you are spread geographically in different countries.
Monthly dashboard: Introduce an element of (friendly) competition by not only reporting on all tests and their impact on business metrics, but highlighting the test that performed best of all. What can be learned and applied elsewhere?
The evening finished with an interactive panel Q&A, covering topics as diverse as “most embarrassing test” and what to do if your test proves inconclusive.
If you’d like to learn more about how Conversion can help you use experimentation to drive growth in your business, get in touch today. If you’d like to find out more about Kameleoon’s A/B testing and experimentation platform, then visit their website here.
Living in a world on fast forward is no easy thing.
Dramatic transformation takes place in every aspect of our lives, every industry and every sector. Staying in the game is posing new challenges for businesses across the globe. And in-store retail is particularly vulnerable.
This blog focuses on explaining the challenges retailers are facing and why experimentation could be the path for success in the new world of customer experience.
The current state of retail
After the Great Recession in 2008-09, a new era started in the consumer market. A golden age of consumption driven by the dazzling evolution of technology. Anyone with a mobile phone and an internet connection got access to virtually any product or service.
As a consequence, the previously almost non-existent online channel became absolutely crucial for most brands. E-commerce sales rocketed and expectations are that this trend will continue.
However, in the offline world, things looked less rosy. Online channels taking an ever larger share of the market, combined with fierce competition, polarising economics, high rents and rising taxes, led more and more retailers to cut their losses and pull down the shutters for the last time.
Despite the gloomy headlines, more than 80% of sales still come from stores. A First Insight report shows that consumers spend more time and money when purchasing in-store compared to online.
So what is going to happen to brick and mortar?
The transition from sales/sq foot to experience/sq foot
As the rise of online continues, there is only one way stores will be successful – brands must embrace the paradigm shift that is taking place. Simply distributing products is no longer enough, as consumers today are not relying on stores as their sole means of access to goods. The reason consumers still go to stores is to engage with the brand, its products and its culture in an emotional way that cannot be replicated online. It’s the experience that draws them in.
With more and more sales attributed to mobile, social and online channels, continuing to measure success only by conventional sales per square foot will indicate diminishing productivity for stores. This, in turn, will lead to more closures and a huge missed opportunity: capitalising on stores’ strategic importance in delivering powerful, unique experiences that create engagement and loyalty with a brand.
So, how do we measure this new goal of optimising the customer experience?
Alongside the traditional ROI metrics, brands need to define their ROX (return on experience) KPIs. The first step is mapping consumers’ purchase journey and isolating the touch points and factors that drive experience. Then, invest in the parts of the company that will move the needle on those interactions and yield measurable results. It’s all about the ‘magic moments’.
Unfortunately, not much has been done so far to define them. The well-known Net Promoter Score, which as many as two-thirds of Fortune 1000 companies use, is based on a single question: whether the consumer would recommend a product, service or company. Many companies hire vendors to track customer experiences with call centres and web properties.
Coming up with good KPIs to optimise for ROX will probably be one of the milestones for the new retail industry in the upcoming years.
Delivering in-store customer experience through experimentation
So, why is experimentation the answer?
In the online world, user experience has increasingly become the focus of brands over recent years. We now have UX design teams trying to create products that provide meaningful and relevant experiences to users, and experimentation programmes that are letting us know how a change would impact KPIs before implementing it. More and more online businesses rely on experimentation to inform their decisions and strategy and improve CX (customer experience).
Offline, however, things are quite different. Management usually introduces ‘innovations’ aimed at improving revenue without analysing data and understanding what difficulties customers are facing or where the opportunity lies. Sometimes they get it right; at other times, the result is catastrophic failure.
One example of things going wrong is J.C. Penney. In 2011, Ron Johnson, previously Apple’s senior VP of retail, became J.C. Penney’s CEO. Without any testing, he immediately introduced changes that decimated Penney’s revenue: technology took over the cashiers’ duties, making the checkout experience too formal; all forms of clearance sales and discounts stopped; and high-priced branded products became prominent in the stores. Only a year and five months after these changes, J.C. Penney’s sales revenue crashed to an all-time low. Management reacted, relieved Johnson of his position as CEO and reversed the changes.
Blockbuster, the multibillion-dollar video rental giant, is another example. The company faced decreasing popularity in 2000 because of unreasonable ‘late’ fees customers had to pay if they didn’t return their rented movies in time. One of the proposed solutions was a simple and cheap (only $12,000) email-reminder experiment. But this was rejected on the basis that it was not a ‘grand strategy’ – too simple. Instead, the company came up with a transformative, big-budget effort to eliminate fees, backed by a huge advertising campaign. The plan proved to be both unpopular and, ultimately, illegal. The estimated legal and business costs of this ‘extended viewing fee’ fiasco ran to tens of millions of dollars.
And there are plenty of similar examples. The conclusion – this is not a game of chance. As the era of ‘bricks and clicks’ and omnichannel is upon us, it’s time for the online techniques to be reflected offline and experimentation put to good use.
Surely everyone should be experimenting in-store then?
Testing in-store does pose significantly more challenges than online, the main reasons being organisational and technical complexity.
Having multiple store branches in several locations adds to the difficulty and complexity of experimentation. Online tests give access to a large pool of consumers from which companies can randomly sample as many users as they want; this type of sampling is not possible with brick and mortar. Carrying out experiments across thousands of store locations is simply not an option.
This challenge usually leads a company to carry out tests on only a small number of its customers, which is not representative of the majority.
To make the most of in-store experimentation programmes, strategy, as well as its execution, becomes crucial. Strategy identifies the goal of the programme, defines how success should be measured and uses data to inform hypotheses. Commitment from all stakeholders is also essential, as are thorough feasibility and reliability investigations when it comes to experiment design and results analysis.
Where do we stand at the moment?
In-store experimentation, also referred to as business experimentation, is scarcely used nowadays, although its potential for driving customer experience and informing business decisions and strategy is massive. There are, however, quite a few examples that build a strong case in its favour.
One example is Kohl’s – one of the largest department store retail chains in America, with 1,158 locations across the country. Back in 2013, someone suggested delaying store opening by one hour from Monday to Saturday to reduce operational costs. This sparked a strong debate within the company’s management, and they decided the only way to know what was right was to subject the idea to a rigorous business experiment. After conducting a test across 100 stores, the results showed that a delay in store opening time would not have any serious negative impact on sales.
Another example is Wawa, the convenience store chain in the mid-Atlantic United States. They wanted to introduce a flatbread breakfast item that had done well in spot tests. But the initiative was killed before the launch when a rigorous experiment—complete with test and control groups followed by regression analyses—showed that the new product would likely cannibalise other more profitable items.
While ROI is everyone’s focus, ROX as a goal is relatively new, and it is mainly the leading brands that are starting to understand it and make it part of their strategy. There are a few factors (here at Conversion.com, we call them levers) that have proved to drive in-store customer experience as well as ROI:
Technology is the one that stands out. In an era of fast technological advancements, there is a plethora of options for retailers to choose from to take their customer experience to the next level.
The first example that comes to mind is Amazon and their Amazon Go shops, where the world’s most advanced shopping technology turns queues and checkouts into history. Computer vision, sensor fusion and deep learning automatically detect when products are taken from or returned to the shelves and keep track of them in a virtual cart. When customers are done shopping, they can just leave the store. A little later, a receipt is sent and the customer’s Amazon account is charged.
Similarly, Sainsbury’s experimented with improving their customers’ experience by opening a till-free store in London back in April. Customers didn’t have to wait in long queues and could check out simply by scanning products with their phone, after installing an app. You can read more about the experiment here. The results of this experiment are yet to be announced.
Employees are a powerful factor in driving ROX. However, research suggests that retailers tend to view store associates as an expense to be controlled rather than as a means to provide better service for customers.
In 2015, an interdisciplinary team led by Principal Investigator Joan C. Williams of the University of California, Hastings, ran a randomised controlled experiment in 28 Gap stores in the San Francisco Bay Area and Chicago. Retail associates were shifted to more stable schedules to see how that would impact sales and labour productivity.
The results were striking.
Sales in stores with more stable scheduling increased by 7%, an impressive number in an industry in which companies work hard to achieve increases of 1–2%. Labour productivity increased by 5% in an industry where productivity grew by only 2.5% per year between 1987 and 2014. The estimate is that Gap earned $2.9 million as a result of more-stable scheduling during the 35 weeks the experiment was in the field. All details about the experiment can be found here.
Nevertheless, the best example of the impact employees can have on ROX is the iconic Apple stores. They rely on a very effective communication technique adapted from The Ritz-Carlton: Steps of Service. Every employee is trained to walk a customer through five steps that spell out the acronym A-P-P-L-E:
A – Approach customers with a personalized, warm welcome.
P – Probe politely to understand the customer’s needs.
P – Present a solution for the customer to take home today.
L – Listen for and resolve issues or concerns.
E – End with a fond farewell and an invitation to return.
In-store design, fixtures, and facilities also play a significant role in customer experience. Proving that they understand how brick and mortar retail is changing in the age of e-commerce, Nike opened a new five-story, 55,000 square foot store in New York City. There is a mini indoor basketball court, a treadmill, a system that simulates runs in different locations, a small soccer enclosure, a shoe bar where shoppers can personalise a pair of Nike Air Force and coaches who put customers through drills to test out different pairs of shoes. It is as much a place to play as it is a place to shop.
To conclude, in a market with customers’ expectations higher and more dynamic than ever, businesses have a powerful instrument in their toolkit to help them understand and meet these expectations – experimentation.
Putting experimentation at the heart of a business not only leads to better and more innovative ways of doing things – it gives companies the confidence to overturn wrongheaded conventional wisdom and the faulty business intuition that even seasoned executives still exhibit.
To find out more about our approach to experimentation, get in touch today!
Last week, we partnered up with Amplitude to host a compelling evening focused on product experimentation at the beautiful Century Club in London.
We were delighted to be accompanied by some brilliant speakers. We welcomed on stage Veronica Hulea (Head of Analytics at Zoopla), Rob Beattie (Head of Digital Product at Domino’s Pizza) and our very own, Stephen Pavlovich (CEO and Founder of Conversion.com).
Bringing together product practitioners and leaders from a range of different brands and industries, we wanted to share just how businesses should be using experimentation to not only inform product, but to actually define their roadmap.
We kicked off the evening with Stephen Pavlovich introducing the audience to experimentation as an engine to a successful product roadmap. He talked about how our choices are too often defined by ‘position, authority and experience’ or even ‘gut feeling.’ And, when we operate as teams, the products tend to be even worse. Or, to put it in Stephen’s terms…
“Decisions by committee will always be shitty.”
Instead, Stephen suggested that the most successful companies use experimentation as a product development framework. Using experimentation not just to validate your ideas – but to define them, means you can test bolder ideas safely, creating better products for your customers. This is exactly how the likes of Facebook, Amazon and Uber work – with experimentation at the heart of their businesses.
Finally, Stephen shared his five principles of product experimentation.
Next up on the stage, we welcomed our first guest speaker, Veronica Hulea, Head of Analytics at Zoopla. Veronica holds years of experience in market analytics as well as product optimisation. She shared her insights on how you can evolve your product without killing your conversion rates.
She began with examples from her own experience with Zoopla when they attempted to re-platform whilst maintaining a stable conversion rate.
“Use AB testing to ‘bake’ the new design with a small percentage of users, until it’s ready to replace the old one.”
Veronica also explained why AB test uplifts are not always reflected in business metrics. She provided actionable insight on how to unlock potential based on the user’s level of intent – from browsing and researching all the way through to final conversion.
Last but definitely not least, we introduced our guests to Rob Beattie, Head of Digital Product at Domino’s Pizza. Rob has been at the company for a year and a half, and has many years of experience heading up digital product and transformation across different businesses.
Rob took us on a journey through the years of growth and innovation at Domino’s Pizza, and showed us how experimentation has been used to inform the successes so far.
He continued by sharing the role of experimentation in the business as being not only a way to sell more products and develop new features online, but to actually define their physical products as well.
Rob provided actionable insights on ‘what makes a good experiment’, and equally as important, ‘how to run an experiment well’. Finally, our audience got to hear what the future holds for Domino’s Pizza, and just how ambitious their roadmap is!
Following the brilliant lightning talks, we held a panel Q&A where our guests took the opportunity to ask a myriad of questions about experimentation in general and specifically within their businesses.
If you’d like to hear more about how you can use experimentation to inform your product roadmap and drive growth in your business, then get in touch today.
Decision making is part of our everyday lives. We ask ourselves, “Should I have a coffee or a tea? Should I take the bus or the tube today? How should I respond to this email?”
But are we really aware of just how many decisions the average human makes in just one day? Go on, have a guess…
On average, we make a staggering 35,000 decisions per day! Taking into account the 8 or so hours we spend asleep, that works out to be over 2,100 decisions per hour. If we thought consciously about each decision, we would be faced with a debilitating challenge that would prevent us from living out our normal lives. Thankfully our brains have developed shortcuts, called heuristics, which allow us to make judgements quickly and efficiently, simplifying the decision making process.
Heuristics are extremely helpful in many situations, but they can result in errors in judgement when processing information – this is referred to as a cognitive bias.
How can cognitive biases impact our decisions?
Cognitive biases can lead us to false conclusions and as a consequence influence our future behaviour.
To illustrate this, I am going to take you through a famous study, described by Daniel Kahneman, showing the impact of the anchoring bias. In the experiment, a group of judges, each with over 15 years’ experience, were asked to look at a case in which a woman had been caught shoplifting multiple times.
In between reviewing the case and suggesting a possible sentence, the judges were asked to roll a pair of dice. Unbeknown to the judges this was the “anchor”. The dice were rigged, and would either give a total of 3 or 9.
Astonishingly, the number rolled anchored the judges when making their sentencing recommendations. Those who rolled 3 sentenced the woman to an average of 5 months in prison; those who threw 9 sentenced her to 8 months.
If judges with 15 years’ experience can be influenced so easily by something so arbitrary about something so important – then what hope do the rest of us have?
Another example of biases impacting important decisions can be found in the Brexit campaigns. We can all remember the “£350 million a week” bus, which suggested that instead of sending that money to the EU we could use it to fund the NHS instead.
There were many other examples of false stories published in the British media. These shocking statements are influential because humans have a tendency to think that statements that come readily to mind are more concrete and valid. This is an example of the availability bias.
But how is this relevant for experimentation?
With experimentation, we are tasked with changing the behaviours of users to achieve business goals. The user is presented with a situation and stimuli that impact their emotional responses and dictate which cognitive biases affect the user’s decision making.
When we run experiments without taking this into account, we are superficially covering up problems rather than addressing their root causes. To truly change behaviour, we must change the thought process of the user. This is where our behavioural bias framework comes into play…
Step 1. Ensure you have established your goal. Without a goal you will not be able to determine the success of your experiments.
Step 2. Identify the target behaviours that need to occur in order to achieve your goal. At this point it is important to analyse the environment you have created for your users. What stimulus is there to engage them? What action does the user need to take to achieve the goal? Is there a loyal customer base that return and carry out the desired actions again and again?
Step 3. Identify how current customers behave. Is there a gap between current behaviours and target behaviours?
Step 4. Now start pairing current negative biases with counteracting biases. At this point research is imperative. Your customers will behave differently depending on their environment, social and individual contexts. Research methods you can use include surveys, moderated and unmoderated user testing, evidence from previous tests as well as scientific research. Both Google Scholar and Deep Dyve are excellent scientific research resources.
Step 5. Which is the best solution to test?
There are three important things to consider at this point.
Value – What is the return for the business?
Volume – How many visitors will you be testing?
Evidence – Have you proven value in this area in previous tests?
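These three criteria lend themselves to a simple scoring exercise when choosing between candidate solutions. In this minimal sketch, the ideas, the 1–5 scores and the additive model are all invented for illustration:

```python
# Score each candidate counteracting-bias idea on Value, Volume and Evidence.
# Ideas and scores are invented for the example; a real programme would use
# its own research-backed ratings.
candidates = [
    # (idea, value, volume, evidence) -- each scored 1 (low) to 5 (high)
    ("Counteract negativity bias with visual depiction", 4, 5, 3),
    ("Add social proof next to the CTA", 3, 5, 2),
    ("Rewrite product descriptions", 2, 3, 1),
]

# Rank candidates by their combined score, highest first.
ranked = sorted(candidates, key=lambda c: sum(c[1:]), reverse=True)
for idea, *scores in ranked:
    print(f"{sum(scores):>2}  {idea}")
```

Ranking by the combined score gives a rough, transparent way to agree which hypothesis to test first; a real programme might weight the criteria to suit its own context.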
Joining the dots.
To bring this framework to life I’m going to run through an example…
Let’s pretend I work for a luxury food brand. I have identified my goal – purchases – and mapped out how current users behave on the site. I find that users are exiting the site while browsing product pages. Product pages are one of our highest-priority areas.
I have conducted a website review, which flagged some negative customer reviews. This is not a big issue for us – after all, taste is individual and we have an abundance of positive reviews. Nevertheless, it seems to be a sticking point for users.
A potential bias at play causing users to exit is the negativity bias. This bias tells us that things of a negative nature have a greater impact than neutral or positive things.
Instead of removing the negative reviews, we are going to maintain the brand’s openness to feedback and leave them on site. Nevertheless, we still want to reduce the exit rate, so we are going to test a counteracting bias: the visual depiction effect.
The visual depiction effect states that people are more inclined to buy a product when it is shown in a way that helps them visualise themselves using it. So in our product images we will now add a fork (this study was actually conducted! Check it out).
The results from the experiment will determine whether our counteracting bias (visual depiction effect) overcame the current one (negativity bias).
So, to conclude…the behavioural bias framework should be used to understand the gap between your customers’ current behaviours and your intended goal. This will allow you to hypothesise potential biases at play and run experiments that bridge the gap between existing and aspirational behaviours.
To find out more about our approach to experimentation, get in touch today!
We disagree – and want to show how this approach can actually limit the effectiveness of your experimentation programme.
But first… what is a primary metric?
Your primary metric is the metric you will use to decide whether the experiment is a winner or not.
We also recommend tracking:
Secondary metrics – to gain more insight into your users’ behaviour
Guardrail metrics – to ensure your test isn’t causing harm to other important business KPIs.
So what’s the big debate?
Some argue that your primary metric should be the next action you want the user to take, not final conversion.
For example, on a travel website selling holidays, the ‘final conversion’ is a holiday booking – this is the ultimate action you want the user to take. However, if you have a test on a landing page, the next action you want the user to take is to click forward into the booking funnel.
The main motive for using the next action as your primary metric is that it will be quicker to reach statistical significance. Moreover, it is less likely to give an inconclusive result. This is because:
Inevitably more users will click forward (as opposed to making a final booking) so you’ll have a higher baseline conversion rate, meaning a shorter experiment duration.
The test has a direct impact on click forward as it is the next action you are persuading the user to take. Meanwhile there may be multiple steps between the landing page and the final conversion. This means many other things could influence the user’s behaviour, creating a lot of noise.
There could even be a time lag. For example, if a customer is looking for a holiday online, they are unlikely to book in their first session. Instead they may have a think about it and have a couple more sessions on the site before taking the final step and converting.
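The duration gap between the two metrics is easy to quantify with the standard approximation that required sample size scales with p(1−p) divided by the squared absolute effect. The 30% click-forward and 2% booking baselines below are assumed figures for the holiday example:

```python
# Why a higher-baseline 'next action' metric reaches significance sooner:
# for the same relative lift, the absolute effect shrinks with the baseline,
# and required sample size is roughly p*(1-p) / (absolute effect)^2.
# The 30% click-forward and 2% booking rates are assumed figures.
def relative_sample_size(p, relative_lift=0.05):
    absolute_effect = p * relative_lift
    return p * (1 - p) / absolute_effect ** 2

clicks = relative_sample_size(0.30)    # next action: click into the funnel
bookings = relative_sample_size(0.02)  # final conversion: holiday booked
print(f"Booking-based test needs ~{bookings / clicks:.0f}x the visitors")
```

For the same 5% relative lift, the booking-based test needs roughly 21 times the visitors – which is exactly why the next-action metric is so tempting.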
Why is the myth wrong?
Because it can lead you to make the wrong decisions.
Example 1: The Trojan horse
Take this B2B landing page below: LinkedIn promotes their ‘Sales Navigator’ product with an appealing free trial. What’s not to like? You get to try out the product for free so it is bound to get a high click through rate.
But wait…when you click forward you get a nasty shock as the site asks you to enter your payment details. You can expect a high drop-off rate at this point in the funnel.
A good idea would be to test the impact of giving the user forewarning that payment details will be required. This is what Norton Security have under the “Try Now” CTA on their landing page.
In an experiment like this, it is likely that you would see a fall in click through (the ‘next action’ from the landing page). However, you might well see an uplift in final conversion – because the user receives clear, honest, upfront communication.
In this LinkedIn Sales Navigator example:
If you were to use clicks forward as your primary metric, you would declare the test a loser, despite the fact that it increases conversion.
If you were to use free trial sign ups as your primary metric, you would declare the test a winner – a correct interpretation of the results.
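A sketch with invented counts makes the conflict concrete. Assuming a variant that forewarns users about payment details, as in the hypothetical LinkedIn example above:

```python
# Invented counts for a 'forewarn about payment details' variant: clicks into
# the funnel fall, but free-trial sign-ups rise. All numbers are illustrative.
control = {"visitors": 10_000, "clicks": 3_000, "signups": 240}
variant = {"visitors": 10_000, "clicks": 2_400, "signups": 312}

# Compare each metric's rate and relative lift between control and variant.
for metric in ("clicks", "signups"):
    c = control[metric] / control["visitors"]
    v = variant[metric] / variant["visitors"]
    print(f"{metric}: {c:.2%} -> {v:.2%} ({(v - c) / c:+.0%})")
```

Judged on clicks forward, the variant loses by 20%; judged on sign-ups, it wins by 30% – the same experiment yields two opposite verdicts depending on the primary metric.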
Example 2: The irresistible big red button
The ‘big red button’ phenomenon is another scenario that will help to bust this troublesome myth:
When you see a big red button, all you want to do is push it – it’s human nature.
This concept is often taken advantage of by marketers:
Imagine you have a site selling experience gifts (e.g. ‘fine dining experience for two’ or ‘one-day acrobatics course’). You decide to test increasing the prominence of the main CTA on the product page by making it bigger and removing informational content (or moving it below the fold) to reduce distractions. Users might be more inclined to click the CTA and arrive in the checkout funnel. However, this could damage conversion: users may click forward but then find they lack information and are not ready to be in the funnel – so actual experience bookings may fall.
Again, in this scenario using click forward as your primary metric will lead you to the wrong conclusions. Using final conversion as your primary metric aligns with your objective and will lead you to the correct conclusions.
There are plenty more examples like these. And this isn’t a made-up situation or a rare case. We frequently see an inverse relationship between clickthrough and conversion in experimentation.
This is why PPC agencies and teams always report on final conversion, not just click through to the site. It is commonly known that a PPC advert has not done its job simply by getting lots of users to the site. If this was the case you would find your website inundated with unqualified traffic that bounces immediately. No – the PPC team is responsible for getting qualified traffic to your site, which they measure by final conversion rate.
But is it really a big deal?
Some people say, ‘Does it really matter? As long as you are measuring both the ‘next action’ and the final conversion then you can interpret the results depending on the context of the test.’
That’s true to some extent, but the problem is that practitioners often interpret results incorrectly. Time and time again we see tests being declared as winners when they’ve made no impact on the final conversion – or may have even damaged it.
Why would people do this? Well, there is a crude underlying motive for some practitioners. It makes them look more successful at their job – with higher win rates and quicker results.
And there are numerous knock-on effects from this choice:
1. Wasting development resources
When an individual incorrectly declares a test a winner, the change needs to be coded into the website, adding to the development team’s already vast pile of work. This is a huge waste of valuable resources when the change is not truly improving the user experience and may well be harming it.
2. Reducing learnings
Using next action as your primary metric often leads to incorrect interpretation of results. In turn, this leads to missing out vital information about the test’s true impact in communications. Miscommunication of results means businesses miss out on valuable insights about their users.
Always question your results to increase your understanding of your users. If you are seeing an uplift in the next action, ask yourself, ‘Does this really indicate an improvement for users? What else could it indicate?’ If you are not asking these questions, then you are testing for the sake of it rather than testing to improve and learn.
3. Sacrificing ROI
With misinterpreted results, you may sacrifice the opportunity to iterate and find a better solution that will work. Instead of implementing a fake winner, iterate, find a true winner and implement that!
Moreover, you may cut an experiment short after seeing a significant fall in next-step conversion, when letting it run for longer could have revealed a significant uplift in final conversion. Declaring a test a loser when it is in fact a winner will of course sacrifice your ROI.
4. Harming stakeholder buy-in
On the surface, using click-through as your primary metric may look great when reporting on your program metrics. It will give your testing velocity and win rate a nice boost. But it doesn’t take long, once someone looks beneath the surface, to see that all your “winners” are not actually impacting the bottom line. This can damage stakeholder buy-in, as your work is all assumptive rather than factual and data-driven.
But it’s so noisy!
A common complaint we hear from believers of the myth is that there is too much noise we can’t account for. For example, there might be 4 steps in the funnel between the test page and the final conversion. Therefore, there are so many other things that may have influenced the user in the time between step 1 and step 4 that could lead them to drop off.
That’s true. But the world is a noisy place. Does that mean we shouldn’t test at all? Of course not.
For instance, I might search “blue jacket” and Google links me through to an ASOS product page for their latest denim item. Between this page and the final conversion we have 3 steps: basket, sign in, checkout.
Look at all the noise that could sway my decision to purchase along each step of the journey:
As you can see there is a lot of unavoidable noise on the website and a lot of unavoidable noise external to the site. Imagine ASOS were to run a test on the product page and were only measuring the next action (“add to basket” clicks). Their users are still exposed to a lot of website noise and external noise during this first step.
However, one thing is for sure: all users will face this noise, regardless of whether they are in the control or the variant. As the test runs, the sample size will get larger and larger, and the likelihood of seeing a false uplift due to this noise gets smaller and smaller. This is exactly why we ensure we don’t make conclusions before the test has gathered enough data.
The same goes when we use final conversion as our primary metric rather than ‘next action’. Sure, there is more noise, which is one of the reasons why it takes longer to reach statistical significance. But once you reach statistical significance, your results are just as valid, and are more aligned with your ultimate objective.
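You can see this noise-cancelling effect in a quick simulation. The sketch below (illustrative only) runs repeated A/A ‘experiments’ – both arms share the same true conversion rate, so any measured difference is pure noise – and shows the spurious uplift shrinking as the sample grows:

```python
import random

random.seed(42)

BASE_RATE = 0.05  # the same true conversion rate in both arms

def spurious_uplift(n):
    """One A/A run: the absolute difference between two identical arms,
    i.e. noise and nothing else."""
    conv_a = sum(random.random() < BASE_RATE for _ in range(n))
    conv_b = sum(random.random() < BASE_RATE for _ in range(n))
    return abs(conv_a - conv_b) / n

for n in (500, 5_000, 50_000):
    # average the noise over 50 repeated A/A runs
    avg = sum(spurious_uplift(n) for _ in range(50)) / 50
    print(f"n = {n:>6} per arm: average spurious 'uplift' = {avg:.3%}")
```

The spurious uplift falls roughly in proportion to the square root of the sample size – which is exactly why a fully-powered test on final conversion is as trustworthy as one on click-through, just slower.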
But where do you draw the line?
Back to our LinkedIn Sales Navigator example: as discussed above, the primary metric should be free trial sign ups. But this isn’t actually the ultimate conversion you want from the user – that is to become a full subscriber to your product, beyond the free trial.
You should think of it like a relay race.
The objective of the landing page is to generate free trials. → The objective of the free trial is to generate full time subscriptions. → The objective of the full time subscription is to maintain the customer (or even upsell other product options):
Each part of the relay race is responsible for getting the customer to the next touch point. The landing page has a lot of power to influence how many users end up starting the free trial. It has less power to influence how successful the free trial is and whether the user will continue beyond the trial.
Nonetheless, we’ve seen experiments where the change does have a positive impact beyond the first leg of the relay race, as it were. In one experiment we explained the product more clearly on the landing page. This increased users’ understanding of it, making them more likely to actually use their free trial (and be successful in doing so). This led to an uplift in full subscription purchases 30 days later.
For this kind of experiment that could have an ongoing influence, you may wish to keep the experiment running for longer to get a read on this. It is sensible to define a decision policy up-front in this instance. In this example, where the impact on full purchases is likely to be flat or positive, your decision policy might be:
If we see a flat result or a fall in free trial sign ups (primary KPI) we will do the following:
Stop the test and iterate with a new execution based on our learnings from the test.
If we see a significant uplift in free trial sign ups (primary KPI), we will do the following:
Serve the test to 95% and keep a 5% hold back to continue measuring the impact on full subscription purchases (secondary KPI).
This way, you will be able to make the right decisions and move on to your next experiments while still learning the full value of your experiment.
For a test where there is a higher risk of a negative impact on full subscription purchases, you may do the following things:
Define the full subscription metric as your guardrail metric.
Design a stricter decision policy whereby you gather enough data to confirm there is no negative impact on full subscription purchases.
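A decision policy like this can be written down explicitly before the test launches. The sketch below uses hypothetical labels, loosely mirroring the policies described above:

```python
def decide(primary_result, guardrail_at_risk=False):
    """Pre-agreed decision policy for the free trial landing page test.

    primary_result: 'uplift', 'flat' or 'fall' in free trial sign ups
    guardrail_at_risk: True when full subscription purchases might be harmed
    """
    if primary_result in ("flat", "fall"):
        return "stop the test and iterate with a new execution"
    if guardrail_at_risk:
        # stricter policy: gather enough data to confirm no harm first
        return "keep gathering data on full subscription purchases"
    return "serve to 95% with a 5% hold-back to keep measuring full purchases"

print(decide("uplift"))
```

Writing it down like this before launch removes the temptation to reinterpret the results after the fact.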
But what if you are struggling to reach significance?
For many, using the next action as the primary metric allows them to experiment faster. So does low traffic justify testing to the next action instead of sale? Sometimes, but only if you’ve considered these options first:
1. Don’t run experiments
That’s not to say you shouldn’t be improving your website. Experiments are the truest form of evidence for understanding your audience, but if you don’t have enough traffic, the next best thing is to inform and validate your optimisation with other forms of evidence, such as usability testing. Gathering insights via analytics data and user research is extremely powerful – something we do continually alongside experimentation, for all our clients.
2. Be more patient
For a particularly risky change, you might be willing to be patient and choose to run an experiment that will take longer to reach significance. Before you do, plug the numbers into a test duration calculator so that you have a good idea of exactly how patient you are going to need to be. There are several good calculators available that are independent of any particular testing tool.
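If you’d rather see the arithmetic, the standard two-proportion sample size formula behind these calculators is easy to sketch. The inputs below are purely illustrative: a 2% baseline conversion rate, a 10% relative uplift you hope to detect, and 10,000 eligible visitors a week:

```python
import math

def sample_size_per_variant(baseline_cr, relative_mde, z_alpha=1.96, z_beta=0.84):
    """Visitors needed per variant for a two-sided test of two proportions.
    Default z-scores correspond to alpha = 0.05 and power = 0.8."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

n = sample_size_per_variant(0.02, 0.10)
weeks = 2 * n / 10_000
print(f"{n:,} visitors per variant - roughly {weeks:.0f} weeks of traffic")
```

With these inputs you need roughly 80,000 visitors per variant – around four months of traffic – which shows just how much patience a final-conversion metric can demand.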
3. Prioritise larger audiences and pages
If you are trying to run tests on a very specific audience or a low-traffic page, you aren’t going to have much luck reaching statistical significance. Make sure you look at your site analytics data and prioritise your audiences and areas by their relative size.
With all that being said, you do have a 4th option…
If you are really struggling to reach statistical significance then you might want to use the next action as your primary metric. This isn’t always a disaster – so long as you interpret your results correctly. The problem is that so often people don’t.
For a site with small traffic it may make sense to take this approach if you are experienced in interpreting experiment results.
However, for sites with lots of traffic, there’s really no excuse. So start making the switch today. Your win rates might fall slightly, but when you get a win, you can feel confident that you are making a true difference to the bottom line.
To find out more about our approach to experimentation, get in touch today!
Understanding your customers is critical to a successful optimisation strategy. Knowing what motivates some users to purchase, and what prevents others from checking out, is a fundamental requirement to strategic experimentation.
But not all customers are the same: some are impulsive, others more considered!
This blog sets out to help you find the different audiences that browse your website so you can optimise for them accordingly.
What is a user segment?
A user segment is a distinct set of users that act differently when compared to other users. “Act differently” is important: there is no point identifying audiences for your website if you cannot act on that information because they all behave the same. Segmenting your users into two groups and targeting them with different experiments, only to find out they exhibit the same behaviour all of the time, needlessly increases the length of time it takes to run an experiment.
You also need to ensure that you can identify these user segments online for them to be useful for experimentation. Common personas include data points around income, personality or lifestyle, which are useful for shaping content and understanding a user’s motivations – but you can’t target a personality trait like introversion, or segment your experiment results by it.
Finally, sample sizing is important for experimentation, and your user segments need to be large enough to support analysis to a confident level. You might spot a really interesting trend for users in Bristol that use the first version of Internet Explorer, but if that is only 0.1% of your traffic, and it would take 500 days to reach a sufficient sample size for a change that increases revenue by £50, is it worth your time?
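A quick back-of-envelope check makes the point. The figures below are purely hypothetical – 100,000 sessions a month, a segment that is 0.1% of traffic, and a test that needs around 50,000 sessions per variant to conclude:

```python
# Hypothetical inputs for illustration
monthly_sessions = 100_000
segment_share = 0.001        # the 0.1% niche segment from the example above
needed_per_variant = 50_000  # sessions each variant needs to reach significance

segment_sessions_per_month = monthly_sessions * segment_share  # 100 a month
months_to_finish = 2 * needed_per_variant / segment_sessions_per_month
print(f"{months_to_finish:,.0f} months to complete one test on this segment")
```

Running this check before committing to a segment takes seconds and can save months of wasted testing.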
Why do they matter?
By identifying the different types of users that browse your website you understand the different motivations behind conversions and behaviours that these users exhibit. This can then help you enhance the user experience and remove the barriers to conversion for each audience.
For example, if you know that you have a large segment of browsing users that cycle between listing and product pages over and over again without ever purchasing, you might come to the conclusion that there is key information missing on the listing page. This is then an actionable insight that you can use to gather more information through experimentation.
Experimentation will help you to understand the key information your visitors require in order to commit to a transaction. These learnings impact not only sales; they can also increase the efficiency of your marketing efforts, as you know which information to include and bring to your customers’ attention.
Additional insight can be unlocked from existing experiments when breaking down results by previously identified segments too. At its simplest level, splitting results by device can give insights into how user journeys differ from mobile to desktop and give you data on how to improve device specific experiences. When results differ consistently this can also be a clear indication that your on-site experimentation strategy is ready for personalisation.
Conversely, when you’re not seeing differing results across your user segments, this can be a sign that there are still gains to be made from traditional A/B testing to a large audience. A common error for marketers is to ‘over-personalise’ customer experiences without any data to show that their user base is ready for a customised experience. This usually results in a higher frequency of inconclusive experiments, and in winning tests having a smaller revenue impact than they would if the benefit were served to the entire audience.
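At its simplest, this kind of breakdown is just a group-by over your raw session data. A minimal sketch with made-up rows (your analytics export will look different):

```python
from collections import defaultdict

# Hypothetical per-session rows: (variant, device, converted 0/1)
sessions = [
    ("control", "mobile", 0), ("control", "mobile", 1), ("control", "desktop", 1),
    ("variant", "mobile", 1), ("variant", "mobile", 1), ("variant", "desktop", 0),
    # ...in practice, thousands of rows exported from your analytics tool
]

# (variant, device) -> [conversions, sessions]
totals = defaultdict(lambda: [0, 0])
for variant, device, converted in sessions:
    totals[(variant, device)][0] += converted
    totals[(variant, device)][1] += 1

for (variant, device), (conv, n) in sorted(totals.items()):
    print(f"{variant:>7} / {device:<7}: {conv}/{n} converted")
```

Remember that each segment you break out needs its own adequate sample size before you read anything into the split.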
How do I find them?
1. What would you do?
The starting point for building user segments should be to think about your own personal experiences when browsing your website and that of your competitors. It’s likely that those experiences aren’t unique to you and are common amongst your user base. At the first level, think about when you visit the website, what devices you use and what your state of mind tends to be.
It’s difficult to template this approach, as your user segments should be unique to your website or industry. Every website has new and returning users, but the characteristics of these user types can differ wildly across websites within the same industry. A new user to Google Maps can have very different intentions to a new user visiting Citymapper, despite there being an argument that the core products are very similar.
2. Do they exist in the data?
The next step is within analytics, where we can check for our 3 audience criteria: identifiable, impactful and showing distinct behaviour. Anything that is identifiable in analytics should be identifiable on the website (unless you are merging 3rd party data after user sessions – but this should be an edge use case).
The most common way to find user segments within analytics is to see how user journeys differ by different visitor properties – i.e. is there a significant difference between how users transition through the website when they come via branded search terms compared to unbranded? How does their journey differ depending on where they are in the customer lifecycle?
It’s important not to overcomplicate your segments to make them look groundbreaking – most aren’t! It may be as simple as new visitors and those on mobile having similar characteristics compared to returning desktop users. Providing you’ve found distinct behaviour and the audiences are large enough to have a real impact on your KPIs, you can begin to align your strategy towards their needs.
3. Give them some life.
Once you’ve found your segments, I find it useful to name them and give them a relevant story. This can help tailor your thinking towards what the customer needs are and align your strategy to their goal. The best experimentation programmes tend to be customer-centric – so your user segments should be too.
Those new visitors and mobile users seen in the graphs above may have a large proportion of traffic but low conversion rates – and looking at the customer lifecycle shows that almost ¾ of revenue is generated after the first session anyway. It’s reasonable to assume that these visitors are researching at this stage, whilst returning users on desktop are much more likely to convert. Labelling these two segments as “researchers” and “buyers” can stop you wasting time trying to make new users convert when they aren’t likely to; instead you can find out what information is important to them and enhance their user experience so they are more likely to return and convert at a later stage.
There you have it! A couple of actionable user segments that bring to life the different ways visitors browse your website.
From this you can stop bombarding researchers with intimidating urgency tactics that frustrate this type of user and instead look to provide them with the core information they need. When they come back, and they will if they’ve had a positive first impression, you know they’re significantly more likely to purchase. That is when conversion tactics can help give the user the nudge they need to get the conversion over the line and turn what may have been another abandoned basket into a loyal customer.
If you’d like to find out more about how you begin to build meaningful user segments for your business, get in touch today!
Everything we produce is the result of our choices. Which products and features do we roll out? Which do we roll back? And which ideas never even make it on the backlog?
The problem is – most of us suck at making choices.
Decisions are made by consensus, based on opinion not evidence. We’re riddled with subjectivity and bias, often masquerading as “experience”, “best practice” or “gut instinct”.
But there’s a better way – using experimentation as a way to define your product roadmap.
Experimentation as a product development framework
For many product organisations, experimentation serves two functions:
1. Safety check: Product and engineering run A/B tests (or feature flags) to measure the impact of new features.
2. Conversion optimisation: Marketing and growth teams run A/B tests – for example, on the sign-up flow – to optimise acquisition.
But this neglects experimentation’s most important function:
3. Product strategy: Product teams use experimentation to find out which features and ideas their customers will actually use and enjoy.
In doing so, you can use experimentation to inform product – not just validate it. You can test bolder ideas safely, creating better products for your customers. By putting experimentation at the heart of their business, organisations like Facebook, Amazon, Uber and Spotify have created and developed products used by billions worldwide.
But they’re in the minority. They represent the 1% of brands that have adopted experimentation as not just a safety check, but as a driving force for their product.
So how do the 99% of us better adopt experimentation?
Five principles of product experimentation
#1 Experiment to solve your biggest problems.
First, and most importantly, you should experiment on your biggest problems – not your smallest.
If experimentation is only used to “finesse the detail” by A/B testing minor changes, you’re wasting the opportunity.
To start, map out the products or features you’re planning. What are the assumptions you’re making, and what are the risks you’re taking? How can you validate these assumptions with experimentation?
Also, what are the risks you’re not taking – but would love to at least try with an A/B test?
#2 Be bold.
Experimentation lets you take risks with the confidence of a safety net.
Because experiments are – by their nature – measurable and reversible, they give us a huge opportunity to test ideas that are bolder than we’d otherwise dare.
Jeff Bezos famously splits decisions into two types. Type 1 decisions are irreversible – “one-way doors”:
“These decisions must be made methodically, carefully, slowly, with great deliberation and consultation. If you walk through and don’t like what you see on the other side, you can’t get back to where you were before.”
Type 2 decisions are reversible – “two-way doors”:
“But most decisions aren’t like [Type 1 decisions] – they are changeable, reversible – they’re two-way doors. If you’ve made a suboptimal Type 2 decision, you don’t have to live with the consequences for that long. You can reopen the door and go back through.”
Bezos also argues that the scale of failure has to grow with the company: “As a company grows, everything needs to scale, including the size of your failed experiments. If the size of your failures isn’t growing, you’re not going to be inventing at a size that can actually move the needle. Amazon will be experimenting at the right scale for a company of our size if we occasionally have multibillion-dollar failures.”
If we aren’t prepared to risk failure, then we don’t innovate. Instead, we stagnate and become Blockbuster in the Netflix era.
Experimentation, by contrast, gives us a safety net to take risks. We can test our boldest concepts and ideas, which would otherwise be blocked or watered down by committee. After all, it’s only a test…
#3 Test early / test often.
Experimentation works best when you test early and often.
But most product teams test only once, at the end – to measure the impact of a new feature just before or just after it launches. (This is the “safety check” function mentioned above.)
Their process normally looks like this:
Whether the experiment wins or loses – whether the impact is positive or negative – the feature is typically rolled out anyway.
Why? Because of the emotional and financial investment in it. If you’ve spent 6 or 12 months building something and then find out it doesn’t work, what do you do?
You could revert back and write off the last 6 months’ investment. Or you could persevere and try to fix it as you go.
Most companies choose the second option – they invest time and money in making their product worse.
As Carson Forter, formerly of Twitch and now at Future Research, says of bigger feature releases:
“By the time something this big has been built, the launch is very, very unlikely to be permanently rolled back no matter what the metrics say.”
That’s why we should validate early concepts as well as ready-to-launch products. We start testing as early as possible – before we commit to the full investment – to get data on what works and what doesn’t.
After all, it’s easier to turn off a failed experiment than it is to write off a failed product launch. What’s more, gathering data from experiments will help us guide the direction of the product.
#4 Start small and scale.
To do that – to test early and often – you’ll frequently have to start with the “minimum viable experiment” (MVE).
Just like a minimum viable product, we’re looking to test a concept that is as simple and as impactful as possible.
So what does this look like in practice? Often “painted door tests” work well here. You don’t build the full product or feature and test that. After all, by that point, you’ve already committed to the majority of the investment. Instead, you create the illusion of the product or feature.
Suppose a retailer wanted to test a subscription product. They could build the full functionality and promotional material and then find out if it works. Or they could add a subscription option to their product details pages, and see if people select it.
Ideally, before they run the experiment, they’d plan what they’d do next based on the uptake. So if fewer than 5% of customers click that option, they may deprioritise it. If 10% choose it, they might add it to the backlog. And if 20% or more go for it, then it may become their #1 priority until it ships.
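Such a plan can be agreed as explicit thresholds before launch. A sketch using the hypothetical cut-offs above (the source leaves the 5–10% band undefined, so we treat it as a judgment call):

```python
def next_step(uptake_rate):
    """Map painted-door uptake to a pre-agreed product decision."""
    if uptake_rate >= 0.20:
        return "make it the #1 priority"
    if uptake_rate >= 0.10:
        return "add it to the backlog"
    if uptake_rate < 0.05:
        return "deprioritise it"
    # the 5-10% band is not specified above: treat it as a judgment call
    return "judgment call: iterate or gather more data"

print(next_step(0.22))  # -> make it the #1 priority
```

Agreeing the thresholds up front keeps the decision objective once the results come in.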
We’ve helped our clients apply this to every aspect of their business. Should a food delivery company have Uber-style surge pricing? Should they allow tipping? What product should they launch next?
#5 Measure what matters.
The measurement of the experiment is obviously crucial. If you can’t measure the behaviour that you’re looking to drive, there’s probably little point in running the experiment.
So it’s essential to define both:
the primary metric or “overall evaluation criterion” – essentially, the metric that shows whether the experiment wins or loses, and
any secondary or “guardrail” metrics – metrics you’re not necessarily trying to affect, but don’t want to see perform any worse.
You’d set these with any experiment – whether you’re optimising a user journey or creating a new product.
As far as possible – and as far as sample size/statistical significance allows – focus these metrics on commercial measures that affect business performance. So “engagement” may be acceptable when testing an MVE (like the fake subscription radio button above), but in future iterations you should build out the next step in the flow to ensure that the positive response is maintained throughout the funnel.
Why is this approach better?
1. You build products with the strongest form of evidence – not opinion. Casey Winters talks about the dichotomy between product visionaries and product leaders. A visionary relies more on opinion and self-belief, while a leader helps everyone to understand the vision, then builds the process and uses data to validate and iterate.
And the validation we get from experiments is stronger than any other form of evidence. Unlike traditional forms of product research – focus groups, customer interviews, etc – experimentation is both faster and more aligned with future customer behaviour.
The pyramid below shows the “hierarchy of evidence” – with the strongest forms of evidence at the top, and the weakest at the bottom.
You can see that randomised controlled trials (experiments or A/B tests) are second only to meta-analyses of multiple experiments in terms of quality of evidence and minimal risk of bias:
2. Low investment – financially and emotionally. When we constantly test and iterate, we limit the financial and emotional fallout. Because we test early, we’ll quickly see if our product or feature resonates with users. If it does, we iterate and expand. If it doesn’t, we can modify the experiment or change direction. Either way, we’re limiting our exposure.
This applies emotionally as well as financially. There’s less attachment to a minimum viable experiment than there is a fully-built product. It’s easier to kill it and move on.
And because we’re reducing the financial investment, it means that…
3. You can test more ideas. In a standard product development process, you have to choose which products or features to launch without strong data to rely on. (Instead, you may have market research and focus groups, which are beneficial but don’t always translate to sales.)
In doing so, you narrow down your product roadmap unnecessarily – and you gamble everything on the product you launch.
But with experimentation, you can test all those initial ideas (and others that were maybe too risky to be included). Then you can iterate and develop the concept to a point where you’re launching with confidence.
It’s like cheating at product development – we can see what happens before we have to make our choice.
4. Test high-risk ideas in a low-risk way. Because of the safety net that experimentation gives us (we can just turn off the test), we can make our concepts 10x bolder.
We don’t have to water down our products to reach a consensus with every stakeholder. Instead, we can test radical ideas – and just see what happens.
Like Bill Murray in Groundhog Day, we get to try again and again to see what works and what doesn’t. So we don’t have to play it safe with our ideas – we can test whatever we want.
Don’t forget, if we challenge the status quo – if we test the concepts that others won’t – then we get a competitive advantage. Not by copying our competitors, but by innovating with our products.
And this approach is, of course, hugely empowering for teams…
5. Experiment with autonomy. Once you’ve set the KPIs for experimentation – ideally the North Star Metric that directs the product – then your team can experiment with autonomy.
There’s less need for continual approval, because the opinion you need is not from your colleagues and seniors within the business, but from your customers.
And this is a hugely liberating concept. Teams are free to experiment to create the best experience for their customers, rather than to win approval from their line manager.
6. Faster. Experimentation doesn’t just give you data you can’t get anywhere else; it’s almost always faster too.
Suppose Domino’s Pizza want to launch a new pizza. A typical approach to R&D might mean they commission a study of consumer trends and behaviour, then use this to shortlist potential products, then run focus groups and taste tests, then build the supply chain and roll out the new product to their franchisees, and then…
Well, then – 12+ months after starting this process – they see whether customers choose to buy the new pizza. And if they don’t…
But with experimentation, that can all change. Instead of the 12+ month process above, Domino’s can run a “painted door” experiment on the menu. Instead of completing the full product development, they can add potential pizzas to the menu that look just like any other product. Then they measure the add-to-basket rate for each.
This experiment-led approach might take just a couple of weeks (and a fraction of the cost) of traditional product development. What’s more, the data gathered is, as above, likely to correlate more closely to future sales.
7. Better for customers. When people first hear about painted door testing, like this Domino’s example, they worry about the impact on the customer.
“Isn’t that a bad customer experience – showing them a product they can’t order?”
And that’s fair – it’s obviously not a good experience for the customer. But the potential alternative is that you invest 12 months’ work in building a product nobody wants.
It’s far better to mildly frustrate a small sample of users in an experiment, than it is to launch products that people don’t love.
To find out more about our approach to product experimentation, please get in touch with Conversion.
Iterating on experiments is often reactive and conducted as an afterthought. A lot of time is spent producing a ‘perfect’ test, and if the results are unexpected, iterations are run as a last hope to salvage value from the time and effort spent. But why try to execute the perfect experiment in the first instance – and postpone the learnings you could uncover along the way – when you could run a minimum viable experiment and iterate on it?
Experimentation is run at varying levels of maturity (see our Maturity Model for more information). However, we see businesses time and time again getting stuck in the infant stages due to their focus on individual experiments. Teams waste time and resource trying to run one ‘perfect’ experiment when the core concept has not been validated.
In order to validate levers quickly without over investing in resource we should ensure hypotheses are executed in their most simple form – the minimum viable experiment (MVE). From here, success of an MVE gives you the green light to test more complex implementations and failure flags problems with the concept/execution early on.
A few years ago, we learnt the importance of this approach the hard way. Off the back of a single hypothesis for an online real estate business – ‘Adding the ability to see properties on a map will help users find the right property and increase enquiries’ – we built a complete map view in Optimizely. A heavy amount of resource was used, only to find within the experiment that the map had no impact on user behaviour. What should we have done? Run an MVE requiring the minimum resource needed to test the concept. What would this have looked like? Perhaps a fake door test to gauge user demand for the map functionality.
This blog aims to give:
An understanding of the minimum viable approach to experimentation
A view of potential challenges and tips to overcome them
A clear overview of the benefits of MVEs
The minimum viable approach
A minimum viable experiment looks for the simplest way to run an experiment that validates the concept. This type of testing isn’t about designing ‘small tests’; it is about running specific, focused experiments that give you the clearest signal of whether or not the hypothesis is valid. Of course, it helps that MVEs are often small, so we can test quickly. It is important to challenge yourself by assessing every component of the test and its likelihood of changing the way the user responds to the experiment. That way, you will be efficient with your resource while still proving (or disproving) the validity of the concept. Running the minimum viable experiment allows you to validate your hypothesis without over-investing in levers that turn out to be ineffective.
If the MVE wins, then iterations can be run to find the optimal execution – gaining learnings along the way. If the test loses, look at the execution more thoroughly and determine whether poor execution affected the result. If so, re-run the MVE. If not, bin the hypothesis to avoid wasting resource on unfruitful concepts.
All hypotheses can be reduced to an MVE; see below a visual example of an MVE testing stream.
Potential challenges to MVEs and tips to overcome them
Although this approach is the most effective, it is not always fully understood, resulting in pushback from stakeholders. Stakeholders are invested in the website and, moreover, protective of their product. As a result, they expect experimentation to test a perfect solution to a problem – one that could be implemented immediately should the test win. What is not considered is the huge amount of resource this would require, without any validation that the hypothesis was correct or that the style of execution was optimal.
To overcome this challenge, we work with experimentation, marketing and product teams to challenge assumptions around MVEs. This education piece is pivotal for stakeholder buy-in. Over the last nine months, we have been running experimentation workshops with one of the largest online takeaway businesses in Europe, and a huge focus of these sessions has been the minimum viable experiment.
Overview of the benefits of MVEs
Minimum viable experiments have a multitude of benefits. Here, we aim to summarise a few of these:
Minimum resource required
The minimum viable experiment of a concept allows you to use the minimum amount of resource required to see whether a concept is worth pursuing further.
Validity of the hypothesis is clear
Executing experiments in their simplest form ensures the impact of the changes is evident. As a result, concluding the validity of the experiment is uncomplicated.
Explore bigger solutions to achieve the best possible outcome
Once the MVE has been proven, this justifies investing further resource in exploring bigger solutions. Iterating on experiments allows you to refine solutions to achieve the best possible execution of the hypothesis.
A minimum viable experiment involves testing a hypothesis in its simplest form, allowing you to validate concepts early on and optimise the execution via iterations.
Pushback on MVEs is usually due to a lack of awareness of the process and the benefits it yields. Educate teams to show how effective this type of testing is, not only in reaching the best possible final execution for tests but also in using resource efficiently.
The main benefit of the minimum viable approach is that you spend time and resource on levers that impact your KPIs.
With experimentation and conversion optimisation, there is never a shortage of ideas to test.
In other industries, specialist knowledge is often a prerequisite. It’s hard to have an opinion on electrical engineering or pharmaceutical research without prior knowledge.
But with experimentation everyone can have an opinion: marketing, product, engineering, customer service – even our customers themselves. They can all suggest ideas to improve the website’s performance.
The challenge is how you prioritise the right experiments.
There’s a finite number of experiments that we can run – we’re limited both by the resource to create and analyse experiments, and also the traffic to run experiments on.
Prioritisation is the method to maximise impact with an efficient use of resources.
Where most prioritisation frameworks fall down
There are multiple prioritisation frameworks – PIE (from WiderFunnel), PXL (from ConversionXL), and more recently the native functionality within Optimizely’s Program Management.
Each framework takes a broadly consistent approach: prioritisation is based on a combination of (a) the value of the experiment, and (b) the ease of execution. WiderFunnel’s PIE framework, for example, scores each candidate on:
potential (how much improvement can be made on the pages?)
importance (how valuable is the traffic to the page?) and
ease (how complicated will the test be to implement?)
This is effective: it ensures that you consider both the potential uplift from the experiment alongside the importance of the page. (A high impact experiment on a low value page should rightfully be deprioritised.)
But it can be challenging to score these factors objectively – especially when considering an experiment’s potential.
ConversionXL’s PXL framework looks to address this. Rather than asking you to rate an experiment out of 10, it asks a series of yes/no questions to objectively assess its value and ease.
Experiments that are above the fold and based on quantitative and qualitative research will rightly score higher than a subtle experiment based on gut instinct alone.
This approach works well: it rewards the right behaviour (and can even help drive the right behaviour in the future, as users submit concepts that are more likely to score well).
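The checklist idea is easy to prototype. Below is a minimal sketch of a PXL-style scorer – note that the question names and weights here are placeholders of our own, not ConversionXL’s actual list:

```python
# Illustrative sketch of a checklist-based prioritisation score.
# Each yes/no question contributes a weight, so scoring stays objective.
# Questions and weights are hypothetical, not ConversionXL's real criteria.

PXL_STYLE_QUESTIONS = {
    "above_the_fold": 1,
    "noticeable_within_5s": 2,
    "backed_by_quant_research": 1,
    "backed_by_qual_research": 1,
    "easy_to_implement": 1,
}

def checklist_score(answers):
    """answers maps question name -> bool; sums the weights of 'yes' answers."""
    return sum(w for q, w in PXL_STYLE_QUESTIONS.items() if answers.get(q))

# A research-backed, prominent concept outscores a gut-instinct tweak:
bold_idea = {"above_the_fold": True, "noticeable_within_5s": True,
             "backed_by_quant_research": True}
subtle_hunch = {"easy_to_implement": True}

print(checklist_score(bold_idea), checklist_score(subtle_hunch))
```

Because every answer is binary, two people scoring the same concept should arrive at the same total – which is exactly the objectivity the framework is after.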
But while it improves the objectivity in scoring, it lacks two fundamental elements:
It accounts for page traffic, but not page value. So an above-the-fold research-backed experiment on a zero-value page could be prioritised above experiments that could have a much higher impact. (We used to work with a university in the US whose highest-traffic page was a blog post on ramen noodle recipes. It generated zero leads – but the PXL framework wouldn’t account for that automatically.)
While it values qualitative and quantitative research, it doesn’t appear to include data from the previous experiments in its prioritisation. We know that qualitative research can sometimes be misleading (customers may say one thing and do something completely different). That’s why we validate our research with experimentation. But in this model, its focus is purely on research – whereas a conclusive experiment is the best indicator of a future iteration’s success.
Moreover, most frameworks struggle to adapt as an experimentation programme develops. They tend to work in isolation at the start – prioritising a long backlog of concepts – but over time, real life gets in the way.
Competing business goals, fire-fighting and resource challenges mean that the prioritisation becomes out-of-date – and you’re left with a backlog of experiments that is more static than a dynamic experimentation programme demands.
Introducing SCORE – Conversion.com’s prioritisation process
Our approach to prioritisation is based on more than 10 years’ experience running experimentation programmes for clients big and small.
We wanted to create an approach that:
Prioritises the right experiments: So you can deliver impact (and insight) rapidly.
Adapts based on insight + results: The more experiments you run, the stronger your prioritisation becomes.
Removes subjectivity: As far as possible, data should be driving prioritisation – not opinion.
Allows for the practicalities of running an experimentation programme: It adapts to the reality of working in a business where the wider priorities, goals and resources change.
But the downside is that it’s not a simple checklist model. In our experience, there’s no easy answer to prioritisation – it takes work. But it’s better to spend a little more time on prioritisation than waste a lot more effort building the wrong experiments.
With that in mind, we’re presenting SCORE – Conversion.com’s prioritisation process:
As you’ll see, the prioritisation of one concept against another happens in the middle of the process (“Order”) and is contingent on the programme’s strategy.
Strategy: Prioritising your experimentation framework
At Conversion.com, our experimentation framework is fundamental to our approach. Before we start on concepts, we first define the goal, KPIs, audiences, areas and levers (the factors that we believe affect user behaviour).
When your framework is complete (or, at least, started – it’s never really complete), we can prioritise at the macro level – before we even think about experiments.
Assuming we’ve defined and narrowed down the goal and KPIs, we then need to prioritise the audiences, areas and levers:
Prioritise your audiences on volume, value and potential:
Volume – the monthly unique visitors of this audience. (That’s why it’s helpful to define identifiable audiences like “prospects”, “users on a free trial”, “new customers”, and so on.)
Value – the revenue or profit per user. (Continuing the above example, new customers are of course worth more than prospects – but at a far lower volume.)
Potential – the likelihood that you’ll be able to modify their behaviour. On a retail website, for example, there may be less potential to impact returning customers than potential customers – it may be harder to increase their motivation and ability to convert relative to a user who is new to the website.
You can, of course, change the criteria here to adapt the framework to better suit your requirements. But as a starting point, we suggest combining the profit per user and the potential improvement.
Don’t forget, we want to prioritise the biggest value audiences first – so that typically means targeting as many users as possible, rather than segmenting or personalising too soon.
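As a rough illustration, combining the three criteria can be as simple as multiplying them. The sketch below uses made-up audiences and scores, and the multiplicative formula is one reasonable choice rather than a prescribed one:

```python
# Hedged sketch of audience prioritisation on volume, value and potential.
# All figures are illustrative; "potential" is a 0-1 estimate of how likely
# we are to shift this audience's behaviour.

def score_audience(monthly_visitors, value_per_user, potential):
    """Combine volume, value and potential into one comparable score."""
    return monthly_visitors * value_per_user * potential

audiences = [
    {"name": "prospects", "visitors": 100_000, "value": 2.0, "potential": 0.7},
    {"name": "free trial users", "visitors": 20_000, "value": 15.0, "potential": 0.5},
    {"name": "new customers", "visitors": 5_000, "value": 40.0, "potential": 0.3},
]

ranked = sorted(
    audiences,
    key=lambda a: score_audience(a["visitors"], a["value"], a["potential"]),
    reverse=True,
)

for a in ranked:
    print(a["name"], score_audience(a["visitors"], a["value"], a["potential"]))
```

Note how a smaller but higher-value audience can outrank the highest-traffic one – which is the point of scoring on more than volume alone. The same scoring applies to areas by swapping in per-area traffic and revenue.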
In much the same way as audiences, we can prioritise the areas – the key content that the user interacts with.
For example, identify the key pages on the website (homepage, listings page, product page, etc) and score them on:
Volume – the monthly unique visitors for the area.
Value – the revenue or profit from the area.
Potential – the likelihood that you’ll be able to improve the area’s performance. (Now’s a good time to use your quantitative and qualitative research to inform this scoring.)
(It might sound like we’re falling into the trap of other prioritisation models: asking you to estimate potential, which can be subjective. But, in our experience, people are more likely to score an area objectively, rather than an experiment that they created and are passionate about.)
Also, this approach doesn’t need to be limited to your website. You can apply it to any other touchpoint in the user journey too – including offline. Your cart abandonment email, customer calls and Facebook ads can (and should) be used in this framework.
As above, levers are defined as the key factors or themes that you think affect an audience’s motivation or ability to convert on a specific area.
These might be themes like pricing, trust, delivery, returns, form usability, and so on. (Take another look at the experimentation framework to see why it’s important to separate the lever from the execution.)
When you’re starting to experiment, it’s hard to prioritise your levers – you won’t know what will work and what won’t.
That’s why you can prioritise them on either:
Confidence – a simple score to reflect the quantitative and qualitative research that supports the lever. If every research method shows trust as a major concern for your users, it should score higher than another lever that only appears occasionally.
Win rate – If you have run experiments on this lever in the past, what was their win rate? It’s normally a good indicator of future success.
Of course, if you’re starting experimentation, you won’t have a win rate to rely on (so estimating the confidence is a fantastic start).
But if you’ve got a good history of experimentation – and you’ve run the experiments correctly, and focused them on a single lever – then you should use this data to inform your prioritisation here.
Again, the more we experiment, the more accurate this gets – so don’t obsess over every detail. (After all, it’s possible that a valid lever may have a low win rate simply because of a couple of experiments with poor creative.)
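One way to sketch this fallback logic – past win rate where experiment data exists, research-based confidence otherwise. All numbers below are illustrative assumptions:

```python
# Hedged sketch: prioritise levers by experiment win rate where available,
# falling back to a 0-1 confidence score derived from quant/qual research.

def lever_score(confidence, results=None):
    """results is a list of booleans (True = winning experiment)."""
    if results:  # prefer real experiment data over research confidence
        return sum(results) / len(results)
    return confidence

levers = {
    "trust": {"confidence": 0.6, "results": [True, True, False]},
    "pricing": {"confidence": 0.8, "results": []},  # no experiments yet
    "delivery": {"confidence": 0.4, "results": [False, False]},
}

ranked = sorted(levers, key=lambda l: lever_score(**levers[l]), reverse=True)
print(ranked)
```

As the programme matures, the `results` lists fill up and the data-driven win rate naturally takes over from the initial confidence estimates.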
Putting this all together, you can now start to prioritise the audiences, areas and levers that should be focused on:
As you can see, we haven’t even started to think about concepts and execution – but we have a strong foundation for our prioritisation.
Concepts: Getting the right ideas
After defining the strategy, you can now run structured ideation around the KPIs, audiences, areas and levers that you’ve defined.
This creates the ideal structure for ideation.
Rather than starting with, “What do we want to test?” or “How can we improve product pages?”, we’re instead focusing on the core hypotheses that we want to validate:
How can we improve the perception of pricing on product pages for new customers?
How can we overcome concerns around delivery in the basket for all users?
And so on.
This structured ideation around a single hypothesis generates far better ideas – and means you’re less susceptible to the tendency to throw everything into a single experiment (and not knowing which part caused the positive/negative result afterwards).
Order: Prioritising the concepts
When prioritising the concepts – especially when a lever hasn’t been validated by prior experiments – you should look to start with the minimum viable experiment (MVE).
Just like a minimum viable product, we want to define the simplest experiment that allows us to validate the hypothesis. (Can we test a hypothesis with 5 hours of development time rather than 50?)
This is a hugely important concept – and one that’s easily overlooked. It’s natural that we want to create the “best” iteration for the content we’re working on – but that can limit the success of our experimentation programme. It’s far better to run ten MVEs across multiple levers that take 5 hours each to build, rather than one monster experiment that takes 50 hours to build. We’ll learn 10x as much, and drive significantly higher value.
So at the end of this phase, we should have defined the MVE for each of the high priority levers that we’re going to start with.
Roadmap: Creating an effective roadmap
There are many factors that can affect your experimentation roadmap – factors that stop you from starting at the top of your prioritised list and working your way down:
You may have limited resource, meaning that the bigger experiments have to wait till later.
There may be upcoming page changes or product promotions that will affect the experiment.
Other teams may be running experiments too, which you’ll need to plan around.
And there are dozens more: resource, product changes, marketing and seasonality can all block individual experiments – but they shouldn’t block experimentation altogether.
That’s why planning your roadmap is as important as prioritising the experiments. Planning delivers the largest impact (and insight) in spite of external factors.
To plan effectively:
Identify your swimlanes: These are the audiences and areas from your framework that you’ll be experimenting on. (Again, make sure you focus on the high priority audiences and areas – don’t be tempted to segment or personalise too early.)
Estimate experiment duration: Use an appropriate minimum detectable effect for the audience and area to calculate the duration, then block out this time in the roadmap.
Experiment across multiple levers: Gather more insight (and spread your risk) by experimenting across multiple levers. If you focus heavily on a lever like “trust” with your first six experiments, you might have to start again if the first two or three experiments aren’t successful.
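The duration estimate in the steps above can be sketched with the standard two-proportion sample-size approximation. The traffic and conversion figures below are illustrative assumptions, and the z-scores default to 95% significance and 80% power:

```python
import math

# Rough sketch of experiment-duration estimation from a minimum
# detectable effect (MDE), using the common two-proportion sample-size
# approximation. All inputs here are illustrative.

def sample_size_per_variant(baseline_rate, relative_mde,
                            alpha_z=1.96, power_z=0.84):
    """Approximate visitors needed per variant to detect the given relative lift."""
    p = baseline_rate
    delta = p * relative_mde  # absolute difference we want to detect
    return math.ceil(2 * (alpha_z + power_z) ** 2 * p * (1 - p) / delta ** 2)

def duration_in_weeks(baseline_rate, relative_mde, weekly_visitors, variants=2):
    """Weeks needed to fill all variants given weekly experiment traffic."""
    total = sample_size_per_variant(baseline_rate, relative_mde) * variants
    return math.ceil(total / weekly_visitors)

# e.g. a 3% baseline conversion rate, hoping to detect a 10% relative lift,
# with 50,000 visitors a week entering the experiment:
print(duration_in_weeks(0.03, 0.10, 50_000))
```

Halving the MDE roughly quadruples the required sample, which is why low-traffic swimlanes should be reserved for levers expected to produce bigger effects.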
Experimentation: Running and analysing the experiments
With each experiment, you’ll learn more about your users: what changes their behaviour and what doesn’t.
You can scale successful concepts and challenge unsuccessful concepts.
For successful experiments, you can iterate by:
Moving incrementally from minimum viable experiments to more impactful creative. (With one Conversion.com client, we started with a simple experiment that promoted the speed of delivery. After multiple successful experiments around delivery, we eventually worked with the client to test the commercial viability of same-day delivery.)
Applying the same lever to other areas and potentially audiences. If amplifying trust messaging on the basket page works well, it’ll probably work well on listing and product pages too.
Meanwhile, an experiment may be unsuccessful because:
The lever was invalidated – Qualitative research may have said customers care about the lever, but in practice it makes no difference.
The execution was poor – It happens sometimes. Every audience/area/lever combination can have thousands of possible executions – you won’t get it right first time, every time, and you risk rejecting a valid lever because of a lousy experiment.
There was an external factor – It’s also possible that other factors affected the test: a bug, a change to the underlying page code, or a promotion or stock availability affecting performance. It doesn’t happen often, but it needs to be checked.
In experiment post-mortems, it’s crucial to investigate which of these is most likely, so we don’t reject a lever because of poor execution or external factors.
Conduct experiment post-mortems so you don’t reject a lever because of poor execution or external factors.
What’s good (and bad) about this approach
This approach works for Conversion.com – we’ve validated it on clients big and small for more than ten years, and have improved it significantly along the way.
It’s good because:
It’s a structured and effective prioritisation strategy.
It doesn’t just reward data and insight – it actively adapts and improves over time.
It works in the real-world, allowing for the practicalities of running an experimentation programme.
On the flip side, its weaknesses are that:
It takes time to do properly. (You should create and prioritise your framework first.)
You can’t feed in 100 concepts and expect it to spit out a nicely ordered list. (But in our experience, you probably don’t want to.)
Sometimes it’s easy to forget the art of persuasion wasn’t invented in the 90s with the onset of dial-up internet. All it takes is a quick break away from the screen, and it’s not long before you realise how much inspiration can be drawn from the ‘real world’ – having evolved and stood the test of time over centuries.
Recently I had the pleasure of experiencing an unlikely source of such real-world inspiration in the shape of Istanbul’s magnificent market, the Grand Bazaar. Beyond the chaos and sensory overload that hits you at first sight, the bazaar is one of the toughest places to compete for attention. Home to 4,000 shops spread across 61 covered streets, each street specialises in a particular type of item, so a shop ends up stocking almost identical merchandise to its 20+ neighbours. With local tradesmen having had to sharpen and evolve their selling techniques since the 15th century, the Grand Bazaar is a treasure chest of time-tested lessons we can learn from.
In this article, we expose some of the most interesting time-proven persuasion techniques used by local tradesmen, explain why they work, and show how you can apply them effectively on your website.
Break the ice with a simple question
The similarity of products being sold side-by-side means a nice product presentation often isn’t enough – tradesmen must find additional ways of bringing in customers. Seconds into entering the market, salesmen approach me with their favourite ice breaker: “Where are you from?” It’s hard to think of an easier question to answer, yet it serves an important purpose.
Simply by answering “Russia”, I’ve committed myself into a conversation where the shop owner now has the chance to find common ground (showing off his language skills and inevitably admitting his respect for Putin). His chances of luring in an experienced shopper like myself are now considerably higher.
How you could use this:
Get users to make that initial small commitment. A small question or action that doesn’t require any thinking can get that initial engagement from your users without putting them off longer steps. It can also be a great opportunity for users to self-select and allow you to tailor content around their answer.
Similarly, instead of presenting your visitors with an off-putting long form, starting with a simple first question can get that important first engagement that will make users less likely to run off as they face the more frictional questions.
Offer something first
As I am eventually lured into a carpet store, I am politely offered a glass of Turkish tea. Tea is a huge deal in Turkish culture and an ever-present part of the art of selling by bazaar salesmen.
The act of accepting a glass of tea triggers an important psychological principle in itself, creating a sense of reciprocity in the buyer. When we are given a little something for free, we become indebted to the other party and are naturally more inclined to offer something back in exchange. In this context, that could mean making a concession in negotiation, feeling more obliged to buy, or at the very least giving the shop owner your time to sell their story. In addition, the time spent drinking tea and listening to the seller creates a second level of reciprocity – time indebtedness.
How you could apply this online:
Offer something meaningful to your visitors – without them needing to give something in return (be it money or information). Depending on your business this can be a free piece of content that is of value to them (think ebooks or webinars), or a sample of your product.
Besides this, reciprocity can be triggered in more creative ways. SurveyGizmo (below) proactively offers its users a free trial extension without asking for anything in return. They know that this small act of kindness will make you much more likely to start paying after the trial extension.
Sell the story
As I sip the delectable tea, the seller has the perfect stage to sell their story to a now-receptive audience. I am told about the history of Turkish rugs, the people behind the work, the unique processes and great care that go into making them, and the resulting quality that will make a rug last for generations (ever increasing in value over time).
In a market so crowded with identical items, and with prices so fluid, differentiating your product is absolutely crucial both to getting the sale and to getting it at a higher price. Many tourists (including myself) never came to the Bazaar intending to buy a rug, and experience tells the seller that a strong story highlighting authenticity, antiquity and quality will often be enough to create the necessary desire.
How you could apply this online:
Boost your perceived value by really selling the story. You may not be fighting for attention in a bazaar, but it is more than likely that there are plenty of similar products or services available as alternatives. Get the insights from your customers about why they chose your product and emphasise this at key consideration stages to make it crystal clear why exactly you stand out from your competitors.
Anchor the price
Once the storytelling is over, the owner’s assistants begin to bring out the rugs. The first rugs I am shown are as intricate and beautiful as they come – with price tags to match. I’m quoted a price far higher than I would ever spend on a rug, and as I tell him this I am confident the shop owner already knows it. He is applying the price anchoring technique: the next set of rugs brought out for me are still pricey, but sit in a considerably more realistic price range – one that, subconsciously, I’m now more likely to be content with.
The deliberate act of showing me the highest price items first sets a psychological benchmark against which I am comparing subsequent prices. Now that I’ve been shown the expensive rugs, the next ones come across as relatively good value.
How you could apply this online:
The price anchoring technique is particularly powerful in an environment like a bazaar, where the less experienced customer typically doesn’t have a great knowledge of prices; however, anchoring can be readily applied online too.
One of the most natural places anchoring can have a dramatic impact is in pricing strategy. If you present the lowest-priced option first, customers will be anchored towards the lower end – and the reverse applies when you show the expensive plan first. Another classic is adding an extra-premium option to a two-option plan, which typically leads more people to select the seemingly better-value, now-middle option.
Give positive reinforcement
I take time to look through the mid-range rugs laid out in front of me, with the salesman briefly fading into the background. As I get my hands on a particular rug, he steps back in saying “Great taste! This rug is 100% wool and is of great finish.” He explains how you can see the quality of the finishing is particularly intricate and shows how the rug changes colour depending on viewing angle. Call me gullible, but his words encouraged me and made me more open to negotiation.
How you could apply this online:
Applied at the right moment, positive reinforcement can prove an efficient nudge to encourage a user to follow through to the next desired action and convert.
One of the most natural settings in which to apply it is at the point items are added to the basket. This could manifest itself as a virtual version of the Turkish salesman (praising the user’s selection) or through positive messaging when a particular goal is achieved, such as reaching a free delivery threshold, completing an offer or unlocking a discount. Equally, reassuring users of the value of the product they have selected (lowest price, most savings, most popular) can be a powerful motivator at the crucial final stages of the funnel.
On forms, a virtual pat on the back on completion of a particular step can “humanise” the experience and go a long way in encouraging the user to continue the momentum through to the end.
Address common objections
As we narrow down to a particular rug and begin to discuss the price, the shop owner feels the sale edging closer and starts explaining the free wrapping, the hassle-free and reliable shipping, how easy the rug is to maintain, and how it’s going to last a lifetime.
Through experience, rug sellers naturally pick up on the common objections tourists thrust at them (often in a final attempt to pull out from buying a rug they never intended to buy) and use this knowledge to proactively address each of these concerns. This may seem obvious, but for some reason it often fails to translate itself into the online world. And unlike the physical world, there is often no-one to answer those concerns on demand.
How you could apply this online:
Find out the common objections your visitors are raising that are stopping them from converting, and make sure these are made perfectly clear and visible at the stages they have those concerns. This should be a fundamental principle of any conversion optimisation programme, yet is still often neglected.
Do this simply by asking your visitors and customers directly – something that’s so easy nowadays with all the tools out there, yet for some reason still overlooked by many. Use on-site surveys to ask those that are abandoning the leakiest parts of your funnel why, or you could ask those that did convert if there was something that made them hesitate. If you have customer support – speak to them or listen in to their conversations. Use the insights you get to address their concerns exactly in the places you know they crop up.
It’s important to remind ourselves that while the medium may be different, human nature is still very much the same at its core. Whether strolling through a bazaar or browsing the depths of Google, the same principles of persuasion influence our decision to convert – even if the execution varies.
I may have returned from Istanbul with a rug I neither really needed nor intended to buy, but besides serving as a lush-but-out-of-place living room centrepiece it also serves as a reminder that we should be inspired by and learning from those who have been relying on and perfecting the art of persuasion for centuries.
So, next time you are out shopping in the ‘real world’, take note of your positive experiences and observations, and think about how you can translate these onto your website to help boost your own conversions.
Have you ever heard of Mazagran? A coffee-flavoured bottled soda that Starbucks and Pepsi launched back in the mid-1990s? No, you haven’t, and there is a good reason for that!
Starbucks correctly collected market research that told them customers wanted a cold, sweet, bottled coffee beverage that they could conveniently purchase in stores.
So surely Mazagran was the answer?
Evidently not! Mazagran was not what the consumers actually wanted. The failure of this product was down to the asymmetry that existed between what the customers wanted and what Starbucks believed the customer wanted.
Despite Starbucks conducting market research, this gap in communication – often known as the perception gap – still occurred. Luckily for Starbucks, Mazagran was a stepping stone to the huge success that came with bottled Frappuccinos: what consumers actually wanted.
What is the perception gap and why does it occur?
Perception is seen as the (active) process of assessing information in your surroundings. A perception gap occurs when you attempt to communicate this assessment of information but it is misunderstood by your audience.
How we assess information in our surroundings is strongly influenced by communication. Because humans communicate in different ways, a perception gap can occur when someone’s communication style differs from your own. Not only can these gaps occur, they also vary in size, depending on the different levels of value that you, or your customers, attach to each factor. In addition, many natural cognitive biases can widen the perception gap, leading us to believe we know what other people are thinking more than we actually do.
Perception gaps in ecommerce businesses
Perception gaps mainly occur in social situations, but they can also heavily impact e-commerce businesses, from branding and product to marketing and online experience.
Perception gaps within ecommerce mainly appear because customers form opinions about your company and products based on their broader experiences and beliefs. One thing is for sure: perception gaps certainly occur between websites and their online users. Unfortunately, they are often the start of vicious cycles, where small misinterpretations of what the customer wants or needs are made worse when we try to fix them. Ultimately, this means losing out on turning visitors into customers.
Starbucks and Pepsi launching Mazagran was an example of how perception gaps can lead to the failure of new products. McDonald’s launching their “Good to Know” campaign is an example of how understanding this perception gap can lead to branding success.
This myth-busting campaign was launched off the back of comprehensive market research using multiple techniques. McDonald’s understood the difference between what they thought of themselves (e.g. fast food made with high-quality ingredients) and what potential customers thought of McDonald’s (e.g. chicken nuggets made of chicken beaks and feet). Understanding that this perception gap existed allowed them to address it in their campaign, which successfully changed users’ perceptions of the brand.
For most digital practices, research plays an important part in allowing a company or brand to understand their customer base. However, conducting and analysing research is often where the perception gap begins to form.
For example, say you are optimising a checkout flow for a retailer. You decide to run an on-site survey to gather insight into why users may not be completing the forms and therefore not purchasing. After analysing the results, it seems the top reason users are not converting is that they find the web form confusing. Now this is where the perception gap is likely to form. Do users want the form to be shortened? Do they want more clarity or explanation around form fields? Is it the delivery options that they may not understand?
Not being the user means we will never fully understand the situation the user is in. Making assumptions to fill that gap only widens it.
Therefore, reducing the perception gap is surely a no-brainer when it comes to optimising our websites. But is it as easy as it seems?
In order to reduce the perception gap you need to truly understand your customer base. If you don’t, then there is always going to be an asymmetry between what you know about your customers and what you think you know about your customers.
How to reduce perception gaps
Sadly, perception gaps are always going to exist due to our interpretation of the insights we collect and the fact that we ourselves are not the actual user. However, the following tips may help to get the most out of your testing and optimisation by reducing the perception gap:
Challenge assumptions – too often we assume we know our customers, how they interact with our site and what they are thinking. Unfortunately, these assumptions can become cemented over time into deeply held beliefs about how users think and behave. Challenging these assumptions leads to true innovation and new ideas that may not have been thought of before. With this in mind, assumptions should be tested against the research we conduct.
Always optimise based on at least two supporting pieces of evidence – the perception gap is more likely to occur when research into a focus area is limited or based on a single source of insight. Taking a multiple-measure approach means insights are likely to be more valid and reliable.
Read between the lines – research revolves around listening to your customers but more importantly it is about reading between the lines. It is the difference between asking for their responses and then actually understanding them. As Steve Jobs once said “Customers don’t know what they want”; whether you believe that or not, understanding their preferences is still vital for closing the perception gap.
Shift focus to being customer-led – being more customer-led, as opposed to product-led, will place a higher value on researching your customers. With more emphasis on research, this should lead to greater knowledge and understanding of your customer base, which in turn should reduce the perception gap that has the potential to form.
The perception gap is always going to exist and is something we have to accept. Conducting research, and a lot of it, is certainly a great way to reduce the perception gap that will naturally occur. However, experimentation is really the only means to confirm whether the research and insight you collected into your customer base are valid and actually improve the user experience. One quote that has always made me think is from Flint McGlaughlin, who said "we don't optimise web pages, we optimise for the sequence of thought". This customer-led view of experimentation can only result in success.
One of the core principles of experimentation is that we measure its value in impact and insight. We don't expect winning tests all the time, but if we test well, we should always expect to draw insights from them. The only real 'failed' test is one that doesn't win and from which we learn nothing.
In our eagerness to start testing, it's common to come up with an idea (hopefully at least based on data, with an accompanying hypothesis!), get it designed and built, and set it live. Most of the thought goes into the design and execution of the idea, and often less into how to measure the test to ensure we get the insight we need.
By the end of this article you should have:
A strong knowledge of why tracking multiple goals is important
A framework to structure your goals, so you know what’s relevant for each test
In every experiment it’s important to define a primary goal upfront – the goal that will ultimately judge the test a win/loss. It’s rarely enough to just track this one goal though. The problem is that if the test wins, great, but we may not understand fully why. Similarly if the test loses and we only track the main goal, then the only insight we are left with is that it didn’t win. In this case, we don’t just have a losing test, we also have a test where we lose the ability to learn – the second key measure of how we get value from testing. And remember, most tests lose!
If we don’t track other goals and interactions in the test we will miss the behavioural nuances and the other micro-interactions that can give us valuable insight as to how the test affected user behaviour. This is particularly important in tests where a positive result on the main KPI could actually harm another key business metric.
One example from a test we ran recently was for a camera vendor. We introduced add to basket CTAs on a product listing page, so that users who knew which product they wanted wouldn’t have to navigate down to the product page to purchase.
This led to a positive uplift in orders; however, it had a negative effect on average order value. The reason was that the product page was also an important place for users to discover accessories for their products, including product care packages. As the test encouraged users to add the main product directly, they were less inclined to buy accessories and add-ons. The margins on accessories and add-on products are far higher than on cameras, so a lower average order value driven by fewer accessory sales is definitely a negative outcome.
Insights from well-tracked tests should be a key part of how your testing strategy develops, as new learnings inform better iterations and open up new areas for testing by revealing user behaviour you were previously unaware of.
In any test, there can be an almost endless number of things you could measure, and the solution to not tracking enough shouldn't be to track everything. Measure too much and you'll potentially be swamped analysing data points that have no value, and you'll curry no favour with the developers who have to implement all the tracking! Measure too little and you may miss the insights that make even a losing test valuable. The challenge is to measure the right things for each test.
What to measure?
Your North Star Metric
It goes without saying that every test should be aligned to the strategic goal of testing, and that strategic goal should always have a clear, measurable KPI. For an ecommerce site it will likely be orders or revenue; for a lead generation site or page, leads; for a content site, pages per visit or page scroll, and so on. This KPI will be the key measurement of whether your test succeeds or fails, and for that reason we call it the North Star metric. In essence, regardless of whatever else happens in the test, if we can't move the needle on this metric, the test doesn't win. Unsurprisingly, this metric should be tracked in every test you run.
You’ll know if the test wins, but what other effects did it have on your site? What effect did it have on purchase behaviour and revenue? Did it lead to a decrease in some other metrics which might be important to the business?
You should also be defining ‘guardrail metrics’. These tend to be second tier metrics that relate to key business metrics, which if they perform negatively could call into question the interpretation of how successful the test is. If the test loses but these perform well, it’s also probably a good sign you’re on the right track. They don’t, on their own, define the success or failure like the North Star metric, but they contextualise the North Star metric when reporting on the test.
For an ecommerce site, if we assume the North Star metric is orders, then two obvious guardrail metrics would be revenue and average order value. If we run a test that increases orders but, as a result, users buy fewer items, or lower-value items as in the example above, this would decrease AOV and could harm revenue.
Tests become much more insightful just by adding these two metrics. Not only can we see that the test drove more orders, but we can also see what effect our execution had on the value and quantity of products being bought. This gives us the opportunity to change the execution of the test to address any negative impact on our guardrail metrics. In this sense, measuring tests effectively is a core part of an iterative test-and-learn approach.
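As a minimal sketch of this reporting, the snippet below summarises a test's North Star metric (order conversion) alongside its guardrails (AOV and revenue per visitor). All visitor, order and revenue figures here are made up for illustration:

```python
def summarise_variant(visitors, orders, revenue):
    """Return the North Star (conversion to order) plus two guardrails."""
    return {
        "conversion_rate": orders / visitors,       # North Star: orders
        "aov": revenue / orders,                    # guardrail: average order value
        "revenue_per_visitor": revenue / visitors,  # guardrail: revenue
    }

def relative_change(control, variant):
    """Relative change of each metric in the variant versus the control."""
    return {m: (variant[m] - control[m]) / control[m] for m in control}

# Hypothetical results echoing the camera example: more orders, lower AOV.
control = summarise_variant(visitors=10_000, orders=300, revenue=24_000)
variant = summarise_variant(visitors=10_000, orders=330, revenue=24_750)

uplift = relative_change(control, variant)
print({m: f"{v:+.1%}" for m, v in uplift.items()})
```

Reporting all three numbers side by side is what surfaces the trade-off: a 10% uplift in orders looks very different once you see AOV moving in the opposite direction.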
At a minimum, you should be tracking your North Star metric and guardrail metrics. These will tell you the impact of the test on the bottom line for the business.
Some tests you run may only impact your North Star metric – a test on the payment step of a funnel is a good example, where the most likely outcome is simply more or fewer orders and not much else. What you'll learn is whether that change pushed users over the line.
Most other tests, however, will have a number of different effects. Your test may radically change the way users interact with the page and measuring your tests at a deeper level than just the North Star and guardrail metrics will help you understand what effect the change has on user behaviour.
We work with an online food delivery company where meal deals are the main way customers browse and shop. Given the number of meal deals they have, one issue we found through our initial insights was that users struggle to navigate through them all to find something relevant. We ran a test introducing filtering options on the meal deal page, including how many people the deal feeds, the types of food it contains, saving amounts and price points. Along with the key metrics, we also tracked every filter option in the test.
This test didn't drive any additional orders; in fact, few users interacted with the filter, suggesting it wasn't very useful in helping users curate the meal deals. However, the users who did use it overwhelmingly chose to filter meal deals by price, and secondly by how many people the deal feeds. So a 'flat' test, but we now know two very important things users look for when selecting deals.
This in turn led to a series of tests around how we better highlight price and how many people the meal feeds at different parts of the user journey and on the meal deal offers themselves. These insights have helped shape the direction of our testing strategy by shedding light on user preferences. If we had only tracked the North Star and guardrail metrics, these insights would have been lost.
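This kind of secondary analysis is simple once the events are tracked. Below is an illustrative sketch loosely based on the meal-deal example; the event records, filter names and visitor count are all hypothetical:

```python
from collections import Counter

# Hypothetical filter-click events captured alongside the test.
filter_events = [
    {"user": "u1", "filter": "price"},
    {"user": "u1", "filter": "price"},
    {"user": "u2", "filter": "feeds"},
    {"user": "u3", "filter": "price"},
    {"user": "u4", "filter": "food_type"},
    {"user": "u4", "filter": "feeds"},
    {"user": "u5", "filter": "price"},
]
variant_visitors = 1_000  # users exposed to the filter variant

# Adoption: what share of exposed users touched the filter at all?
users_who_filtered = {e["user"] for e in filter_events}
adoption = len(users_who_filtered) / variant_visitors

# Preference: which filter types were used most?
ranking = Counter(e["filter"] for e in filter_events).most_common()

print(f"adoption: {adoption:.1%}")  # low adoption explains a 'flat' result
print(f"ranking: {ranking}")        # while the ranking reveals preferences
```

The two numbers answer different questions: low adoption explains why the North Star didn't move, while the usage ranking is the insight that feeds the next round of tests.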
For each test you run, really think through what the possible user journeys and interactions could be as a result of the test and make sure you track these. It doesn’t mean track everything, but start to see tests as a way of learning about your users not just a way to drive growth.
If you've managed to track your North Star, guardrail and some secondary metrics in your tests, you're in a great place. One other thing to think about is how to segment your data. Segmenting your test results is hugely important, especially when different user groups respond differently on your site. Device is an obvious segment you should be looking at in every test. We've seen tests with double-digit uplifts on desktop that haven't moved the needle at all on mobile.
If your test introduces a new feature or piece of functionality that users can interact with, it's helpful to create a segment for users who interact with that feature. This will shed light on how interaction with the new functionality affects user behaviour.
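A hedged sketch of what that segmentation looks like in practice: conversion rate split by device, and by whether the user touched the new feature. The records and field names here are invented for illustration:

```python
# Hypothetical per-user test records.
records = [
    {"device": "desktop", "used_feature": True,  "converted": True},
    {"device": "desktop", "used_feature": False, "converted": False},
    {"device": "desktop", "used_feature": True,  "converted": True},
    {"device": "mobile",  "used_feature": False, "converted": False},
    {"device": "mobile",  "used_feature": True,  "converted": False},
    {"device": "mobile",  "used_feature": False, "converted": True},
]

def conversion_rate(rows):
    return sum(r["converted"] for r in rows) / len(rows)

def segment(rows, key):
    """Group rows by `key` and compute each group's conversion rate."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r)
    return {k: conversion_rate(v) for k, v in groups.items()}

by_device = segment(records, "device")         # desktop vs mobile
by_feature = segment(records, "used_feature")  # interacted vs not
print(by_device, by_feature)
```

A word of caution on the feature-interaction segment: users who choose to interact are self-selected, so a higher conversion rate in that group suggests, but does not prove, that the feature caused the difference.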
Successful tests are measured by impact and insight. The only 'failed' test is one that doesn't win and from which you learn nothing. Insightful tests allow you to better understand why a test performed the way it did, meaning you can learn, iterate and improve more rapidly, leading to better, more effective testing.
Define your North Star metric – The performance of this metric will define if the test succeeds or fails. This should be directly linked to the key goal of the test.
Use guardrail metrics – Ensure your test isn’t having any adverse effects on other important business metrics.
Track smaller micro-interactions – These don’t decide the fate of your test but they do generate deeper insight into user-behaviour that can inform future iterations.
Segment by key user groups – Squeeze even more insight from your tests by looking at how different groups of users react to your changes.