Why your experimentation programme needs a risk profile

Risk can be a competitive advantage in your experimentation programme.

If you’re taking more calculated risks than your competitors, you’re going to get better results.

But to do that, you need to understand your risk profile.

In this post, we’ll look at how to define risk in your experimentation programme – as well as three techniques to create better high-risk experiments.

Doing nothing is still a risk

“In a world that is changing really quickly, the only strategy that is guaranteed to fail is not taking risks.”

Mark Zuckerberg

Facebook have always been open to taking risks.

When he was 22, Mark Zuckerberg turned down billion-dollar offers for the company. Instead, over the next ten years, he spent billions of dollars himself: acquiring companies like Instagram and WhatsApp, and making other long-forgotten investments and product launches that didn’t pay off.

But taking risks doesn’t mean being reckless. The priority is to manage risk – and experimentation is an effective way to do so.

Now some stakeholders will think that any experiment – no matter how small – is a huge risk. But for most A/B tests, your risk is limited. There’s the cost of building the test, and sometimes a temporary drop in performance while the test runs. But normally that’s it.

In fact, there’s actually much more risk in making changes to the website without testing them.

Low-risk and high-risk experiments

Your experimentation programme already has a risk profile.

Every experiment you run is low risk, high risk, or somewhere in between. And if you’re not consciously managing that balance, you’re probably not getting the full benefit of experimentation.

Experiments can be low risk. You might have run similar tests in the past, and be pretty confident that this one will work. (Or at least confident it won’t break anything.)

Low risk: If showing “most popular” sleeve lengths was successful (left), then expanding the sizing options out (right) is likely to be low risk.

They can be medium risk. You might be trying out a completely new untested hypothesis. It could work – or you might have wasted time building the test, and lost money running it.

Medium risk: What would happen if we added a donation target to Unicef’s landing page, and showed how close we were to achieving it?

Or they can be high risk. This is where you test disruptive ideas. Experiments like this are high risk because you’re risking the cost of building them and a potential loss of revenue while the experiment is live.

But it goes further – there’s also risk outside the experiment. It might affect the audience in the experiment long after you’ve stopped it, or it might have implications for the brand as a whole.

High risk: What would happen if we changed Wistia’s SaaS pricing model from feature-based to volume-based?

Analysing low- and high-risk experiments

To summarise:

Low-risk experiments are typically iterative – you’re building on an already-proven concept. Their role is to exploit: you’ve validated a lever and are now looking to maximise its impact across the customer journey. The only potential loss is the cost of building the experiment (just because it worked once, doesn’t mean it’ll work again).

Medium-risk experiments are typically innovative – you’re testing out new concepts (but not necessarily radical ones). Their role is to explore: you want to understand what drives customer behaviour, and an experiment will inform that understanding. As before, the potential loss is the cost of building the experiment – but you may also lose money running the experiment, if it lowers performance.

High-risk experiments are disruptive – not only are you testing out something new, there’s a chance that it could fail miserably. These are the concepts that your competitors are probably too nervous to test – but they could deliver you a significant competitive advantage if they work.

Their role is to expand – to widen your approach by testing radically different ideas. But the risk is greater too. There’s potential for non-controlled impact – essentially, where the damage doesn’t stop when the A/B test stops.

Take the screenshot above from Wistia’s pricing page. Testing a new pricing structure is a high-risk experiment: it could significantly increase revenue, or it could lower it. It could also affect customers who aren’t in the experiment, it could be picked up on social media or in the wider press, and so on.

But often these high-risk experiments come with the highest reward. These are the ones that help you move beyond the local maximum.

High-risk experiments will help you jump from a local maximum to the global maximum.

Working out your experimentation risk profile

Look at the experiments you’ve run in the last 6 or 12 months, as well as your backlog of upcoming experiments. Then rate each as low, medium or high risk.

Of course, the definition of risk in your organisation will be different to mine. So come up with a simple format that works for you.

If you like, you can try a series of questions like this:

  • What type of change are you making? e.g. UI, functionality, pricing, product.
  • Have you tested a similar hypothesis before?
  • If you have, was it successful?
  • What’s the cost needed to build the experiment?
  • What percentage of online revenue does the experiment affect?
  • Might it change the behaviour of users in the experiment even after it’s stopped?
  • Might it change the behaviour of users not in the experiment?

We’ve put this in a simple spreadsheet. You can answer all the questions and get a risk score straight away. Of course, you’ll want to adapt the questions, variables and scoring before you start. This is just an example:

Download and adapt this Google Sheet
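If it helps to see the mechanics, here’s a minimal sketch (in Python) of the kind of scoring a sheet like this can do. The answer options, weights and thresholds below are entirely hypothetical – adapt them to your own definition of risk.

    # Hypothetical risk scoring – weights and thresholds are illustrative only.
    def risk_score(change_type, tested_before, was_successful, build_cost_days,
                   revenue_exposure_pct, lasting_user_impact, affects_non_participants):
        type_scores = {"ui": 1, "functionality": 2, "pricing": 3, "product": 3}
        score = type_scores.get(change_type, 2)
        score += 0 if (tested_before and was_successful) else 2
        score += 1 if build_cost_days > 5 else 0
        score += 2 if revenue_exposure_pct > 50 else (1 if revenue_exposure_pct > 10 else 0)
        score += 2 if lasting_user_impact else 0          # behaviour change persists after the test
        score += 2 if affects_non_participants else 0     # e.g. pricing seen by customers outside the test
        if score <= 3:
            return "low"
        return "medium" if score <= 6 else "high"

    # An untested pricing change touching most of your online revenue:
    print(risk_score("pricing", False, False, 8, 80, True, True))  # -> "high"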

Or if you want an even simpler alternative, just ask yourself this question about each experiment:

“If I couldn’t run an A/B test, would I still make this change?”

If you’d still make the change, it’s almost certainly low or medium risk. If you wouldn’t, it’s probably high risk.

The importance of high-risk experiments

If we only test changes we’d make anyway, we’re wasting the opportunity of experimentation.

This is one of the most common mistakes people make in experimentation. They only run tests on changes that they’d make anyway.

It starts with an idea:

“This seems like a good idea. Let’s test it and see just how right I am.” 

Now there’s a good reason to test these changes. You might be wrong. Or some audience segments may respond differently. And if it is successful, it’s good to know the size of the impact – not just whether it’s positive or negative. This insight will help you come up with new hypotheses and prioritise your roadmap. 

But it’s just as important to test changes that make you nervous. Disruptive experiments allow you to make bigger bets.

“This experiment might crash and burn, but if it works…”

If you’re only testing best practice or patterns you see on competitor websites, you’re not going to gain a competitive advantage. You’re going to be limiting yourself to the local maximum.

Experimentation allows us to test anything we want – and to limit the fallout. It derisks innovation.

Creating your risk strategy

If you used the Google Sheet above, it’ll show you what your risk profile looks like visually:

In this example, you’ll see that most experiments are blue (medium risk), with an equal balance of low- and high-risk experiments.

There’s no perfect answer for what your risk profile should be. Ideally you’d have a balance of all three – and it should change over time.

So right now, your risk profile might look like this:

You’ve got an even balance of innovative and iterative experiments, with occasional radical experiments included to allow for greater leaps forward.

But if you’re in peak season, it might look like this:

You increase the iterative experiments to reduce the risk. Because iterative experiments have a higher win rate, you’ll have a safer programme during peak season – increasing revenue without putting peak revenue at risk. And you might hold back on radical experiments altogether.

But if you’re just starting your experimentation programme then it might look like this:

You have an even balance across all three. You don’t invest too heavily in iterations, since you haven’t tested much yet. And you balance innovative and radical ideas to get quick feedback as you develop your product and marketing strategy. (Of course, having this many disruptive experiments is dependent on having the right culture.)

How COVID-19 changes your risk profile

Right now is the best time to be thinking about your risk profile. COVID-19 has changed everything.

Some companies – food delivery, e-learning, home retail – are seeing a surge in demand. They should adopt the peak risk profile above, unless they’re still relatively new to experimentation.

But other companies are seeing demand drop off a cliff. That means they could be more aggressive:

With demand dropping, doing nothing is the biggest risk of all.

Instead of doing nothing, or just iterating on the experiments that you’ve run previously, now’s the perfect time to try out the ideas that you were too nervous to do before. 

Coming up with disruptive ideas

By this stage, it’s probably clear that we need to be running more disruptive experiments. This will give us the most meaningful advantage over the competition. 

But how do we come up with the ideas? 

It can be challenging to think creatively. We have a tendency to think incrementally. To look at what we have already, and see how we can make it a little better. It’s hard to throw it out, start over and come up with something that may be better. 

So here are three exercises to try:


Invalidating lore

This is a good place to start. 

“Lore” means the anecdotal knowledge or opinions within a company, which have never been tested. The things you do because “that’s the way they’ve always been done”.

Chamath Palihapitiya started Facebook’s growth team. You might have seen him on the news recently saying that the US shouldn’t bail out hedge funds and billionaires during COVID-19.

He attributed the growth of Facebook to their constant focus on “invalidating lore”:

“One of the most important things that we did was just invalidate all of the lore… All we did was disprove all of the random anecdotal nonsense that filtered around the company.”

Chamath Palihapitiya

Remember, this was a company that was only a few years old. But that’s one of the reasons Facebook has been so successful – everything can be challenged, and data > opinion. 

That is perfect for us – we deal in data, so we should be able to challenge assumptions and show evidence.

Domino’s Pizza – never afraid to innovate

Take Domino’s Pizza. They’ve been making pizzas for 60 years – there’s a lot of knowledge and experience in the business, but that brings with it plenty of “lore”: a lot of assumptions and “that’s the way it’s always been done”. 

But luckily Domino’s is an innovative organisation.

So when companies like Deliveroo and Just Eat and Uber Eats started growing, Domino’s had to respond. These new competitors had deep pockets, and were growing aggressively. What’s more, they were charging customers for delivery – while with Domino’s, customers took free delivery for granted. 

This put Domino’s in an awkward position. They could carry on as before – offering free delivery, because that’s what they’d always done and that’s what customers expected.

Or they could test it.

So they chose one geographic market to test in. Then they ran an A/B test that added surge pricing for visitors in the treatment. That way, they could see the effect of making this change. Would customers abandon their order? Would they be less likely to come back in the future? Or had paying for food delivery simply become the accepted norm?

What’s important here isn’t the result of the experiment – it’s that you seek out the lore in your business, and test it. Forget what you’ve done before or what you thought you knew, and focus on what it could be. 

Divergent thinking

Next is divergent thinking. This is where you answer a question again and again and again… until you start coming up with weird and wonderful answers. Then you go back, and build on these ideas. 

Name as many uses for a brick as you can.

You’ve probably heard of interview questions like, “Name as many uses for a brick as you can.” 

So you might start by talking about building a house or a wall. Then you might think about its attributes. Bricks are heavy, so you could use one as a paper weight or door-stop. Or you could use it to break a window, or take the wheels from a car, and so on. 

This is the approach that Airbnb founder Brian Chesky used to design their “11-star framework”.

Chesky and the other founders wanted to create the perfect experience. So they started to brainstorm the equivalent of a hotel’s 5-star experience, and extrapolated from there. 

Do listen to Chesky talking about this himself – it’s hard to do it justice second-hand. [Masters of Scale 10:36–13:23]

“You almost have to design the extreme to come backwards.”

Brian Chesky

Like Chesky says, you design the extreme and then come backwards. That way you create something that’s significantly further ahead of where you’re at today. 

You don’t take one step forward from where you are today. You go to where the ideal becomes impossible and then take one step back.

2x not 2%

We need to shift our mindset from incremental growth to exponential growth. It sounds obvious, but the following is a mistake I make all the time… 

You’re building an experiment backlog, and you start by looking at what’s already there, what it looks like, how it works…

But often, you anchor your ideas to what’s there already, making small changes rather than throwing things out and starting again. 

And that’s totally fine most of the time – especially if you’re deliberately working on low-risk experiments. 

But if you want to come up with radical ideas, you need to think differently. So ask yourself:

What would increase the conversion rate 2x (not 2%)?

Let’s take an example…

You’re optimising a SaaS website and you want to increase sign-ups. You could… 

  • Improve the homepage based on user feedback (iterative)
  • Optimise the form based on best practice (iterative)
  • Emphasise Google or Facebook sign up options (innovative)
  • Change from a single-page form to a chatbot style Q&A approach (innovative)

Or could you remove the registration step altogether? 

This is what Posterous did, ten years ago. They wanted to get more people to host content on their blogging platform.

But rather than forcing people to sign up, they just let people upload the content by sending an email. Then they’d create an account automatically:

This is one of the hardest things to do in experimentation – to look at a problem differently and not just test changes you’d make anyway.

Good luck!

The Big Debate: What should your primary metric be?

One of the biggest myths in testing is that your primary metric shouldn’t be the purchase or conversion at the end of the user journey.

In fact, one of the biggest names in the game, Optimizely, states:

“Your primary metric (and the main metric in your hypothesis) should always be the behaviour closest to the change you are making in the variation you are employing.”

Optimizely

We disagree – and want to show how this approach can actually limit the effectiveness of your experimentation programme.

But first… what is a primary metric?

Your primary metric is the metric you will use to decide whether the experiment is a winner or not.

We also recommend tracking:

  • Secondary metrics – to gain more insight into your users’ behaviour 
  • Guardrail metrics – to ensure your test isn’t causing harm to other important business KPIs.

So what’s the big debate? 

Some argue that your primary metric should be the next action you want the user to take, not final conversion.

Diagram: Next action vs final action

For example, on a travel website selling holidays, the ‘final conversion’ is a holiday booking – this is the ultimate action you want the user to take. However, if you have a test on a landing page, the next action you want the user to take is to click forward into the booking funnel.

The main motive for using the next action as your primary metric is that it will be quicker to reach statistical significance. Moreover, it is less likely to give an inconclusive result. This is because:

  • Inevitably more users will click forward (as opposed to making a final booking), so you’ll have a higher baseline conversion rate, meaning a shorter experiment duration (illustrated with rough numbers after this list).
  • The test has a direct impact on click forward as it is the next action you are persuading the user to take. Meanwhile there may be multiple steps between the landing page and the final conversion. This means many other things could influence the user’s behaviour, creating a lot of noise.  
  • There could even be a time lag. For example, if a customer is looking for a holiday online, they are unlikely to book in their first session. Instead they may have a think about it and have a couple more sessions on the site before taking the final step and converting. 
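To put rough numbers on that first point, here’s a quick calculation using the standard two-proportion sample-size approximation. The 30% click-through rate, 3% booking rate and 5% relative lift are purely illustrative:

    from scipy.stats import norm

    def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.8):
        """Approximate visitors needed per variant for a two-sided test."""
        p1, p2 = baseline, baseline * (1 + relative_lift)
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return int(z ** 2 * variance / (p1 - p2) ** 2)

    print(sample_size_per_variant(0.30, 0.05))  # click forward:  ~15,000 visitors per variant
    print(sample_size_per_variant(0.03, 0.05))  # final booking: ~208,000 visitors per variant

Same relative lift, but the low-baseline final conversion needs roughly fourteen times the traffic – which is exactly why the ‘next action’ looks so tempting.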

Why is the myth wrong?

Because it can lead you to make the wrong decisions.

Example 1: The Trojan horse

Take this B2B landing page below: LinkedIn promotes their ‘Sales Navigator’ product with an appealing free trial. What’s not to like? You get to try out the product for free so it is bound to get a high click through rate.

But wait… when you click forward you get a nasty shock as the site asks you to enter your payment details. You can expect a high drop-off rate at this point in the funnel.

LinkedIn requires users to enter their payment details to access the free trial – but this isn’t made clear on the landing page, where the credit card form is still two steps away

A good idea would be to test the impact of giving the user forewarning that payment details will be required. This is what Norton Security have under the “Try Now” CTA on their landing page.

Norton Security lets their users know that a credit card is required, so there are no nasty surprises

In an experiment like this, it is likely that you would see a fall in click through (the ‘next action’ from the landing page). However, you might well see an uplift in final conversion – because the user receives clear, honest, upfront communication.

In this LinkedIn Sales Navigator example:

  • If you were to use clicks forward as your primary metric, you would declare the test a loser, despite the fact that it increases conversion.
  • If you were to use free trial sign ups as your primary metric, you would declare the test a winner – a correct interpretation of the results.

Example 2: The irresistible big red button

The ‘big red button’ phenomenon is another scenario that will help to bust this troublesome myth:

When you see a big red button, all you want to do is push it – it’s human nature.

The big red button phenomenon

This concept is often taken advantage of by marketers:

Imagine you have a site selling experience gifts (e.g. ‘fine dining experience for two’ or ‘one-day acrobatics course’). You decide to test increasing the prominence of the main CTA on the product page. You do this by increasing the CTA size and removing informational content (or moving it below the fold) to remove distractions. Users might be more inclined to click the CTA and arrive in the checkout funnel. However, this could damage conversion: users may click forward but then find they are lacking information and are not ready to be in the funnel – so actual experience bookings may fall.

Again, in this scenario using click forward as your primary metric will lead you to the wrong conclusions. Using final conversion as your primary metric aligns with your objective and will lead you to the correct conclusions.

There are plenty more examples like these. And this isn’t a made-up situation or a rare case. We frequently see an inverse relationship between clickthrough and conversion in experimentation.

This is why PPC agencies and teams always report on final conversion, not just click-through to the site. A PPC advert hasn’t done its job simply by getting lots of users to the site – if that were all it took, you could flood your website with unqualified traffic that bounces immediately. No, the PPC team is responsible for getting qualified traffic to your site, which they measure by final conversion rate.

But is it really a big deal?

Some people say, ‘Does it really matter? As long as you are measuring both the ‘next action’ and the final conversion then you can interpret the results depending on the context of the test.’

That’s true to some extent, but the problem is that practitioners often interpret results incorrectly. Time and time again we see tests being declared as winners when they’ve made no impact on the final conversion – or may have even damaged it.

Why would people do this? Well, there is a crude underlying motive for some practitioners. It makes them look more successful at their job – with higher win rates and quicker results.

And there are numerous knock-on effects from this choice:

1. Wasting resources

When someone incorrectly declares a test a winner, the change will need to be coded into the website, adding to the development team’s vast pile of work. That’s a huge waste of valuable resources when the change isn’t truly improving the user experience – and may well be harming it.

2. Reducing learnings

Using next action as your primary metric often leads to incorrect interpretation of results. In turn, vital information about the test’s true impact gets left out of communications. Miscommunication of results means businesses miss out on valuable insights about their users.

Always question your results to increase your understanding of your users. If you are seeing an uplift in the next action, ask yourself, ‘Does this really indicate an improvement for users? What else could it indicate?’ If you are not asking these questions, then you are testing for the sake of it rather than testing to improve and learn.

3. Sacrificing ROI

With misinterpreted results, you may sacrifice the opportunity to iterate and find a better solution that will work. Instead of implementing a fake winner, iterate, find a true winner and implement that!

Moreover, you may cut an experiment short, having seen a significant fall in next-step conversion – when, if you had let it run for longer, it could have shown a significant uplift in final conversion. Declaring a test a loser when it is in fact a winner will of course sacrifice your ROI.

4. Harming stakeholder buy-in

On the surface, using click-through as your primary metric may look great when reporting on your programme metrics. It will give your testing velocity and win rate a nice boost. But it doesn’t take long, once someone looks beneath the surface, to see that your “winners” aren’t actually impacting the bottom line. This can damage stakeholder buy-in, as your work is based on assumptions rather than being factual and data-driven.

But it’s so noisy!

A common complaint we hear from believers of the myth is that there is too much noise we can’t account for. For example, there might be four steps in the funnel between the test page and the final conversion, so there are many other things that could influence the user between step 1 and step 4 and lead them to drop off.

That’s true. But the world is a noisy place. Does that mean we shouldn’t test at all? Of course not.

For instance, I might search “blue jacket” and Google links me through to an ASOS product page for their latest denim item. Between this page and the final conversion we have 3 steps: basket, sign in, checkout.

Look at all the noise that could sway my decision to purchase along each step of the journey:

As you can see there is a lot of unavoidable noise on the website and a lot of unavoidable noise external to the site. Imagine ASOS were to run a test on the product page and were only measuring the next action (“add to basket” clicks). Their users are still exposed to a lot of website noise and external noise during this first step.

However, one thing is for sure: all users will face this noise, regardless of whether they are in the control or the variant. As the test runs, the sample size gets larger and larger, and the likelihood of seeing a false uplift due to this noise gets smaller and smaller. This is exactly why we ensure we don’t draw conclusions before the test has gathered enough data.
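If you want to see this effect for yourself, a tiny A/A simulation makes the point: control and variant share the same true conversion rate (all of the noise, none of the treatment), and the measured ‘uplift’ shrinks towards zero as the sample grows. The 3% rate and the sample sizes are arbitrary:

    import numpy as np

    rng = np.random.default_rng(42)
    true_rate = 0.03  # both groups share the same underlying conversion rate

    for n in (1_000, 10_000, 100_000, 1_000_000):
        control = rng.binomial(n, true_rate) / n
        variant = rng.binomial(n, true_rate) / n
        print(f"n={n:>9,}  measured uplift: {(variant - control) / control:+.2%}")

    # Typical output: swings of several percent at n=1,000 shrink to a fraction of a
    # percent by n=1,000,000 – the noise hits both groups equally and averages out.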

The same goes when we use final conversion as our primary metric rather than ‘next action’. Sure, there is more noise, which is one of the reasons why it takes longer to reach statistical significance. But once you reach statistical significance, your results are just as valid, and are more aligned with your ultimate objective.

But where do you draw the line?

Back to our LinkedIn Sales Navigator example: as discussed above, the primary metric should be free trial sign ups. But this isn’t actually the ultimate conversion you want the user to take. The ultimate conversion is for the user to become a paying subscriber to your product, beyond the free trial.

You should think of it like a relay race.

The objective of the landing page is to generate free trials. → The objective of the free trial is to generate full subscriptions. → The objective of the full subscription is to retain the customer (or even upsell other product options):

Each part of the relay race is responsible for getting the customer to the next touch point. The landing page has a lot of power to influence how many users end up starting the free trial. It has less power to influence how successful the free trial is and whether the user will continue beyond the trial.

Nonetheless, we’ve seen experiments where the change does have a positive impact beyond the first leg of the relay race. In one experiment we explained the product more clearly on the landing page. This increased the user’s understanding of it, making them more likely to actually use their free trial (and be successful in doing so). This led to an uplift in full subscription purchases 30 days later.

For this kind of experiment that could have an ongoing influence, you may wish to keep the experiment running for longer to get a read on this. It is sensible to define a decision policy up-front in this instance. In this example, where the impact on full purchases is likely to be flat or positive, your decision policy might be:

  • If we see a flat result or a fall in free trial sign ups (primary KPI) we will do the following:
    • Stop the test and iterate with a new execution based on our learnings from the test.
  • If we see a significant uplift in free trial sign ups (primary KPI), we will do the following:
    • Serve the test to 95% and keep a 5% hold back to continue measuring the impact on full subscription purchases (secondary KPI).

This way, you will be able to make the right decisions and move on to your next experiments while still learning the full value of your experiment.
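If it helps to keep everyone honest, the policy can even be written down as a simple rule before launch. This is only a sketch of the policy above – the parameter names are ours, and the judgement of what counts as ‘significant’ still comes from your stats engine, not from this function:

    def decide(primary_uplift, primary_significant, holdback_pct=5):
        """Decision policy for a test whose long-term impact is expected to be flat or positive."""
        if primary_significant and primary_uplift > 0:
            # Winner: roll out to most traffic, keep a small holdback to keep measuring
            # the impact on full subscription purchases (the secondary KPI).
            return f"Serve to {100 - holdback_pct}% and keep a {holdback_pct}% holdback"
        # Flat or falling primary KPI: stop and iterate with a new execution.
        return "Stop the test and iterate on the learnings"

    print(decide(primary_uplift=0.08, primary_significant=True))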

For a test where there is a higher risk of a negative impact on full subscription purchases, you may do the following things:

  1. Define the full subscription metric as your guardrail metric.
  2. Design a stricter decision policy whereby you gather enough data to confirm there is no negative impact on full subscription purchases.

But what if you are struggling to reach significance?

For many, using the next action as the primary metric allows them to experiment faster. So does low traffic justify testing to the next action instead of the sale? Sometimes – but only if you’ve considered these options first:

1. Don’t run experiments

That’s not to say you shouldn’t still be improving your website. Experiments are the truest form of evidence for understanding your audience – but if you don’t have enough traffic, the next best thing to inform and validate your optimisation is other forms of evidence, such as usability testing. Gathering insight via analytics data and user research is extremely powerful. This is something we do continually alongside experimentation, for all our clients.

2. Be more patient

For a particularly risky change, you might be willing to be patient and choose to run an experiment that will take longer to reach significance. Before you do, plug the numbers into a test duration calculator so that you have a good idea of exactly how patient you’re going to need to be. There are several good ones that are independent of any particular testing tool.
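If you’d rather sanity-check the numbers yourself, the calculation behind most duration calculators is straightforward. This sketch reuses the same two-proportion approximation as the earlier example, assumes a 50/50 split across two variants, and the traffic and conversion figures are made up:

    import math
    from scipy.stats import norm

    def weeks_to_run(weekly_visitors, baseline, relative_lift, alpha=0.05, power=0.8):
        """Rough number of weeks to detect a given relative lift."""
        p1, p2 = baseline, baseline * (1 + relative_lift)
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
        n_per_variant = z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
        return math.ceil(2 * n_per_variant / weekly_visitors)

    # e.g. 20,000 visitors a week, 3% conversion, hoping to detect a 10% relative lift
    print(weeks_to_run(20_000, 0.03, 0.10))  # roughly 6 weeks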

3. Run tests on higher-traffic areas and audiences

If you are trying to run tests to a very specific audience or a low traffic page, you aren’t going to have much luck in reaching statistical significance. Make sure you look at your site analytics data and prioritise your audiences and areas by their relative size.

With all that being said, you do have a fourth option…

If you are really struggling to reach statistical significance then you might want to use the next action as your primary metric. This isn’t always a disaster – so long as you interpret your results correctly. The problem is that so often people don’t.

For a site with low traffic, it may make sense to take this approach if you are experienced in interpreting experiment results.

However, for sites with lots of traffic, there’s really no excuse. So start making the switch today. Your win rates might fall slightly, but when you get a win, you can feel confident that you are making a true difference to the bottom line.

To find out more about our approach to experimentation, get in touch today!

The Perception Gap: Can we ever really know what users want?

Have you ever heard of Mazagran? A coffee-flavoured bottled soda that Starbucks and Pepsi launched back in the mid-1990s? Probably not – and there is a good reason for that!

Starbucks’ market research correctly told them that customers wanted a cold, sweet, bottled coffee beverage that they could conveniently purchase in stores.

So surely Mazagran was the answer?

Evidently not! Mazagran was not what the consumers actually wanted. The failure of this product was down to the asymmetry that existed between what the customers wanted and what Starbucks believed the customer wanted.

Despite Starbucks conducting market research, this gap in communication – often known as the perception gap – still occurred. Luckily for Starbucks, Mazagran was a stepping stone to the huge success that came with bottled Frappuccinos: what consumers actually wanted.

What is the perception gap and why does it occur?

Perception is the active process of assessing information in your surroundings. A perception gap occurs when you attempt to communicate this assessment but it is misunderstood by your audience.

How we assess that information is strongly influenced by communication. Because humans communicate in different ways, a perception gap can occur when someone’s communication style differs from your own. These gaps also vary in size, depending on the value that you – or your customers – attach to each factor. In addition, many natural cognitive biases influence the degree of the perception gap, leading us to believe we know what other people are thinking more than we actually do.

Perception gaps in ecommerce businesses

Perception gaps mainly occur in social situations, but they can also heavily impact ecommerce businesses, from branding and product to marketing and online experience.

Perception gaps within ecommerce mainly appear because customers form opinions about your company and products based on their broader experiences and beliefs. One thing is for sure: perception gaps certainly occur between websites and their online users. Unfortunately, they are often the start of vicious cycles, where small misinterpretations of what the customer wants or needs are made worse when we try to fix them. Ultimately, this means we lose out on turning visitors into customers.

Starbucks and Pepsi launching Mazagran was an example of how perception gaps can lead to the failure of new products. McDonald’s launching their “Good to Know” campaign is an example of how understanding this perception gap can lead to branding success.

This myth-busting campaign was launched off the back of comprehensive market research using multiple techniques. McDonald’s understood the difference between what they thought of themselves (e.g. fast food made with high-quality ingredients) and what potential customers thought of McDonald’s (e.g. chicken nuggets made of chicken beaks and feet). Understanding that this perception gap existed allowed them to address these misconceptions in their campaign, which successfully changed users’ perceptions of their brand.

For most digital practices, research plays an important part in allowing a company or brand to understand their customer base. However, conducting and analysing research is often where the perception gap begins to form.

For example, say you are optimising a checkout flow for a retailer. You decide to run an on-site survey to gather some insight into why users may not be completing the forms, and therefore are not purchasing. After analysing the results, it seems the top reason users are not converting is that they find the web form confusing. Now this is where the perception gap is likely to form. Do users want the form to be shortened? Do they want more clarity or explanation around form fields? Is it the delivery options that they don’t understand?

Not being the user means we will never fully understand the situation the user is in. Making assumptions about it only widens the perception gap.

Therefore, reducing the perception gap is surely a no-brainer when it comes to optimising our websites. But is it as easy as it seems? 

In order to reduce the perception gap you need to truly understand your customer base. If you don’t, then there is always going to be an asymmetry between what you know about your customers and what you think you know about your customers.

How to reduce perception gaps

Sadly, perception gaps are always going to exist due to our interpretation of the insights we collect and the fact that we ourselves are not the actual user. However, the following tips may help to get the most out of your testing and optimisation by reducing the perception gap:

  1. Challenge assumptions – too often we assume we know our customers, how they interact with our site and what they are thinking. Unfortunately, these assumptions can get cemented over time into deeply held beliefs about how users think and behave. Challenging these assumptions leads to true innovation and new ideas that may not have been thought of before. With this in mind, these assumptions can be tested and answered by the research we conduct.
  2. Always optimise based on at least two supporting pieces of evidence – the perception gap is more likely to occur when research into a focus area is limited or based on a single source of insight. Taking a multiple-measure approach means insights are likely to be more valid and reliable.
  3. Read between the lines – research revolves around listening to your customers, but more importantly it is about reading between the lines. It is the difference between collecting their responses and actually understanding them. As Steve Jobs once said, “Customers don’t know what they want”; whether you believe that or not, understanding their preferences is still vital for closing the perception gap.
  4. Shift focus to being customer-led – being customer-led, as opposed to product-led, places a higher value on researching your customers. With more emphasis on research, this should lead to greater knowledge and understanding of your customer base, which in turn should reduce the perception gap that has the potential to form.

Conclusion

The perception gap is something that is always going to exist and is something we have to accept. Conducting research – and a lot of it – is certainly a great way to reduce the perception gap that will naturally occur. However, experimentation is really the only way to confirm whether the research and insight you’ve collected about your customer base are valid – and whether acting on them significantly improves the user experience. One quote that has always made me think is from Flint McGlaughlin, who said, “we don’t optimise web pages, we optimise for the sequence of thought”. Taking this customer-led view of experimentation can only result in success.

How to measure A/B tests for maximum impact and insight

One of the core principles of experimentation is that we measure the value of experimentation in impact and insight. We don’t expect to get winning tests all the time, but if we test well, then we should always expect to draw insights from them. The only real ‘failed test’ is one that doesn’t win and that we learn nothing from.

In our eagerness to start testing, it’s common to come up with an idea (hopefully at least one based on data, with an accompanying hypothesis!), get it designed and built, and set it live. Most of the thought goes into the design and execution of the idea, and often less thought goes into how to measure the test to ensure we get the insight we need.

By the end of this article you should have:

  • A strong knowledge of why tracking multiple goals is important
  • A framework to structure your goals, so you know what’s relevant for each test

In every experiment it’s important to define a primary goal upfront – the goal that will ultimately judge the test a win or loss. It’s rarely enough to just track this one goal, though. If the test wins, great – but we may not fully understand why. Similarly, if the test loses and we only track the main goal, then the only insight we are left with is that it didn’t win. In that case, we don’t just have a losing test, we also have a test where we lose the ability to learn – the second key measure of how we get value from testing. And remember, most tests lose!

If we don’t track other goals and interactions in the test we will miss the behavioural nuances and the other micro-interactions that can give us valuable insight as to how the test affected user behaviour. This is particularly important in tests where a positive result on the main KPI could actually harm another key business metric.

One example from a test we ran recently was for a camera vendor. We introduced add to basket CTAs on a product listing page, so that users who knew which product they wanted wouldn’t have to navigate down to the product page to purchase.

This led to an uplift in orders; however, it had a negative effect on average order value. The reason was that the product page was an important place for users to discover accessories for their products, including product care packages. As the test encouraged users to add the main product directly, they were then less inclined to buy accessories and add-ons. The margins for accessories and add-on products are far higher than for cameras, so a lower average order value driven by fewer accessory sales is definitely a negative outcome.

Insights from well-tracked tests should be a key part of how your testing strategy develops, as new learnings inform better iterations and open up new areas for testing by revealing user behaviour you were previously unaware of.

In any test, there can be an almost endless number of things you could measure, and the solution to not tracking enough shouldn’t be to track everything. Measure too much and you’ll potentially be swamped analysing data points that don’t have any value – and you’ll curry no favour with the developers who have to implement all the tracking! Measure too little and you may miss valuable insights that could turn a losing test into a winning one. The challenge is to measure the right things for each test.

What to measure?

Your North Star Metric

It goes without saying that every test should be aligned to the strategic goal of your testing programme – and that strategic goal should always have a clear, measurable KPI. For an ecommerce site it will likely be orders or revenue; leads for a lead-gen site or page; number of pages viewed or page scroll for a content site, and so on. This KPI will be the key measurement of whether your test succeeds or fails, and for that reason we call it the North Star metric. In essence, regardless of whatever else happens in the test, if we can’t move the needle on this metric, the test doesn’t win. Unsurprisingly, this metric should be tracked in every test you run.

You’ll know if the test wins, but what other effects did it have on your site? What effect did it have on purchase behaviour and revenue? Did it lead to a decrease in some other metrics which might be important to the business?

The performance of the North Star metric determines whether or not your hypothesis is proven or disproven. Your hypothesis in turn should be directly related to your primary objective.

Guardrail Metrics

You should also define ‘guardrail metrics’. These tend to be second-tier metrics that relate to key business KPIs and which, if they perform negatively, could call into question how successful the test really is. If the test loses but these perform well, it’s probably a sign you’re on the right track. On their own they don’t define success or failure like the North Star metric does, but they contextualise the North Star metric when reporting on the test.

For an ecommerce site, if we assume the North Star metric is orders, then two obvious guardrail metrics would be revenue and average order value. If we run a test that increases orders, but as a result users buy fewer items, or lower-value items as in the example above, this would decrease AOV and could harm revenue.
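As a quick illustration with made-up numbers: revenue is orders × average order value, so a test that lifts orders by 5% but drags AOV down by 8% actually cuts revenue by around 3% (1.05 × 0.92 ≈ 0.97). That’s a ‘winner’ on the North Star metric that your guardrails would rightly flag.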

Tests can become much more insightful just by adding two more metrics. Not only can we see that the test drove more orders, we can also see the effect our execution had on the value and quantity of products being bought. This gives us the opportunity to change the execution of the test to address any negative impact on our guardrail metrics. In this sense, measuring tests effectively is a core part of an iterative test-and-learn approach.

At a minimum, you should be tracking your North Star and guardrail metrics. These will tell you the impact of the test on the bottom line for the business.

Your guardrail metrics will generally be closely related to your North Star metric.

Secondary Metrics

Some tests you run may only impact your North Star metric – a test on the payment step of a funnel is a good example, where the most likely outcome is simply more orders or fewer orders, and not much else. What you’ll learn is whether that change pushed users over the line.

Most other tests, however, will have a number of different effects. Your test may radically change the way users interact with the page and measuring your tests at a deeper level than just the North Star and guardrail metrics will help you understand what effect the change has on user behaviour.

We work with an online food delivery company where meal deals are the main way customers browse and shop. Given the number of meal deals they have, one issue we found through our initial insights was that users struggle to navigate through them all to find something relevant. We ran a test that introduced filtering options on the meal deal page, including how many people the deal feeds, what types of food it contains, the saving amounts and the price points. Along with the key metrics, we also tracked all the filter options in the test.

This test didn’t drive any additional orders – in fact, not many users interacted with the filter, suggesting it wasn’t very useful in helping users curate the meal deals. However, we did notice that the users who did use it overwhelmingly filtered meal deals by price first, and by how many people they feed second. So a ‘flat’ test – but now we know two very important pieces of information that users look for when selecting deals.

This in turn led to a series of tests around how we better highlight price and how many people the meal feeds at different parts of the user journey and on the meal deal offers themselves. These insights have helped shape the direction of our testing strategy by shedding light on user preferences. If we had only tracked the North Star and guardrail metrics, these insights would have been lost.

For each test you run, really think through what the possible user journeys and interactions could be as a result of the test, and make sure you track them. That doesn’t mean tracking everything – but start to see tests as a way of learning about your users, not just a way to drive growth.

Secondary metrics help contextualise your North Star and Guardrail metrics, as well as shed light on other behaviours.

Segmentation

If you’ve managed to track your North Star, guardrail and some secondary metrics in your tests, you’re in a great place. One other thing you’ll want to think about is how to segment your data. Segmenting your test results is hugely important, especially when different user groups respond differently on your site. Device is an obvious segment that you should be looking at in every test. We’ve seen tests that have had double-digit uplifts on desktop, but haven’t moved the needle at all on mobile.

If your test introduces a new feature or piece of functionality that users can interact with, it’s helpful to create a segment for users who interact with that feature. This will help shed light on how interaction with the new functionality affects user behaviour.
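One practical habit that pulls all of these layers together is writing the measurement plan down before the test goes live – even something as simple as the structure below, where every metric and segment name is just a placeholder for whatever matters on your site:

    measurement_plan = {
        "north_star": "orders",                            # decides win or lose
        "guardrails": ["revenue", "average_order_value"],  # must not degrade
        "secondary": ["add_to_basket_clicks",              # explain *why* it won or lost
                      "filter_interactions",
                      "product_page_views"],
        "segments": ["device",                             # where the effect differs
                     "new_vs_returning",
                     "interacted_with_new_feature"],
    }

If you can’t fill in every part of a plan like this before launch, it’s usually a sign the test hasn’t yet been thought through as a learning opportunity.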

Key takeaways

Successful tests are measured by impact and insight. The only ‘failed’ test is one that doesn’t win and that you don’t learn anything from. Insightful tests allow you to better understand why a test performed the way it did, and mean you can learn, iterate and improve more rapidly – leading to better, more effective testing.

  • Define your North Star metric – The performance of this metric will define if the test succeeds or fails. This should be directly linked to the key goal of the test.
  • Use guardrail metrics – Ensure your test isn’t having any adverse effects on other important business metrics.
  • Track smaller micro-interactions – These don’t decide the fate of your test but they do generate deeper insight into user-behaviour that can inform future iterations.
  • Segment by key user groups – Squeeze even more insight from your tests by looking at how different groups of users react to your changes.

If you would like to learn more about our approach, get in touch today!

5 steps to kick-start your experimentation programme with actionable insights

Experimentation has to be data-driven.

So why are businesses still kicking off their experimentation programmes without good data? We all know running experiments on gut-feel and instinct is only going to get you so far.

One problem is the ever-growing number of research methods and user-research tools out there. Prioritising what research to conduct is difficult. Especially when you are trying to maximise success with your initial experiments and need to get those experiments out the door quickly to show ROI.

We are no stranger to this problem. And the solution, as ever, is to take a more strategic approach to how we generate our insight. We start every project with what we call the strategic insights phase. This is a structured, repeatable approach to planning user-research we’ve developed that consistently generates the most actionable insight whilst minimising effort.

This article provides a step-by-step guide to how we plan our research strategy, so that you can replicate something similar yourself and set your future experiments up for greater success.

The start of an experimentation programme is crucial. The pressure of getting stakeholder buy-in or achieving quick ROI means the initial experiments are often the most important. A solid foundation of actionable insight from user research can make a big difference to how successful your early experiments are.

With hundreds of research tools enabling many different research methods, the challenge is choosing which research method will generate the insight that’s most impactful and actionable. Formulating a research strategy for how you’re going to generate your insight is therefore crucial.

When onboarding new clients, we run an intense research phase for the first month. This allows us to get up to speed on the client’s business and customers. More importantly, it provides us with data that allows us to start building our experimentation framework – identifying where our experimentation can make the most impact and what our experimentation should focus on. We find dedicating this time to insights sets our future experiments up for the bigger wins and therefore, a rapid return on investment.

Our approach: Question-led insights

When conducting research to generate insight, we use what we call a question-led approach. Any piece of research we conduct must have the goal of answering a specific question. We identify the questions we need to answer about a client’s business and their website and then conduct only the research we need to answer them. Taking this approach allows us to be efficient, gaining impactful and actionable insights that can drive our experimentation programme.

Following a question-led approach also means we don’t fall into the common pitfalls of user-research:

  • Conducting research for the sake of it
  • Wasting time down rabbit holes within our data or analytics
  • Not getting the actionable insight you need to inform experimentation

There are 5 steps in our question-led approach.

1. Identify what questions you need, or want, to answer about your business, customers or website

The majority of businesses still have questions about their customers that they don’t have the answers to. Listing these questions provides a brain-dump of everything you don’t know – but that, if you did know, would help you design better experiments. Typically these questions fall into three main categories: your business, your customers and your website.

Although one size does not fit all, we’ve listed some of the typical questions we need to answer for clients in ecommerce or SaaS.

SaaS questions:

  • What is the current trial-to-purchase conversion rate?
  • What motivates users on the trial to make a purchase? What prevents users on the trial from making a purchase?
  • What is the distribution between the different plans on offer?
  • What emails are sent to users during their trial? What is the lifecycle of these emails?
  • What are the most common questions asked to customer services or via live chat?

We typically end up with a list of 20-30 questions, so the next step is to prioritise what we need to answer first.

2. Prioritise what questions need answering first

We want our initial experiments to be as data-driven and successful as possible. Therefore, we need to tackle the questions that are likely to bring about the most impactful and actionable insights first.

For example, a question like “What elements in the navigation are users interacting with the most?” might be a ‘nice to know’. However, if we don’t expect to run a navigation experiment any time soon, this may not be a ‘need to know’ and therefore wouldn’t be high priority. On the other hand, a question like “What’s stopping users from adding products to the basket?” is almost certainly a ‘need to know’ – answering it is very likely to generate insight that can be turned directly into an experiment. The rule of thumb is to prioritise the ‘need to know’ questions ahead of the ‘nice to know’ ones.

We also need to get the actionable insight quickly. Therefore, it is important to ensure that we prioritise questions that aren’t too difficult or time consuming to answer. So, a second ranking of ‘ease’ can also help to prioritise our list.
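In practice this prioritisation can be as simple as scoring each question on the two factors and sorting. A rough sketch, with invented questions and 1–5 scores:

    questions = [
        # (question, need_to_know 1-5, ease 1-5)
        ("What's stopping users from adding products to the basket?", 5, 4),
        ("Why do users abandon the checkout form?", 5, 3),
        ("What elements in the navigation get the most interaction?", 2, 5),
    ]

    for q, need, ease in sorted(questions, key=lambda x: x[1] * x[2], reverse=True):
        print(f"{need * ease:>2}  {q}")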

3. Decide the most efficient research techniques to answer these questions

There are many types of research you could use to answer your questions. Typically we find the majority of questions can be answered by one or more of web analytics, on-site or email surveys, usability testing or heatmaps/scrollmaps. There may be more than one way to find your answer.

However, one research method could also answer multiple questions. For example, one round of usability testing might be able to answer multiple questions focused on why a user could be dropping off at various stages of your website. This piece of research would therefore be more impactful, as you are answering multiple questions, and would be more time efficient compared to conducting multiple different types of research.

For each question in our now prioritised list we decide the research method most likely to answer it. If there are multiple options you could rank these by the most likely to get an answer in the shortest time. In some cases we may feel the question was not sufficiently answered by the first research method, so it can be helpful to consider what you would do next in these cases.

4. Plan the pieces of research you will carry out to cover the most questions

You should now have a list of prioritised questions you want to answer and the research method you would use to answer each. From this you can select the pieces of research to carry out based on which would give you the best coverage of the most important questions. For example, you might see that 5 of your top 10 questions could be answered through usability testing. Therefore, you should prioritise usability testing in the time you have, and the questions you need to answer can help you design your set of tasks.

After your first round of research, revisit your list of questions and for each question evaluate whether or not you feel it has been sufficiently answered. Your research may also have generated more questions that should be added to the list. Periodically you might also need to re-answer questions where user behaviour has changed due to your experimentation. For example, if initially users were abandoning on your basket page due to a lack of trust, but successful experiments have fixed this, then you may need to re-ask the question to discover new problems on the basket page.

On a regular basis you can then repeat this process again of prioritising the questions, deciding the best research methods and then planning your next set of research.

5. Feed these insights into your experimentation strategy

Once your initial research pieces have been conducted and analysed, it is important to compile the insight from them in one place. This has two benefits. The first is that it becomes easier to visualise and discover themes emerging across multiple sources of insight. The second is that you have a single source of information that can be shared with others in your business.

As your experimentation programme matures it is likely you will be continuously running research in parallel to your experiments. The insight from this research will answer new questions that will naturally arise and can help inform your experimentation.

Taking this question-led approach means you can be efficient with the time you spend on research, while still maximising your impact. Following our step-by-step guide will provide a solid foundation that you can work upon within your business:

  1. Identify what questions you need, or want, to answer about your business, customers or website
  2. Prioritise what questions need answering first
  3. Decide the most efficient research techniques to answer these questions
  4. Plan the pieces of research you will carry out to cover the most questions
  5. Feed these insights into your experimentation strategy

For more information on how to kick-start experimentation within your business, get in touch here.

Who, When and Where, but what about the Why? Understanding the value of Qualitative Insights: Competitor Analysis

Numbers, rates and statistics are great for finding out what’s happening on your site and where opportunities for testing lie, but quantitative insights can only take us so far. This series covers the importance of the qualitative insights we run for our clients at Conversion.com. 

Last time, we looked at the value of on-site surveys and just how effective they can be when used correctly. 

Competitor Analysis

Competitor analysis is a vital part of understanding your industry.

Anyone familiar with a SWOT analysis knows that understanding who your competitors are, as well as what they’re doing can allow you to understand your place in the market, differentiate yourself from the competition and stand out in the right way.

However, when we look online we see that strategies are more often than not a case of the blind leading the blind: we copy elements from others that we like, but with no insight into whether they’re effective, or what makes them effective.

Of course, we will never truly be able to answer these questions without access to competitor data and a variety of tests exploring the element – but never fear. My hope is that by the end of this article you will be equipped with a framework that gives your competitor analysis…the competitive edge.

Just like in many other aspects of life, knowing yourself is just as important as knowing others. The same applies for a competitor analysis. The better understanding you have of your users, their motivations and barriers, as well as the key site areas for improvement, the better you will be able to diagnose competitor elements that may be of use.

Often a competitor analysis can be an exhaustive task, spanning every page of competitor sites with few actionable insights at the end. It is therefore better to focus your competitor analysis on one specific area at a time. Whether it is the landing page, the checkout funnel or a product page, focusing on one area makes it easier to identify where your experience differs and to formulate experiments around this.

At Conversion, we begin by mapping out a client’s main customer journey before using insights to identify key levers on the site (these are the key themes we feel can have an impact on conversion rates). Combining this with analytics data shows us where a site may be underperforming, and this is a great place to start looking at competitors.

How do I conduct a competitor analysis?

I will show you an example using a credit card company, Company X.

After examining our quantitative data, we have established that Company X has a low conversion rate on its application form.

We begin by comparing Company X to its closest competitors. In doing so, we realise that many competitors are chunking their application forms into bite-size steps.

Often, this is where many people would stop and act quickly to replicate this experience on their own sites. However, just because everyone else is doing it, does that make it the best way? The reality is, we still don’t know whether this is the best way to present an application form.

In order to find out, it is important now that we look beyond the client’s industry – this is a great exercise to help us think beyond what our close competitors are doing. How does your registration form compare to Amazon? Does your size guide match up to Asos?

Taking industry best practice, combining this with competitor research and then sprinkling on the uniqueness of your site and users, often leaves you with a test idea that is worth prioritising.

Understanding what your competitors do can help you frame your strategy and optimisation efforts. It is an insight-rich exercise that is good for looking at the industry at a macro level, as well as homing in on particular levers and how competitors utilise them.

Competitor analysis template for Conversion.com

Here is the standard template we use at Conversion when we begin a competitor analysis. It is a great starting point and can be tweaked to suit different industry needs. With such a large scope of potential insights to gain, a one-size-fits-all approach can rarely be taken; that’s why we use four different templates depending on our desired outcomes.

I will now share two frameworks: one for a broad competitor analysis, and another for a more in-depth analysis.

Broad competitor analysis

If you haven’t conducted a competitor analysis before, this is your first step.

You’ll want to identify 4-5 key competitors within your industry. These can range from the low end to the high end of the market, and will be useful both for understanding what you do and for cross-referencing it with your market position.

Starting with your own site, map out the main user journey as you would a storyboard. An ecommerce site, for example, may have a funnel like this:

Landing page -> Category Page -> Product details page -> Basket page -> Checkout

You get the idea.

Take screenshots of each step and make note of the key elements of each page.

Structured Overview of Key Direct Competitors

Now do the same for your competitors, noting any clear contrasts in the tone, content or functionality of the sites.

At Conversion, we would use the template above for mapping this out as it creates a strong basis for comparing sites at a later stage and allows us to add small notes to each site as we go along.

You will soon start to establish patterns across the sites, and often these will be the hygiene factors that are consistent within your industry. Most importantly, though, look for the key differences across the sites, as these will help form the basis of future test ideas. Maybe all your competitors have a guest checkout: this test concept could have been at the bottom of your backlog, but now that you have more context on the industry, you will look at your prioritisation differently.

A step further

Now that we have a better understanding of what your competitors are doing in general, let’s take a more focused look at a key element. Using my earlier example of a guest checkout, here is how we would explore this idea.

Visual Map of Competitor Funnel

Once again, we are mapping out the flow – but here we would focus on plotting each step of the guest checkout process, comparing each competitor’s execution at each step. This is a great point to go beyond your competitors and look more broadly at how other companies are addressing this.

Looking further ahead, you may want to do a competitor analysis that looks at a specific lever, e.g. how other sites present social proof to users, or the ways you can include trust elements online. The possibilities are (almost) endless. Always remember, though, that a competitor analysis should have a goal or key question that you are seeking to answer.

When combined effectively with other qualitative insights such as usability testing and on-site surveys, a competitor analysis can give you a really focused understanding of how your customers behave as well as inspiration for how to improve your website experience.

Through testing these ideas, you will gain a clear understanding of what works best for your users and how to make your website stand out from the crowd.

Look out for our next article in this series where we discuss the importance of heat-maps and scroll-maps.  

Who, When and Where, but what about the Why? Understanding the value of Qualitative Insights: On-site Surveys

Within our data-driven industry, many gravitate towards relying heavily on quantitative data to guide and inform experimentation. With Google Analytics and the metrics it measures (e.g. conversion rate, bounce rate and exit rate) often dominating our focus, many undervalue or forget about the other insights we can run.

Numbers, rates and statistics are great for finding out what’s happening on your site and where the opportunities for testing lie. However, quantitative insights can only take us so far within conversion rate optimisation: they tell us where to target our tests for the best impact, but not what those tests should entail. This is where qualitative research takes centre stage.

Qualitative research provides us with insight into the “why” behind quantitative research. It provides a deeper understanding into your visitors and customers, which is vital to understanding why they behave and engage with your site in a particular way. Conducting qualitative research is ideal for discovery and exploration and a great way of generating insights which can be used to guide future experimentation.

In this series, we will cover the qualitative insights that we run for our clients at Conversion.com including case studies of when qualitative research has helped with our tests and some of our best practices and tips!

On-site Surveys

By on-site surveys we are referring to targeted pop-up surveys that appear on websites and ask users one question, or a series of questions, to gather insights. Hotjar and Qualaroo are two of our favourite data collection tools for this type of insight.

On-site surveys let you target questions to any visitor, or subset of visitors, on any page of your website. These surveys can be triggered in a number of ways: time elapsed, specific user behaviour such as a clicked element or exit intent, or custom triggers using JavaScript. Understanding the behaviour and intent of website visitors allows us to interpret more effectively the motivations and barriers they may face on your site. These insights can then guide our data-driven tests, which aim to emphasise the motivations whilst eliminating the barriers.

On-site surveys have many benefits: they are non-intrusive, they collect data immediately, and they are anonymous, which allows for higher ‘ecological’ validity of responses. Most importantly, they gather real-time feedback from users ‘in the moment’, while they are engaging with your site.

Don’t underestimate the power of an exit survey. An exit survey is triggered when a user shows intent to leave a website, for example when they move their cursor towards the top of the page. Exit surveys are the best non-intrusive qualitative method for understanding why visitors are not converting or why your website may have a high exit rate. They often outperform other website surveys in terms of response rate because they minimise the annoyance to the user, who was already planning on leaving the site.
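If your survey tool doesn’t offer exit-intent targeting out of the box, a custom JavaScript trigger is straightforward to sketch. Below is a minimal TypeScript example; showSurvey() is a hypothetical placeholder for whatever call your survey tool (e.g. Hotjar or Qualaroo) actually exposes, and in practice you may prefer the tool’s built-in targeting rules.

```typescript
// Minimal sketch of a custom exit-intent trigger for an on-site survey.
let surveyShown = false;

function showSurvey(): void {
  // Hypothetical placeholder: open your survey widget here.
  console.log("Exit survey triggered");
}

document.addEventListener("mouseout", (event: MouseEvent) => {
  // Fire once, when the cursor leaves through the top of the viewport:
  // a common (imperfect) proxy for intent to close the tab or change URL.
  if (!surveyShown && event.clientY <= 0 && !event.relatedTarget) {
    surveyShown = true;
    showSurvey();
  }
});
```

The clientY check is only a proxy for intent to leave, so treat it as a heuristic rather than a guarantee, and cap how often the survey can appear per visitor.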

But what questions should you be asking in these on-site surveys? Well, that really depends on what you want to get out of this piece of insight. Below are a few examples of the types of questions you can ask:

  • Investigating user intent and bounce rate
    • Why did you come to this site today?
    • What were you hoping to find on this page?
    • Did this page contain the information you were looking for?
  • Understanding usability challenges
    • Were you able to complete your task today? (If yes, why? If no, why not?)
    • Is there anything on the site that doesn’t work the way you expected it to?
  • Uncovering missing content
    • What additional information would you like to see on this page?
    • Did this article answer your question?
  • Identifying potential motivations
    • What persuaded you to purchase from us?
    • What convinced you to use us rather than a competitor?
    • What was the one thing that influenced you to complete your task/purchase?
  • Identifying potential barriers
    • What is preventing you from completing your task?
    • What is stopping you from completing your checkout?  
    • What concerns do you have about purchasing from us?

When launching a survey, it can be difficult to know how long to run it for or how many responses you actually need. Large sample sizes matter when collecting quantitative data; here, however, we are more concerned with gaining an in-depth understanding of your users and looking for ideas and inspiration. Therefore we look for thematic saturation, the point at which the data provides no significant new information, instead of a large sample size. For more information about the sample size required to run an on-site survey and how many responses are necessary, check out our article about on-site surveys and thematic saturation.

At Conversion.com we ensure we are continuously collecting both qualitative and quantitative insights for our clients. On-site surveys are just one of these insights, helping to guide and back up our data-driven hypotheses.

One example of on-site surveys guiding tests that added revenue comes from a client in the online pharmacy industry. Our on-site survey asked users what stopped them from choosing a particular product at a specific stage of their journey. The insights showed that users found it difficult to choose products by themselves, with no assistance or help. This informed a test on those pages that implemented signposting to particular products through recommendations. We believed that by adding elements to aid users at the product selection stage, we would make it easier for them to select a product, eliminating the barrier we found via our on-site survey. Making this change produced a 3.48% uplift in completed purchases for new users.

Look out for our next article in this series where we discuss the importance of competitor analysis.  

On-site survey design: Collect voice of customer data like a pro

Brian Balfour, ex VP of growth at Hubspot, said “Math and Metrics Don’t Lie, But They Don’t Tell Us Everything”. I couldn’t agree more. While analytics tells us what happens on our website, qualitative data is crucial for understanding the why behind visitors’ decision-making. By knowing your customers’ pain points and reasons why they love your product, you can stop guessing and hoping to win the odd hand. Instead, you can start addressing your visitors’ real problems, and we are yet to find a better way to sustainably grow your business.

To make good decisions though, you need to nail both collection and analysis of your user data. Your conclusions have to actually reflect your website audience’s problems. We’re used to looking at statistical significance with our test results, but when we’re gathering qualitative feedback, how do we know when we have enough data to draw a meaningful conclusion? The reality is that making sure that your data brings powerful insights is both an art and a science. Today I will explain strategies conversion champions use when analysing qualitative open-ended data.

What are on-site surveys anyway, and why should you use them?

On-site surveys are a great way to gather qualitative feedback from your customers. Available tools include Qualaroo and Hotjar.

In this article, when I refer to on-site surveys, I mean small pop-ups that prompt a visitor to answer one or more questions. Qualaroo and Hotjar are our favourite data collection tools.

In contrast to other methods of qualitative research, on-site surveys can be:

  • Non-intrusive (they don’t significantly distract visitors from engaging with the website).
  • Anonymous, allowing for higher “ecological” validity of responses. This means that customers tell you what they actually think without trying to conform to your expectations (which may happen in interviews).
  • Simple to run (they don’t require extensive prior experience, compared with something like interviews).
  • Immediate. In comparison to panels & interviews, you can start collecting data instantly.
  • Contextual. They can provide insights about your customer’s state of mind at a particular stage in your conversion funnel. This allows you to optimize for relevance!

How many responses do I need?

Often when companies run surveys, they aren’t sure how long to run them for. They may ask themselves: “What is the required sample size? Am I better off running a survey for a little bit longer? What % of my website audience should respond for the survey to be representative of their needs?”

I was asking these questions, too. When I studied for Avinash Kaushik’s web analytics certification, he suggested 5% of your overall traffic. At the time, I was looking at running surveys for some smaller websites and Avinash’s rule was applicable to only very large websites, so I could not use it.

Then, Peep Laja suggested having at least 100-200 responses as a minimum. I was not sure if I could apply this to any context though. Are 100 responses going to be as useful for a website with 10,000 monthly visitors as for a website with 1,000,000 daily visitors?

Sample size. Does it even matter?

The reality is that it depends, but most importantly you might be looking at it the wrong way. The primary factor in determining the number of required responses is the goal of the survey. At Conversion.com, we primarily use on-site surveys for the following two goals:

  1. Understanding the diversity of factors affecting user behaviour (i.e. what factors motivate or stop visitors from taking a desired action)
  2. Ranking and prioritising these factors (in order to prioritise testing ideas)

The first goal is crucial at the start of every conversion optimization program (and this is the goal we will dive into in this article; for the other goal keep an eye on our future articles).

When pursuing this goal, we are trying to understand the diversity of factors that affect user behavior, and our purpose is not necessarily to make estimations about our website’s audience as a whole.

For example, we are not trying to answer the question of how many people like your product because of reason A or reason B, but we are just curious to understand what are the potential reasons why people like it.

We are more interested in gaining an in-depth understanding of people’s diverse subjective experiences and making meaning out of their responses, even if we are not sure if we can generalize these findings to the website’s audience as a whole. As Stephen Pavlovich puts it: “At this early stage, we’re not looking for a statistically valid breakdown of responses – we’re essentially looking for ideas and inspiration.”

This means that with on-site surveys that pursue goal #1, standard criteria for evaluating quality of your findings such as validity and reliability (think of confidence intervals and margins of error) are not applicable. Instead, you should use thematic saturation.

What is thematic saturation?

When analysing raw data, we categorise responses into themes. Themes are patterns in the data that describe a particular reason for taking or not taking a certain action (or any other factors we are interested in understanding). In simple terms, thematic saturation is when new responses do not bring significant new information, i.e. you start seeing repetition in visitors’ responses and no new themes emerge.

In the context of conversion optimization, this means asking yourself 3 questions:

  1. Have I accurately interpreted and grouped the raw data into themes? i.e. have I identified the customers’ real pain points and motivations for taking a certain action?
  2. Do the responses that I assigned to each of the themes fully explain that theme? (or is there diversity that I have not fully accounted for, i.e. are there any important sub-themes?)
  3. Do the new responses that I have gathered bring new, actionable insights to the table?

If you can answer “Yes”, “Yes” and “No” to the questions above, you are likely to have reached saturation and can stop the survey.

Example: on-site survey responses grouped into themes

As you can see in this example, the newest responses did not bring any new surprises. They fell under existing themes. As there was no more diversity in the data, we stopped the survey.

NB: note how one simple concept, convenience, can have several dimensions in your customers’ minds. This is why question 2 is so important. By understanding the differences in the way customers perceive your product’s benefits, you can now design a more effective value proposition!

Indeed, the answers to these questions are subjective and require experience. This is not because the method is ‘bad’, but because we are trying to explain human behaviour, and there will always be a degree of subjectivity involved. Don’t let your quantitative colleagues pressure you too much: some of the most important breakthroughs in history were based on studies with a sample size of 1. Did you know that Freud’s revolutionary theory of psychoanalysis originally started with the examination of fewer than 10 client cases?

Minimum number of responses

Does this then mean that you can get away with as few as 10 responses? In theory yes, as long as you gain an in-depth understanding of your customers. It is common practice in traditional research to set a minimum number of responses required before you start examining whether your data is saturated.

As a general rule, the team at Conversion.com looks for a minimum of 200 responses. So does Andre Morys from Web Arts. Peep Laja from ConversionXL responded that he currently uses 200-250 as a minimum. Other professionals, including Craig Sullivan and Brian Massey, say they don’t use a minimum at all. The truth is you can use a minimum number as a guide, but ultimately it’s not the number that matters: it’s whether you have understood the diverse problems your customers have.

When using minimums: count usable responses and remove the garbage


In one survey we ran, 35% of responses turned out to be unusable, ranging from answers like “your mum” to random strings of characters typed on a keyboard. When assessing whether you have passed the minimum threshold, don’t just look at the number of responses your survey tool has gathered; look at the number of usable, “non-garbage” responses.
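As a rough illustration, here is a minimal TypeScript sketch that screens out the most obvious garbage before counting responses against a minimum. The heuristics and sample responses are hypothetical, and they only catch the crudest cases, so a manual read-through is still needed.

```typescript
// Minimal sketch: screen out obviously unusable responses before counting
// towards a minimum threshold. Heuristics and sample data are hypothetical;
// they only catch the crudest garbage, so a manual pass is still needed
// (a troll answer like "your mum" passes every check below).
function isUsable(response: string): boolean {
  const text = response.trim();
  if (text.length < 3) return false;        // too short to carry meaning
  if (!/[a-z]/i.test(text)) return false;   // no letters at all, e.g. "123456" or "!!!"
  if (/^(.)\1+$/.test(text)) return false;  // a single repeated character, e.g. "aaaaaa"
  return true;
}

const responses = ["your mum", "123456", "Couldn't find the delivery costs", "aaaaaa"];
const usable = responses.filter(isUsable);
console.log(`${usable.length} usable out of ${responses.length}`); // 2 usable out of 4
```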

Don’t rely on magic numbers, but look for saturation

As I have already said, don’t rely solely on best practices; always look for saturation. Each website is unique, and your ability to reach saturation depends on a number of criteria, including:

  • Your interpretative skills as a researcher (how quickly can you derive meaning from your visitors’ responses?), which in turn depend on your existing knowledge of your customers and your familiarity with the industry. The less familiar you are, the more responses you are likely to need before you can accurately interpret your audience’s responses.
  • Have you asked the right questions in the first place? It is difficult to derive meaningful insights unless you are asking meaningful questions (if you don’t know what questions to ask, check out this article).
  • Homogeneity/Heterogeneity of your audience. If your business is very niche and much of your audience shares similar characteristics, then you might be able to see strong patterns right from the start. This is less likely for a website with a very diverse audience.

How do I know if the 189th response won’t bring any new perspectives on the issues I am investigating?

The truth is you never know, in particular because every person is unique, but there are strategies we use to check our findings for saturation.

Strategy #1: Validate with a follow-up survey

Diagram: Strategy #1, an open-ended survey (survey 1) followed by a multiple-choice validation survey (survey 2)

This strategy has three steps:

  1. Run an open-ended survey (survey 1, above)
  2. Identify several themes
  3. Re-run the survey in a multiple choice format to validate if the themes you identified were accurate (survey 2, above)

The first two steps are what you would normally do, and you might not get a particularly high response rate, because writing proper feedback is time-consuming. The third step compensates for this: instead of running another open-ended survey, you run it as a multiple-choice survey. The key here is to include an “Other” option and ask for an open-ended response whenever it is chosen. This way you can fail-safe yourself by examining whether people tend to choose the “Other” option.
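For example, a minimal sketch (with hypothetical answer data) of that fail-safe check: measure what share of follow-up responses land in “Other”.

```typescript
// Minimal sketch with hypothetical data: how often do respondents pick "Other"
// in the multiple-choice follow-up? A high share suggests the themes from the
// open-ended survey missed the real reasons.
const followUpAnswers = ["Price", "Trust", "Price", "Other", "Delivery speed", "Other", "Price"];

const otherShare = followUpAnswers.filter((a) => a === "Other").length / followUpAnswers.length;

console.log(`"Other" share: ${(otherShare * 100).toFixed(1)}%`); // "Other" share: 28.6%
// If this share is substantial, revisit the open-ended answers attached to
// "Other": your original themes probably missed something important.
```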

When is it best to use this approach? It’s particularly useful on smaller websites due to low response rates.

Brent Bannon, PhD, ex growth manager at Facebook and founder of LearnRig, suggests there are further critical reasons why you should use closed-ended questions as a follow-up.

  1. item non-response [i.e. where a user skips or doesn’t provide a meaningful answer to a question] is much higher for open-ended questions than for closed-ended ones and people who respond to the open-ended question may differ systematically from your target population, so this response set will likely be more representative.
  2. open-ended questions tend to solicit what is top-of-mind even more so than closed-ended questions so you don’t always get the most reasoned responses – this is pretty heavily influenced by memory processes (e.g. frequency and recency of exposure). Using a list of plausible motivations may get you more reliable data if you’re confident you’re not missing important/widespread motivations.


Brent Bannon

Founder of LearnRig

So, be cautious if you are asking people about something that happened a long time in the past.

Strategy #2: Run another open-ended survey to examine a particular theme in more depth

Diagram: Strategy #2, an open-ended survey (survey 1) followed by a second open-ended survey exploring one theme in more depth (survey 2)

This strategy has three steps:

  1. Run an open-ended survey (survey 1, above)
  2. Identify several themes
  3. Run another open-ended survey to examine a particular theme in more depth (survey 2, above)

Sometimes the responses you get might show a recurring theme, for example a problem with trust, but respondents provide very limited detail about it. Although you have identified a theme, you have not fully understood what the problem really is (saturation was not reached!). In that case, we would develop another open-ended survey to examine that particular theme, because we know that additional responses can yield extra insights and explain the problem in more depth.

Craig Sullivan from Optimal Visit elaborates on that:

The trick with this work is to accept that the questions you ask may not be right first time.  When I first started out, my mentor made me run surveys where it was clear that I’d asked the wrong question or not found the real answer.  He kept tearing them apart until I’d learned to build them better and to iterate them.  Asking good questions is a great start but these will always uncover more questions or need clarification.  Good exploratory research involves uncovering more questions or solidifying the evidence you have.

It’s like shining a light in a circle – the more the area is lit, the more darkness (ignorance) you are in touch with.  What you end up with is a better quality of ignorance – because NOW you actually know more precisely what you DO and DON’T know about your customers.  That’s why iteration of research and AB testing is so vital – because you rarely end at a complete place of total knowledge.


Craig Sullivan

Founder of Optimal Visit

When is it best to use this approach? Whenever you have not fully explored a certain theme in sufficient depth and believe that it can lead to actionable insights.

Note: Be cautious if you’re thinking of doing this type of investigation on a theme of “price”. Self-interest bias can kick in and as Stephen Pavlovich puts it “It’s hard to rationalise your response to price. This is one instance where it’s preferable to test it rather than run a survey and then test it.”

Strategy #3: Triangulate

Triangulation is when you cross-check your findings from one method/source with findings from another method/source (full definition here).

For example, when working with a major London airport we cross-checked our findings from on-site surveys with real-life interviews of their customers (two different methods: surveys and interviews; two different sources: online and offline customers). This ensured a high level of understanding of what customers’ problems actually were. Interviews allowed flexibility to go in-depth, whilst surveys showed a broader picture.

Triangulation allows you to ensure you have correctly interpreted responses from your customers and identified their real barriers and motivations, not some non-existent problems you thought your customers might have. Interviews can provide more detailed and fuller explanations, which in turn allows you to make a more accurate interpretation of your survey results. There is strong support in academic research for using triangulation to enhance understanding of the phenomenon under investigation.

When best to use it? Always. Cross-checking your survey findings with more in-depth data collection methods such as live chat conversations or interviews is always advisable as it provides you with more useful context to interpret your survey results.

Brian Massey from Conversion Sciences also emphasises the importance of cross-checking your data with analytics:

Onsite surveys have two roles in website optimization.

  1. Answer a specific question, to support or eliminate a specific hypothesis.
  2. Generate new hypotheses that don’t flow from the analytics.

In both cases, we want to corroborate the results with analytics. Self-reported survey data is skewed and often inaccurate. If our survey respondents report that search is important, yet we see that few visitors are searching, we may disregard these results. Behavioral data is more reliable than self-reported information.


Brian Massey

Co-founder of Conversion Sciences

Be pragmatic, not perfect

Finally, we need to be realistic: it is not just the overall quality of our findings that matters, but also the time and opportunity cost required to get them.

That’s why it can be useful to decide on a stopping rule for yourself. A stopping rule could look like this: “After I get 10 more responses and no new themes emerge, I will stop the survey” or “I will run the survey for 2 more days and, if no new themes emerge, I will stop it”.
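A stopping rule like the first one can be expressed very simply. Here is a minimal sketch in TypeScript, assuming you assign each new response to a theme manually as it comes in (theme assignment itself remains a judgement call).

```typescript
// Minimal sketch of a stopping rule based on thematic saturation: stop once
// N consecutive new responses have introduced no new theme.
const seenThemes = new Set<string>();
let responsesSinceNewTheme = 0;
const STOP_AFTER = 10; // "10 more responses and no new themes emerge"

function recordResponse(theme: string): boolean {
  // Returns true once the stopping rule says it is safe to stop the survey.
  if (seenThemes.has(theme)) {
    responsesSinceNewTheme += 1;
  } else {
    seenThemes.add(theme);
    responsesSinceNewTheme = 0;
  }
  return responsesSinceNewTheme >= STOP_AFTER;
}

// Usage: call this as you code each incoming response.
recordResponse("price");  // false: new theme
recordResponse("trust");  // false: new theme
recordResponse("price");  // false: only 1 response since the last new theme
```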

After you pass the minimum threshold and you are sure that you correctly interpreted at least some of the real issues, you might be better off testing rather than perfecting your data.

Remember, conversion optimization is a cyclical process: we use qualitative data to inform our testing, and then we use the results from our tests to inform our next survey.

Key Takeaways

  • Use on-site surveys to understand your users’ barriers and motivations for taking a certain action at a particular stage in your conversion funnel
  • Thematic saturation, not sample size, should be your main quality criterion when trying to understand the diversity of factors that affect your visitors’ decision-making. But if you’re not sure, or want an estimate beforehand, 200 responses is a good general rule (when applied to “non-garbage” responses).
  • You can examine if you managed to reach saturation:
    • By running a follow-up survey in a multiple-choice format and examining if people tend to choose “Other” as an option
    • By running a follow-up survey in an open-ended format to better understand a particular theme (if there is ambiguity in the original data)
    • By cross-checking your survey findings with other data sources/collection methods
  • Remember that results from tests backed up by data are the best source of learning about your customers. Treat your initial findings with caution and learn from how your users behave, not how they tell you they behave.

 

5 questions you should be asking your customers

On-site survey tools provide an easy way to gather targeted, contextual feedback from your customers. Analysis of user feedback is an essential part of understanding motivations and barriers in the decision making processes.

It can be difficult to know when and how to ask the right questions in order to get the best feedback without negatively affecting the user experience. Here are our top 5 questions and tips on how to get the most out of your on-site surveys.

On-site surveys are a great way to gather qualitative feedback from your customers. Available tools include Qualaroo and Hotjar.

1. What did you come to < this site > to do today?

Where: On your landing pages

When: After a 3-5 second delay

Why: First impressions are important and that is why your landing pages should have clear value propositions and effective calls to action. Identifying user intentions and motivations will help you make pages more relevant to your users and increase conversion rates at the top of the funnel.

2. Is there any other information you need to make your decision?

Where: Product / pricing pages

When: After scrolling 50% / when the visitor attempts to leave the page

Why: It is important to identify and prioritise the information your users require to make a decision. It can be tempting to hide extra costs or play down parts of your product or service that are missing but this can lead to frustration and abandonment. Asking this question will help you identify the information that your customers need to make a quick, informed decision.

3. What is your biggest concern or fear about using us?

Where: Product / pricing pages

When: After a 3-5 second delay

Why: Studies have found that “…fear influences the cognitive process of decision-making by leading some subjects to focus excessively on catastrophic events.” Asking this question will help you identify and alleviate those fears, and reduce the negative effect they may be having on your conversion rates.

4. What persuaded you to purchase from us today?

Where: Thank you / confirmation page

When: Immediately after purchase. Ideally embedded in the page (try Wufoo forms)

Why: We find that some of our most useful insights come from users who have just completed a purchase. It’s a good time to ask what specifically motivated a user to purchase. Asking this question will help you identify and promote aspects of your service that are most appealing to your customers.

5. Was there anything that almost stopped you buying today?  

Where: Thank you / confirmation page

When: Immediately after purchase

Why: We find that users are clearer about what would have stopped them purchasing once they have made a purchase. Asking this question can help you identify the most important barriers preventing users from converting. Make sure to address these concerns early in the user journey to avoid surprises and reduce periods of uncertainty.

What questions have you asked your customers recently? Have you asked anything that generated valuable insights? Share in the comments below!

Spotting patterns – the difference between making and losing money in A/B testing.

Wrongly interpreting the patterns in your A/B test results can lose you money. It can lead you to make changes to your site that actually harm your conversion rate.

Correctly interpreting the patterns in your A/B test results will mean you learn more from each test you run. It will give you confidence that you are only implementing changes that will deliver real revenue impact, and it will help you turn any losing tests into future winners.

At Conversion.com we’ve run and analysed hundreds of A/B and multivariate tests. In our experience, the result of a test will generally fall into one of five distinct patterns. We’re going to share these five patterns here, and we’ll tell you what each pattern means in terms of the steps you should take next. Learn to spot these patterns, follow our advice on how to interpret them, and you’ll make the right decision more often, making your testing efforts more successful.

To illustrate each of the patterns, we’ll imagine we have run an A/B test on an e-commerce site’s product page and are now looking at the results. We’ll compare the increase/decrease in conversion rate that the new version of this page delivered against the original page, on a page-by-page basis, for the four steps of the checkout process the visitor goes through to complete their purchase (Basket, Checkout, Payment and finally Order Confirmation).

To see the pattern in our results in each case, we’ll plot a simple graph of the conversion rate increase/decrease to each page. We’ll then look at how this increase/decrease in conversion rate has changed as we move through our site’s checkout funnel.
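The numbers behind these graphs are simple to produce: for each step, compare the share of test-page visitors who reached that step in the variation against the same share in the control. Here is a minimal TypeScript sketch with hypothetical figures (which happen to produce the first pattern below).

```typescript
// Minimal sketch with hypothetical numbers: for each funnel step, compare the
// share of test-page visitors who reached that step in the variation vs control.
interface Arm {
  visitors: number;                 // visitors who saw the test page
  reached: Record<string, number>;  // visitors who went on to reach each step
}

const steps = ["Basket", "Checkout", "Payment", "Order Confirmation"];

const control: Arm = {
  visitors: 10000,
  reached: { Basket: 3000, Checkout: 2000, Payment: 1500, "Order Confirmation": 1000 },
};
const variation: Arm = {
  visitors: 10000,
  reached: { Basket: 3300, Checkout: 2200, Payment: 1650, "Order Confirmation": 1100 },
};

for (const step of steps) {
  const controlRate = control.reached[step] / control.visitors;
  const variationRate = variation.reached[step] / variation.visitors;
  const change = (variationRate / controlRate - 1) * 100;
  console.log(`${step}: ${change.toFixed(1)}% change`); // 10.0% at every step = "the big winner"
}
```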

1. The big winner

This is the type of test result we all love. Your new version of a page converts at x% higher to the next step than the original and this x% increase continues uniformly all the way to Order Confirmation.

The graph of our first result pattern would look like this.

The big winner

We see 10% more visitors reaching each step of the funnel.

Interpretation

This pattern is telling us that the new version of the test page successfully encourages 10% more visitors to reach the next step and from there onwards they convert equally as well as existing visitors. The overall result would be a 10% increase in sales. It is clearly logical to implement this new version permanently.

2. The big loser

The negative version of this pattern, where each step shows a roughly equal decrease in conversion rate, is a sign that the change has had a clear negative impact. All is not lost, though: an unsuccessful test can often be more insightful than a straightforward winner, as the negative result forces you to re-evaluate your initial hypothesis and understand what went wrong. You may have stumbled upon a key conversion barrier for your audience, and addressing this barrier in the next test could lead to the positive result you have been looking for.

Graphically this pattern will look like this.

The big loser

We see 10% fewer visitors reaching each step of the funnel.

Interpretation

As the opposite of the big winner, this pattern is telling us that the new version of the test page causes 10% fewer visitors to reach the next step and from there onwards they convert equally as well as existing visitors. The overall result would be a 10% decrease in sales. You would not want to implement this new version of the page.

3. The clickbait

“We increased clickthrus by 307%!” You’ve probably seen sensational headlines like this thrown around in the optimisation industry. Hopefully, like us, you’ve developed a strong sense of cynicism when you read results like this. The first question I always ask is “But how much did sales increase by?” Chances are, if the result being reported fails to mention the impact on final sales, then what they actually saw in their test results was the pattern we’ve affectionately dubbed “The clickbait”.

Test results that follow this pattern will show a large increase in the conversion rate to the next step but then this improvement quickly fades away in the later steps and finally there is little or no improvement to Order Confirmation.

Graphically this pattern will look like this.

The clickbait

Interpretation

This pattern catches people out because the large improvement to the next step feels as if it should be a positive result. However, this pattern often merely shows that the new version of the page is pushing a large number of visitors through to the next step who have no real intention of purchasing. This is illustrated by the sudden large drop in the conversion rate improvement at the later steps, when all of the unqualified extra traffic abandons the funnel.

As with all tests, whether this result can be deemed a success depends on the specifics of the site you are testing on and what you are looking to achieve. If there are clear improvements to be made on the next step(s) of the funnel that could help to convert the extra traffic from this test, then it could make sense to address those issues first and then re-run this test. However, if these extra visitors are clicking through by mistake or because they are being misled in any way then you may find it difficult to convert them later no matter what changes you make. Instead, you could be alienating potential customers by delivering a poor customer experience. You’ll also be adding a lot of noise to the data of any tests you run on the later pages as there are a lot of extra visitors on those pages who are unlikely to ever purchase.

4. The qualifying change

This pattern is almost the reverse of the clickbait: here we actually see a drop in conversion to the next step but an overall increase in conversion to Order Confirmation.

Graphically this pattern looks like this.

The qualifying change

Interpretation

Taking this pattern as a positive can seem counter-intuitive because of the initial drop in conversion to the next step. Arguably, though, this type of result is as good as, if not better than, a big winner from pattern 1. Here the new version of the test page is having what’s known as a qualifying effect: visitors who would otherwise have abandoned at a later step in the funnel are leaving at the first step instead. The visitors who do continue past the test page are more qualified and therefore convert at a much higher rate. This explains the positive result at Order Confirmation.

Implementing a change that causes this type of pattern means visitors remaining in the funnel now have expressed a clearer desire to purchase. If visitors are still abandoning at a later stage in the funnel, the likelihood now is that this is being caused by a specific weakness on one of those pages. Having removed a lot of the noise from our data, in the form of the unqualified visitors, we are left with a much more reliable measure of the effectiveness of the later steps in the funnel. This means identifying weaknesses in the funnel itself will be far easier.

As with the clickbait, there are circumstances where a result like this may not be preferable. If you already have very low traffic in your funnel, then reducing it further could make it even more difficult to get statistically significant results when testing on the later pages of the funnel. You may want to look at tests that drive more traffic to the start of your funnel before implementing a change like this.

5. The messy result

This final pattern is often the most difficult to extract insight from as it describes results that show very little pattern whatsoever. Here we often see both increases and decreases in conversion rate to the various steps in the funnel.

The messy result

Interpretation

First and foremost, a lack of a discernible pattern in your split-test results can be a tell-tale sign of insufficient data. At the early stages of an experiment, when data levels are low, it is not uncommon to see results fluctuating up and down. Reading too much into the results at this stage is a common pitfall. Resist the temptation to check your experiment results too frequently – if at all – in the first few days. Even apparently strong patterns that emerge at these early stages can quickly disappear with a larger sample.

If your test has a large volume of data and you’re still seeing this type of result, then the likelihood is that your new version of the page is delivering a combination of the clickbait and the qualifying change effects: qualifying some traffic while simultaneously pushing more unqualified traffic through the funnel. If your test involved making multiple changes to a page, try testing the changes separately to pinpoint which individual changes are causing the positive impact and which are causing the negative impact.

Key takeaways

The key point to take from all of these patterns is the importance of tracking and analysing the results at every step of your funnel when you A/B test, rather than just the next step after your test page. It is easy to see how, if only the next step were tracked, many tests could have been falsely declared winners or losers. In short, that loses you money.

Detailed test tracking will allow you to pinpoint the exact step in your funnel that visitors are abandoning, and how that differs for each variation of the page that you are testing. This can help to answer the more important question of why they are abandoning. If the answer to this is not obvious, running some user tests or watching some recorded user sessions of your test variations can help you to develop these insights and come up with a successful follow up test.

There is a lot more to analysing A/B tests than just reading off a conversion rate increase to any single step in your funnel. Often, the pattern of the results can reveal greater insights than the individual numbers. Avoid jumping to conclusions based on a single increase or decrease in conversion to the next step and always track right the way through to the end of your funnel when running tests. Next time you go to analyse a test result, see which of these patterns it matches and consider the implications for your site.