
1. The basics: Defining A/B, MVT, and fractional factorial
2. Fractional factorial design: The middle ground
3. The Exploration Phase
4. Fractional factorial design in action
5. Case Study [VIDEO]
6. Case Study—step-by-step
7. Takeaways
You believe in marketing backed by science and data, and you have worked to get the executive team at your company on board with a tested strategy. Employing a formal plan will allow you to learn more about your customers and grow your business.
You run A/B tests, but you aren’t seeing a substantial conversion rate lift and you’re concerned that results aren’t helping to inform business goals for your management team. You could increase the velocity of your testing to get some quick wins, but if you want fast wins, you sacrifice insights.
Instead, you need to reexamine how you are structuring your tests. Because, as Alhan Keser writes,
If your results are disappointing, it may not only be what you are testing – it may also be how you are testing. While there are several factors for success, one of the most important to consider is Design of Experiments (DOE).
For this post, I teamed up with Director of Optimization Strategy, Nick So, and Optimization Strategist, Michael St Laurent, to take a deeper look at the best ways to structure your experiments for maximum growth and insights.
Marketers often use the term ‘A/B testing’ to refer to marketing experimentation in general. But there are several different ways to structure your experiments. A/B testing is just one of them.
Let’s look at a few: A/B testing, A/B/n testing, full factorial or multivariate (MVT), and fractional factorial design.
In an A/B test, you are testing your original page / experience (A) against a single variation (B) to see which will result in a higher conversion rate. Variation B might feature a multitude of changes (i.e. a ‘cluster’ of changes), or a single isolated change.
In an A/B/n test, you are testing more than two variations of a page at once. “N” refers to the number of versions being tested, anywhere from two versions to the “nth” version.
With full factorial or multivariate testing, you are testing each individual change in isolation against the others, by mixing and matching every possible combination available.
Imagine you want to test a homepage re-design featuring four changes (call them A, B, C, and D) in a single variation.
Hypothetically, let’s assume that each change has the following impact on your conversion rate: change A = +10%, change B = +5%, change C = -25%, and change D = +5%.
If you were to run a classic A/B test―your current control page (A) versus a combination of all four changes at once (B)―you would get a hypothetical decrease of -5% overall (10% + 5% – 25% + 5%). You would assume that your re-design did not work and most likely discard the ideas.
With a multivariate test, however, every possible combination of the four changes would be its own variation: A, B, C, D, A + B, A + C, A + D, B + C, B + D, C + D, A + B + C, A + B + D, A + C + D, B + C + D, and A + B + C + D (15 variations in all).
Multivariate testing is great because it shows you the positive or negative impact of every single change, and of every combination of changes, revealing the ideal combination (in this theoretical example: A + B + D).
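To make that combinatorial explosion concrete, here is a minimal Python sketch that enumerates every variation a full factorial test of changes A–D would require and flags the best combination. It assumes, purely for illustration, that the hypothetical lifts above combine additively; real effects rarely stack this cleanly, which is exactly why you test.

```python
from itertools import combinations

# Hypothetical isolated lift of each change (from the example above)
lifts = {"A": 0.10, "B": 0.05, "C": -0.25, "D": 0.05}

# A full factorial (multivariate) test needs every non-empty combination
variations = [
    combo
    for size in range(1, len(lifts) + 1)
    for combo in combinations(lifts, size)
]
print(f"{len(variations)} variations required")  # 15

# Toy assumption: the individual lifts simply add up when combined
def combined_lift(combo):
    return sum(lifts[change] for change in combo)

best = max(variations, key=combined_lift)
print("Best combination:", " + ".join(best), f"({combined_lift(best):+.0%})")
# Best combination: A + B + D (+20%)
```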
However, this strategy is difficult to execute in the real world. Even if you have a ton of traffic, it would take more time than most marketers have for a test with 15 variations to reach any kind of statistical significance.
The more variations you test, the more your traffic will be split while testing, and the longer it will take for your tests to reach statistical significance. Many companies simply can’t follow the principles of MVT because they don’t have enough traffic.
– Alhan Keser, Product Manager, AI
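To see why that traffic split matters, here is a rough back-of-the-envelope sketch using the standard two-proportion sample-size approximation (via scipy). The 10% baseline, 10% relative lift, significance and power settings, and 5,000 daily visitors are all illustrative assumptions, not figures from this article:

```python
from scipy.stats import norm

def sample_size_per_variation(baseline, lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation to detect a relative lift
    over a baseline conversion rate (two-sided, two-proportion z-test)."""
    p1, p2 = baseline, baseline * (1 + lift)
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

n = sample_size_per_variation(baseline=0.10, lift=0.10)  # ≈ 14,750 visitors
daily_visitors = 5_000  # illustrative site traffic

for arms in (2, 16):  # A/B (control + 1) vs. full factorial (control + 15)
    days = n * arms / daily_visitors
    print(f"{arms} variations: ~{days:.0f} days of traffic needed")
```

Under these assumptions, the same detectable effect that takes under a week with two variations takes well over a month once the traffic is split sixteen ways.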
Enter fractional factorial experiment design. Fractional factorial design allows for the speed of pure A/B testing combined with the insights of multivariate testing.
Fractional factorial design is another method of Design of Experiments. Similar to MVT, fractional factorial design allows you to test more than one element change within the same variation.
The greatest difference is that fractional factorial design doesn’t force you to test every possible combination of changes.
Rather than creating a variation for every combination of changed elements (as you would with MVT), you can design your experiment to focus on specific isolations that you hypothesize will have the biggest impact.
With basic fractional factorial experiment design, you could set up the following variations in our hypothetical example:
VarA: Change A = +10%
VarB: Change A + B = +15%
VarC: Change A + B + C = -10%
VarD: Change A + B + C + D = -5%
NOTE: With fractional factorial design, estimating the value (e.g. conversion rate lift) of each change is a bit more complex than shown above. I’ll explain.
Firstly, let’s imagine that our control page has a baseline conversion rate of 10% and that each variation receives 1,000 unique visitors during your test.
When you estimate the value of change A, you are using your control as a baseline.
Given the above information, you would estimate that change A is worth a 10% lift by comparing the 11% conversion rate of variation A against the 10% conversion rate of your control.
The estimated conversion rate lift of change A = (11 / 10 – 1) = 10%
But, when estimating the value of change B, variation A must become your new baseline.
The estimated conversion rate lift of change B = (11.5 / 11 – 1) = 4.5%
As you can see, the ‘value’ of change B is slightly different from the 5% difference shown above.
When you structure your tests with fractional factorial design, you can work backwards to isolate the effect of each individual change by comparing variations. But, in this scenario, you have four variations instead of 15.
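Here is a minimal sketch of that ‘working backwards’ arithmetic, using the hypothetical conversion rates implied by the example above (a 10% control, with each variation built on the one before it). It simply applies the baseline-shifting division described above and is illustrative, not a statistical analysis:

```python
# Observed conversion rates from the hypothetical example: the control,
# then each nested variation (VarA = A, VarB = A + B, VarC = A + B + C, ...)
rates = {
    "control": 0.100,
    "A": 0.110,        # +10% vs. control
    "A+B": 0.115,      # +15% vs. control
    "A+B+C": 0.090,    # -10% vs. control
    "A+B+C+D": 0.095,  # -5%  vs. control
}

# Because each variation is built on the previous one, the previous
# variation (not the control) is the baseline for the newly added change.
labels = list(rates)
for prev, curr in zip(labels, labels[1:]):
    added_change = curr.split("+")[-1]
    isolated_lift = rates[curr] / rates[prev] - 1
    print(f"Estimated isolated lift of change {added_change}: {isolated_lift:+.1%}")
```

Change B comes out at roughly +4.5% rather than +5%, exactly as noted above, because it is measured against the shifted baseline of variation A rather than against the control.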
We are essentially nesting A/B tests into larger experiments so that we can still get results quickly without sacrificing insights gained by isolations.
– Michael St Laurent, Director of Experimentation Strategy & Product, Conversion
Then, you would simply re-validate the hypothesized positive results (Change A + B + D) in a standard A/B test against the original control to see if the numbers align with your prediction.
Fractional factorial allows you to get the best potential lift, with five total variations in two tests, rather than 15 variations in a single multivariate test.
But, wait…
It’s not always that simple. How do you hypothesize which elements will have the biggest impact? How do you choose which changes to combine and which to isolate?
The answer lies in the Explore (or research gathering) phase of your testing process.
At Conversion, Explore is an expansive thinking zone, where all options are considered. Ideas are informed by your business context, persuasion principles, digital analytics, user research, and your past test insights and archive.
Experience is the other side to this coin. A seasoned optimization strategist can look at the proposed changes and determine which changes to combine (i.e. cluster), and which changes should be isolated due to risk or potential insights to be gained.
At Conversion, we don’t just invest in the rigorous training of our Strategists. We also have a 10-year-deep test archive that our Strategy team continuously draws upon when determining which changes to cluster, and which to isolate.
This case follows two experiments we ran with Annie Selke, a retailer of luxury home-ware goods. The experiments focused on a product category page. (You may have already read about what we did during this test, but now I’m going to get into the details of how we did it.) It’s a fantastic illustration of fractional factorial design in action.
In the first experiment, we tested three variations against the control. As the experiment number (4.7) suggests, this was not the first test we ever ran with Annie Selke, but it is the ‘first’ test in this story.
Variation A featured an isolated change to the ‘Sort By’ filters below the image, turning them into a drop-down menu.
Replaced original ‘Sort By’ categories with a more traditional drop-down menu.
Evidence
This change was informed by qualitative click map data, which showed low interaction with the original filters. Strategists also theorized that, without context, visitors may not even know that these boxes are filters (based on e-commerce best practices). This variation was built on the control.
Variation B was also built on the control, and featured another isolated change to reduce the left navigation.
Evidence
Click map data showed that most visitors were clicking on “Size” and “Palette”, and past testing had revealed that Annie Selke visitors were sensitive to the removal of distractions. Plus, the persuasion principle known as the Paradox of Choice theorizes that more choice = more anxiety for visitors.
Unlike variation B, variation C was built on variation A, and featured a final isolated change: a collapsed left navigation.
Evidence
This variation was informed by the same evidence as variation B.
Results
Variation A (built on the control) saw a decrease in transactions of -23.2%.
Variation B (built on the control) saw no change.
Variation C (built on variation A) saw a decrease in transactions of -1.9%.
But wait! Because variation C was built on variation A, variation A (not the control) was the baseline for change C. Comparing the two, we estimated the value of change C (the collapsed left navigation) at 19.1%.
The next step was to validate our estimated lift of 19.1% in a follow up experiment.
The follow-up test also featured three variations versus the original control, because you should never waste an opportunity to gather more insights!
Variation A was our validation variation. It featured the collapsed filter (change C) from 4.7’s variation C, but maintained the original ‘Sort By’ functionality from 4.7’s control.
Variation B was built on variation A, and featured two changes emphasizing visitor fascination with colors. We 1) changed the left nav filter from “palette” to “color”, and 2) added color imagery within the left nav filter.
Evidence
Click map data suggested that Annie Selke visitors are most interested in refining their results by color, and past test results also showed visitor sensitivity to color.
Variation C was built on variation A, and featured a single isolated change: we made the collapsed left nav persistent as the visitor scrolled.
Evidence
Scroll maps and click maps suggested that visitors want to scroll down the page, and view many products.
Results
Variation A led to a 15.6% increase in transactions, which is pretty close to our estimated 19% lift, validating the value of the collapsed left navigation!
Variation B was the big winner, leading to a 23.6% increase in transactions. Based on this win, we could estimate the value of the emphasis on color.
Variation C resulted in a 9.8% increase in transactions, but because it was built on variation A (not on the control), we learned that the persistent left navigation was actually responsible for a decrease in transactions of -11.2%.
This is what fractional factorial design looks like in action: big wins, and big insights, informed by human intelligence.
If you are in a situation where potential revenue gains outweigh the potential insights to be gained or your test has little long-term value, you may want to go with a standard A/B cluster test.
If you have a sufficient amount of traffic, and value insights above everything, multivariate may be for you.
If you want the growth-driving power of pure A/B testing, as well as insightful takeaways about your customers, you may want to explore fractional factorial design.
Words of encouragement: with fractional factorial design, your tests will get better as you continue to test. With every test you execute, you will learn more about your customers’ purchasing behavior, making subsequent experiments more impactful.
One 10% win without insights may turn heads in your direction now, but a test that delivers insights can turn into five 10% wins down the line. It’s similar to the compounding effect: collecting insights now can mean massive payouts over time.
– Michael St Laurent