Beyond A vs. B: How to get better results with better experiment design

Nick So

Overview

1. The basics: Defining A/B, MVT, and fractional factorial
2. Fractional factorial design: The middle ground
3. The Exploration Phase
4. Fractional factorial design in action: A case study
5. The best testing framework for you

You’ve been pushing to do more testing at your organization.

You believe in marketing backed by science and data, and you have worked to get the executive team at your company on board with a tested strategy. Employing a formal plan will allow you to learn more about your customers and grow your business.

You run A/B tests, but you aren't seeing substantial conversion rate lift, and you're concerned the results aren't helping to inform your management team's business goals. You could increase the velocity of your testing to get some quick wins, but chasing fast wins means sacrificing insights.

Instead, you need to reexamine how you are structuring your tests. Because, as Alhan Keser writes,

If your results are disappointing, it may not only be what you are testing – it is definitely how you are testing. While there are several factors for success, one of the most important to consider is Design of Experiments (DOE).

For this post, I teamed up with Director of Optimization Strategy, Nick So, and Optimization Strategist, Michael St Laurent, to take a deeper look at the best ways to structure your experiments for maximum growth and insights.

The basics: Defining A/B, MVT, and fractional factorial

Marketers often use the term ‘A/B testing’ to refer to marketing experimentation in general. But there are several ways to structure your experiments. A/B testing is just one of them.

Let’s look at a few: A/B testing, A/B/n testing, full factorial or multivariate (MVT), and fractional factorial design.

A/B test

In an A/B test, you are testing your original page or experience (A) against a single variation (B) to see which results in a higher conversion rate. Variation B might feature a cluster of multiple changes, or a single isolated change.

When you change multiple elements in a single variation, you might see lift, but what about insights?

In an A/B/n test, you are testing more than two variations of a page at once. “N” refers to the number of versions being tested, anywhere from two versions to the “nth” version.

Full factorial or multivariate test (MVT)

With full factorial or multivariate testing, you test each individual change in isolation against the others, mixing and matching every possible combination of changes.

Imagine you want to test a homepage re-design with four changes in a single variation:

  • Change A: New hero banner
  • Change B: New call-to-action (CTA) copy
  • Change C: New CTA color
  • Change D: New value proposition statement

Hypothetically, let’s assume that each change has the following impact on your conversion rate:

  • Change A = +10%
  • Change B = +5%
  • Change C = -25%
  • Change D = +5%

If you were to run a classic A/B test, pitting your current control page (A) against a combination of all four changes at once (B), the effects would largely cancel out: a net change of -5% overall (10% + 5% - 25% + 5%). You would assume that your redesign did not work and most likely discard the ideas.

With a multivariate test, however, each of the following would be a variation:

Every possible combination of the four changes becomes its own variation.

Multivariate testing is great because it shows you the positive or negative impact of every single change, and of every combination of changes, revealing the ideal combination (in this theoretical example: A + B + D).
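To make that concrete, here is a minimal Python sketch, using the hypothetical lift numbers above and assuming for simplicity that the effects simply add up, that enumerates all 15 combinations and ranks them:

```python
from itertools import combinations

# Hypothetical, isolated effects of each change (from the example above)
effects = {"A": 0.10, "B": 0.05, "C": -0.25, "D": 0.05}

# Every non-empty combination of the four changes = 15 MVT variations
variations = [
    combo
    for size in range(1, len(effects) + 1)
    for combo in combinations(effects, size)
]

# Rank variations by combined lift, assuming the effects simply add up
for combo in sorted(variations, key=lambda c: -sum(effects[x] for x in c)):
    print(f"{'+'.join(combo)}: {sum(effects[x] for x in combo):+.0%}")

# Top of the list: A+B+D at +20%. The full cluster A+B+C+D nets -5%.
```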

However, this strategy is difficult to execute in the real world. Even if you have a ton of traffic, it would take more time than most marketers have for a test with 15 variations to reach any kind of statistical significance.

The more variations you test, the more your traffic will be split while testing, and the longer it will take for your tests to reach statistical significance. Many companies simply can’t follow the principles of MVT because they don’t have enough traffic.

– Alhan Keser, Product Manager, AI
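To put rough numbers on that traffic problem, here is a sketch using the standard two-proportion sample-size approximation. The 10% baseline and 10% relative lift are hypothetical, and your testing platform's own calculator is the source of truth:

```python
from statistics import NormalDist

def visitors_per_variation(p_control: float, p_variant: float,
                           alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per variation for a two-sided
    two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    return round((z_alpha + z_power) ** 2 * variance / (p_control - p_variant) ** 2)

# Detecting a 10% relative lift on a 10% baseline conversion rate:
n = visitors_per_variation(0.10, 0.11)
print(f"~{n:,} visitors per variation")            # ~14,750
print(f"~{n * 16:,} visitors for a 16-cell MVT")   # control + 15 variations
```

Splitting traffic sixteen ways pushes the total well past 200,000 visitors for an effect this small, which is exactly why many sites can't run a full MVT.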

Enter fractional factorial experiment design. Fractional factorial design allows for the speed of pure A/B testing combined with the insights of multivariate testing.

Fractional factorial design: The middle ground

Fractional factorial design is another method of Design of Experiments. Similar to MVT, fractional factorial design allows you to test more than one element change within the same variation.

The greatest difference is that fractional factorial design doesn’t force you to test every possible combination of changes.

Rather than creating a variation for every combination of changed elements (as you would with MVT), you can design your experiment to focus on specific isolations that you hypothesize will have the biggest impact.

With basic fractional factorial experiment design, you could set up the following variations in our hypothetical example:

VarA: Change A = +10%
VarB: Change A + B = +15%
VarC: Change A + B + C = -10%
VarD: Change A + B + C + D = -5%

In this basic example, variation A features a single change; VarB is built on VarA, VarC on VarB, and VarD on VarC.

NOTE: With fractional factorial design, estimating the value (e.g. conversion rate lift) of each change is a bit more complex than shown above. I’ll explain.

Firstly, let’s imagine that our control page has a baseline conversion rate of 10% and that each variation receives 1,000 unique visitors during your test.

When you estimate the value of change A, you are using your control as a baseline.

Variation A versus the control.

Given the above information, you would estimate that change A is worth a 10% lift by comparing the 11% conversion rate of variation A against the 10% conversion rate of your control.

The estimated conversion rate lift of change A = (11 / 10) - 1 = 10%

But, when estimating the value of change B, variation A must become your new baseline.

Variation B versus variation A.

The estimated conversion rate lift of change B = (11.5 / 11) - 1 ≈ 4.5%

As you can see, the ‘value’ of change B is slightly different from the 5% difference shown above.

When you structure your tests with fractional factorial design, you can work backwards to isolate the effect of each individual change by comparing variations. But, in this scenario, you have four variations instead of 15.
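Here is a short sketch of that back-calculation using the hypothetical numbers above. The 9.0% and 9.5% conversion rates for VarC and VarD follow from the -10% and -5% overall results against the 10% control:

```python
# Conversion rates of the chained variations in the hypothetical example.
# The 9.0% and 9.5% rates for VarC and VarD follow from the -10% and -5%
# overall results against the 10% control.
chain = [
    ("Control",         0.100),
    ("VarA (A)",        0.110),
    ("VarB (A+B)",      0.115),
    ("VarC (A+B+C)",    0.090),
    ("VarD (A+B+C+D)",  0.095),
]

# Each change's isolated lift is measured against the previous variation
# in the chain, not against the control.
for (prev_name, prev_cr), (name, cr) in zip(chain, chain[1:]):
    print(f"{name} vs. {prev_name}: {cr / prev_cr - 1:+.1%}")
```

Note how the back-calculated values for changes C (about -21.7%) and D (about +5.6%) drift from the additive -25% and +5% we started with: the same baseline-shift effect we saw with change B.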

We are essentially nesting A/B tests into larger experiments so that we can still get results quickly without sacrificing insights gained by isolations.

– Michael St Laurent, Director of Experimentation Strategy & Product, Conversion

Then, you would simply re-validate the hypothesized positive results (Change A + B + D) in a standard A/B test against the original control to see if the numbers align with your prediction.
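As a rough sanity check before that validation run, you can chain the isolated estimates together to predict what A + B + D should deliver. A minimal sketch, assuming the effects compose multiplicatively:

```python
# Isolated lift estimates for the changes worth keeping, taken from the
# back-calculation above (hypothetical numbers)
keepers = {"A": 0.100, "B": 0.045, "D": 0.056}

predicted = 1.0
for change, lift in keepers.items():
    predicted *= 1 + lift  # assume the effects compose multiplicatively

print(f"Predicted lift for A+B+D vs. control: {predicted - 1:+.1%}")
# -> about +21%; the validation A/B test checks whether reality agrees
```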

Fractional factorial allows you to get the best potential lift, with five total variations in two tests, rather than 15 variations in a single multivariate test.

But, wait…

It’s not always that simple. How do you hypothesize which elements will have the biggest impact? How do you choose which changes to combine and which to isolate?

The Exploration Phase

The answer lies in the Explore (or research gathering) phase of your testing process.

At Conversion, Explore is an expansive thinking zone, where all options are considered. Ideas are informed by your business context, persuasion principles, digital analytics, user research, and your past test insights and archive.

Experience is the other side of this coin. A seasoned optimization strategist can look at the proposed changes and determine which to combine (i.e. cluster), and which to isolate due to risk or the potential insights to be gained.

At Conversion, we don’t just invest in the rigorous training of our Strategists. We also have a 10-year-deep test archive that our Strategy team continuously draws upon when determining which changes to cluster, and which to isolate.

Fractional factorial design in action: A case study

This case follows two experiments we ran on Annie Selke, a retailer of luxury homeware goods. Our experiment focused on a product category page. (You may have already read about what we did during this test, but now I'm going to get into the details of how we did it. It's a fantastic illustration of fractional factorial design in action.)

Experiment 4.7

In the first experiment, we tested three variations against the control. As the experiment number suggests, this was not the first test we ran with Annie Selke. But it is the ‘first’ test in this story.

Variation A featured an isolated change to the ‘Sort By’ filters below the image, replacing them with a drop-down menu.


Replaced original ‘Sort By’ categories with a more traditional drop-down menu.

Evidence

This change was informed by qualitative click map data, which showed low interaction with the original filters. Based on e-commerce best practices, Strategists also theorized that, without context, visitors may not even realize these boxes are filters. This variation was built on the control.

Variation B was also built on the control, and featured another isolated change to reduce the left navigation.

Reduced left-hand navigation.

Evidence

Click map data showed that most visitors were clicking on “Size” and “Palette”, and past testing had revealed that Annie Selke visitors were sensitive to the removal of distractions. Plus, the persuasion principle known as the Paradox of Choice holds that more choice means more anxiety for visitors.

Unlike variation B, variation C was built on variation A, and featured a final isolated change: a collapsed left navigation.

Collapsed left-hand filter (built on VarA).

Evidence

This variation was informed by the same evidence as variation B.

Results

Variation A (built on the control) saw a decrease in transactions of -23.2%.
Variation B (built on the control) saw no change.
Variation C (built on variation A) saw a decrease in transactions of -1.9%.

But wait! Because variation C was built on variation A, we could compare its performance against variation A (rather than against the control) and estimate the value of change C (the collapsed filter) at 19.1%.

The next step was to validate our estimated lift of 19.1% in a follow up experiment.

Experiment 4.8

The follow-up test also featured three variations versus the original control. Because you should never waste an opportunity to gather more insights!

Variation A was our validation variation. It featured the collapsed filter (change C) from 4.7’s variation C, but maintained the original ‘Sort By’ functionality from 4.7’s control.

Collapsed filter & original ‘Sort By’ functionality.

Variation B was built on variation A, and featured two changes emphasizing visitor fascination with colors. We 1) changed the left nav filter from “palette” to “color”, and 2) added color imagery within the left nav filter.

Updated “palette” to “color”, and added color imagery. (A variation featuring two clustered changes).

Evidence

Click map data suggested that Annie Selke visitors are most interested in refining their results by color, and past test results also showed visitor sensitivity to color.

Variation C was built on variation A, and featured a single isolated change: we made the collapsed left nav persistent as the visitor scrolled.

Made the collapsed filter persistent.

Evidence

Scroll maps and click maps suggested that visitors want to scroll down the page, and view many products.

Results

Variation A led to a 15.6% increase in transactions, which is pretty close to our estimated 19% lift, validating the value of the collapsed left navigation!

Variation B was the big winner, leading to a 23.6% increase in transactions. Based on this win, we could estimate the value of the emphasis on color by comparing variation B against its baseline, variation A.

Variation C resulted in a 9.8% increase in transactions over the control. But because it was built on variation A (not on the control), we learned that the persistent left navigation was actually responsible for an 11.2% decrease in transactions.

This is what fractional factorial design looks like in action: big wins, and big insights, informed by human intelligence.

The best testing framework for you

What are your testing goals?

If you are in a situation where potential revenue gains outweigh the potential insights to be gained or your test has little long-term value, you may want to go with a standard A/B cluster test.

If you have a sufficient amount of traffic, and value insights above everything, multivariate may be for you.

If you want the growth-driving power of pure A/B testing, as well as insightful takeaways about your customers, you may want to explore fractional factorial design.

Words of encouragement: With fractional factorial design, your tests will get better as you continue to test. With every test you execute, you will learn more about your customers' purchasing behavior, making subsequent experiments more impactful.

One 10% win without insights may turn heads in your direction now, but a test that delivers insights can turn into five 10% wins down the line. It's similar to the compounding effect: collecting insights now can mean massive payouts over time.

– Michael St Laurent
