Prepare for Launch: Lessons from 1,000 A/B Test Launches

In this article, we provide a guide for the A/B test launch process that will help you to keep your website safe and to keep your colleagues and/or clients happy. 

You’ve spent weeks, maybe months, preparing for this A/B test. You’ve seen it develop from a hypothesis, to a wireframe, through design, build and QA. Your team (or client, if you work agency-side) are excited for it to go live and all that’s left is to push the big red button. (Or the blue one, if you’re using Optimizely.) Real users are about to interact with your variation and, hopefully, it’ll make them more likely to convert: to buy a product, to register for an account or simply to make that click.

But for all the hours you’ve put into preparing this test, the work is not over yet. At Conversion, we’ve launched thousands of A/B tests for our clients. The vast majority of those launches have gone smoothly, but launching a test can be intense and launching it properly is crucial. While we’re flexible and work with and around our clients, there are some fixed principles we adhere to when we launch an A/B test.

Get the basics right

Let’s start with the simplest step: always check that you’ve set the test up correctly in your testing platform. The vast majority of errors I have witnessed in the launching of tests have been minor errors in this part of the process. Make sure that you have:

  • Targeted the correct page or pages;
  • Allocated traffic to your Control and Variation/s;
  • Included the right audience in your test.

Enough said.
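If you can export or read back a test's settings, these three checks are easy to script. Here's a minimal sketch in Python. `ExperimentSetup` and its fields are made up for illustration and don't correspond to any testing platform's real API.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentSetup:
    """Hypothetical summary of a test's settings (not a real platform object)."""
    target_urls: list[str]
    traffic_split: dict[str, float]  # e.g. {"control": 0.5, "variation_1": 0.5}
    audience_conditions: list[str] = field(default_factory=list)


def prelaunch_problems(setup: ExperimentSetup) -> list[str]:
    """Return a list of human-readable problems; an empty list means all clear."""
    problems = []
    if not setup.target_urls:
        problems.append("No target pages configured.")
    if "control" not in setup.traffic_split:
        problems.append("No Control in the traffic split.")
    if len(setup.traffic_split) < 2:
        problems.append("No Variation in the traffic split.")
    if any(share <= 0 for share in setup.traffic_split.values()):
        problems.append("At least one arm receives no traffic.")
    if abs(sum(setup.traffic_split.values()) - 1.0) > 1e-9:
        problems.append("Traffic allocation does not add up to 100%.")
    if not setup.audience_conditions:
        problems.append("No audience conditions set: check this is intended.")
    return problems


setup = ExperimentSetup(
    target_urls=["/basket"],
    traffic_split={"control": 0.5, "variation_1": 0.5},
    audience_conditions=["device is mobile"],
)
print(prelaunch_problems(setup))  # an empty list when nothing looks wrong
```

A script like this won't tell you whether you've picked the right page or audience, only that nothing has been left blank or mis-split. You still need a human to read the settings.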

Map out the user journey

You and your team might know your business and its website better than anyone, but being too close to a subject can sometimes leave you with blinkered vision. By the end of the development process, you’ll be so close to your build that you might not be able to view it objectively.

Remember that your website will have many different users and use cases. Sure, you’re hoping that your user will find their way from the product page, to the basket page, to the payment page more easily in your variation. But, have you considered how your change will impact users who want to apply a voucher? Do returning users do something new users don’t? Could your change alienate them in some way? How does your test affect users who are logged in as well as logged out? (Getting that last one wrong caused my team a sleepless night earlier this year!)

Make sure you have thought about the different use cases happening on your website. Ask yourself:

  • Have you considered all devices? If the test is for mobile users, have you considered landscape and portrait?
  • Does your test apply across all geographies? If not, have you excluded the right ones?
  • Have you considered how a returning user’s journey differs from that of a new user?

One of the best ways to catch small errors is to involve colleagues who haven’t been as close to the test during the QA process. Ask them to try to identify use cases that you hadn’t considered. And if they do manage to find new ones, add these to your QA checklist to make sure future tests are checked for their impact on these users.

Test your goals

No matter how positively your users receive the changes you’ve made in your variation, your A/B test will only be successful if you can report back to your team or client with confidence. It’s important that you add the right goals to your results page, and that they fire as intended.

At Conversion, shortly before we launch a test, we work our way through both the Control and Variation and deliberately trigger each goal we’ve included: pageviews, clicks and custom goals too. We then check that these goals have been captured in two ways:

  1. We use the goals feature in our Optimizely Chrome Extension to see the goal firing in real-time.
  2. A few minutes later, we check to see that the action has been captured against the goal in the testing platform.

This can take a little time (and let’s be honest, it’s not the most interesting task) but it’ll save you a lot of time down the line if you find a goal isn’t firing as intended.

Know your baseline

From the work you’ve done in preparation, you should know how many people you expect to be included in your experiment, e.g. how many mobile users in Scotland you’re likely to get in a two-week period. In the first few minutes and hours after you’ve launched a test, it’s important to make sure that the numbers you’re seeing in your testing platform are close to what you’d expect them to be.

(If you don’t have a clear notion of how many users you expect to receive into your test, use your analytics platform to define your audience and review the number of visits over a comparable period. Alternatively, you could use your testing platform to run an A/A test where you do not make any changes in the variation. That way, you can get an idea of the traffic levels for that page).
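To make "close to what you'd expect" concrete, here is a rough sketch in Python. It assumes traffic arrives at a flat hourly rate, which real sites rarely manage (nights and weekends differ), and the 50% tolerance is an arbitrary starting point rather than a recommendation. All names and figures are illustrative.

```python
def expected_visitors_so_far(visitors_in_period, period_hours, hours_live,
                             traffic_allocation=1.0):
    """Scale a comparable period's visitor count down to the hours the test has run.

    Assumes a flat hourly rate, which is only a rough approximation.
    """
    hourly_rate = visitors_in_period / period_hours
    return hourly_rate * hours_live * traffic_allocation


def traffic_looks_plausible(observed, expected, tolerance=0.5):
    """True if observed traffic is within +/- tolerance (as a fraction) of expected."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return abs(observed - expected) / expected <= tolerance


# Hypothetical figures: 14,000 visitors in a comparable two-week period,
# and the test has been live for six hours at 100% allocation.
expected = expected_visitors_so_far(14_000, period_hours=14 * 24, hours_live=6)
print(round(expected))                         # about 250
print(traffic_looks_plausible(240, expected))  # True
print(traffic_looks_plausible(600, expected))  # False: worth investigating
```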

If you do find that the number of visits to your test is lower than you’d expect, make sure that you have set the correct traffic allocation up in your testing tool. It may also be worth checking that your testing tool snippet is implemented correctly on the page. If you find that the number of visits to your test is higher than you’d expect, make sure you’re targeting the right audience and not including any groups you’d planned to exclude. (Handy hint: check you haven’t accidentally used the OR option in the audience builder instead of the AND option. It can catch you out!) Also, make sure that you’re measuring like-for-like, i.e. that you’re comparing unique visits in your analytics tool with unique hits to your test.
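To see why the AND/OR slip inflates traffic, here is a tiny Python illustration using the "mobile users in Scotland" audience from earlier. The visitor list is invented, and real audience builders will express these conditions differently.

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    device: str
    country: str


def in_audience_and(v: Visitor) -> bool:
    # Intended audience: mobile users who are ALSO in Scotland.
    return v.device == "mobile" and v.country == "Scotland"


def in_audience_or(v: Visitor) -> bool:
    # The accidental version: anyone on mobile, plus anyone in Scotland.
    return v.device == "mobile" or v.country == "Scotland"


# Four made-up visitors covering every combination.
visitors = [
    Visitor("mobile", "Scotland"),
    Visitor("mobile", "England"),
    Visitor("desktop", "Scotland"),
    Visitor("desktop", "England"),
]

and_count = sum(in_audience_and(v) for v in visitors)
or_count = sum(in_audience_or(v) for v in visitors)
print(and_count, or_count)  # 1 3
```

With OR, every mobile user in every country and every desktop user in Scotland ends up in the test, which is exactly the kind of unexpectedly high traffic to look out for.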

Keep your team informed

At Conversion, our Designers and Developers are involved in the QA process and so they know when a test is about to launch. (We’ve recently added a screen above our bank of desks showing the live test results. That way everyone can celebrate [or commiserate] the fruits of their labour!) When the test has been live for a few minutes, and we’re happy that goals are firing, we let our client know and ask them to keep an eye on it too.

Check the test regularly

So the test is live. Having a test live on a site (especially when you’re managing that for a client) is a big responsibility. Provided you’ve taken all the right steps earlier in the process, you should have nothing to worry about, but you should take precautions nonetheless.

Once you’ve pressed the Play button, go over to the live site and make sure you can see the test. Try and get bucketed into both the Control and Variation to sense check that the test is now visible to real users.

At Conversion, there’ll be someone monitoring the test results, refreshing every few minutes, for the first couple of hours the test is live. That person also checks that there’s at least one hit against each goal and that the traffic level is as expected. After that, we’ll check in on the test every day that it runs.

A couple of hours into the running of a test, we’ll make sure that any segments we have set up (e.g. Android users, logged in users, users who interacted with our new element) are firing. You don’t want to run a test for a fortnight and then find that you can’t report back on key goals and segments.
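If you can pull hit counts out of your testing platform, the "at least one hit against each goal" check is simple to automate. This is a sketch only: the nested dictionary is a made-up export shape, and the goal names are hypothetical.

```python
def silent_goals(hits, expected_goals):
    """Return (variant, goal) pairs that have recorded no hits yet.

    hits maps variant name -> {goal name -> hit count}. A goal missing
    from a variant's dictionary is treated the same as zero hits.
    """
    missing = []
    for variant, goal_hits in hits.items():
        for goal in expected_goals:
            if goal_hits.get(goal, 0) == 0:
                missing.append((variant, goal))
    return missing


# Invented numbers from a couple of hours into a test.
hits = {
    "control": {"pageview": 120, "add_to_basket": 9},
    "variation": {"pageview": 118},
}
print(silent_goals(hits, ["pageview", "add_to_basket"]))
# [('variation', 'add_to_basket')]
```

A silent goal early on isn't always a bug, since low-frequency actions can take a while to appear. It is a prompt to trigger the goal yourself and confirm it is captured. The same function works for segments if you pass segment names instead of goal names.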

(Tip: if you’re integrating analytics tools into your test make sure they’re switched on and check inside of those tools soon after the test launches to make sure you have heatmap, clickmap or session recording data coming through).

Make sure you have a way to pause the test if you spot anything amiss, and we’d recommend not launching on a Friday, unless someone can check the results over the weekend.

Finally, don’t be afraid to pause

After all the buildup and excitement of launching, it can feel pretty depressing having to press the pause button if you suspect something isn’t quite right. Maybe a goal isn’t firing or you’ve forgotten to add a segment that would come in very handy when it’s time to report on the results. Don’t be afraid to pause the test. In most cases, it will be worth a small amount of disruption at the start, to have trustworthy numbers at the other end. Hopefully, you’ll spot these issues early on. When this happens, we prefer to reset the results to ensure they’re as accurate as they can be.


Launching an A/B test can be a real thrill. You finally get to know whether that ear-worm of an idea for an improvement will actually work. In the few hours either side of that launch, make sure you’ve done what you need to do to preserve confidence in the results to come and to keep your team and client happy:

  • Get the basics right: it’s easy to make a small error in the Settings. Double check these.
  • Map out the user journey: know how users are likely to be impacted by your changes.
  • Test your goals: make sure you’ve seen some data against each goal from your QA work.
  • Know your baseline: check the initial results against traffic levels in your analytics tools.
  • Keep your team informed: don’t hog all the fun, and let others validate the results with you.
  • Check regularly: don’t go back to a lit firework; do go back to a live test…regularly.
  • Don’t be afraid to pause: pause your test if needed. It needs to be the best version it can be.

Introducing our hypothesis framework

Download printable versions of our hypothesis framework here.

Experiments are the building blocks of optimisation programmes. Each experiment will at minimum teach us more about the audience – what makes them more or less likely to convert – and will often drive a significant uplift on key metrics.

At the heart of each experiment is the hypothesis – the statement that the experiment is built around.

But hypotheses can range in quality. In fact, many wouldn’t even qualify as a hypothesis: eg “What if we removed the registration step from checkout”. That might be fine to get an idea across, but it’s going to underperform as a test hypothesis.

For us, an effective hypothesis is made up of eight key components. If it’s reduced to just one component showing what you’ll change (the “test concept”), you’ll not just weaken the potential impact of the test – you’ll undermine the entire testing programme.

That’s why we created our hypothesis framework. Based on almost 10 years’ experience in optimisation and testing, we’ve created a simple framework that’s applicable to any industry.

What makes this framework effective?

It’s a simple framework – but there are three factors that make it so effective.

  1. Putting data first. Quantitative and qualitative data is literally the first element in the framework. It focuses the optimiser on understanding why visitors aren’t converting, rather than brainstorming solutions and hoping there’ll be a problem to match.
  2. Separating lever and concept. This distinction is relatively rare – but for us, it’s crucial. A lever is the core theme for a test (eg “emphasising urgency”), whereas the concept is the application of that lever to a specific area (eg “showing the number of available rooms on the hotel page”). It’s important to make the distinction as it affects what happens after a test completes. If a test wins, you can apply the same lever to other areas, as well as testing bolder creative on the original area. If it loses, then it’s important to question whether the lever or the concept was at fault – ie did you run a lousy test, or were users just not affected by the lever after all?
  3. Validating success criteria upfront. The KPI and duration elements are crucial factors in any test, and are often the most overlooked. Many experiments fail by optimising for a KPI that’s not a priority – eg increasing add-to-baskets without increasing sales. Likewise the duration should not be an afterthought, but instead the result of statistical analysis on the current conversion rate, volume of traffic, and the minimum detectable uplift. All too often, a team will define, build and start an experiment, before realising that its likely duration will be several months.
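To show how conversion rate, traffic and minimum detectable uplift combine into a duration, here is a rough Python sketch. It uses a common textbook approximation for comparing two conversion rates (two-sided test, normal approximation). It is one way to do the sum, not necessarily the calculation we or your testing platform use. The 5% significance level, 80% power and even traffic split are assumptions.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_uplift, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH arm to detect the given relative uplift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must be between 0 and 1 and must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(spread ** 2 / (p2 - p1) ** 2)


def estimated_duration_days(baseline_rate, relative_uplift, daily_visitors, arms=2):
    """Days needed if daily_visitors enter the test and are split evenly across arms."""
    needed = visitors_per_variant(baseline_rate, relative_uplift) * arms
    return math.ceil(needed / daily_visitors)


# Hypothetical inputs: 5% conversion rate, 10% minimum detectable uplift,
# 2,000 visitors a day entering the test.
print(estimated_duration_days(0.05, 0.10, daily_visitors=2_000))
```

With these made-up inputs the answer comes out at roughly a month. Halve the detectable uplift or the traffic and it stretches to several months, which is why it pays to run the numbers before anyone starts building.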


Quant and qual data

What’s the data and insight that supports the test? This can come from a huge number of sources, like web analytics, sales data, form analysis, session replay, heatmapping, onsite surveys, offsite surveys, focus groups and usability tests. Eg “We know that 96% of visitors to the property results page don’t contact an agent. In usability tests, all users wanted to see the results on a map, rather than just as a list.”


Lever

What’s the core theme of the test, if distilled down to a simple phrase? Each lever can have multiple implementations or test concepts, so it’s important to distinguish between the lever and the concept. Eg a lever might be “emphasising urgency” or “simplifying the form”.


Audience

What’s the audience or segment that will be included in the test? As with the area, make sure the audience has sufficient potential and traffic to merit being tested. Eg an audience may be “all visitors” or “returning visitors” or “desktop visitors”.


Goal

What’s the goal for the test? It’s important to prioritise the goals, as this will affect the KPIs. Eg the goal may be “increase orders” or “increase profit” or “increase new accounts”.

Test concept

What’s the implementation of the lever? This shows how you’re applying the lever in this test. Eg “adding a map of the local area that integrates with the search filters”.


Area

What’s the flow, page or element that the test is focused on? You’ll need to make sure there’s sufficient potential in the area (ie that an increase will have a meaningful impact) as well as sufficient traffic too (ie that the test can be completed within a reasonable duration – see below). Eg the area may be “the header”, “the application form” or “the search results page”.


KPI

The KPI defines how we’ll measure the goal. Eg the KPI could be “the number of successful applications” or “the average profit per order”.


Duration

Finally, the duration is how long you expect the test to run. It’s important to calculate this in advance – then stick to it. Eg the duration may be “2 weeks”.
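If you keep your test backlog in a spreadsheet or a script, the eight components map naturally onto a simple record. The Python sketch below is purely illustrative: the field names are our shorthand for the components described above, not part of any tool.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Hypothesis:
    """One field per component of the framework; empty string means 'not filled in'."""
    data: str = ""      # quant and qual data supporting the test
    lever: str = ""     # core theme, e.g. "emphasising urgency"
    audience: str = ""  # e.g. "returning visitors"
    goal: str = ""      # e.g. "increase orders"
    concept: str = ""   # how the lever is applied in this test
    area: str = ""      # flow, page or element being tested
    kpi: str = ""       # how the goal is measured
    duration: str = ""  # e.g. "2 weeks"

    def missing_components(self) -> list[str]:
        """Names of components that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# The "what if..." idea from earlier only supplies a test concept.
idea_only = Hypothesis(concept="removing the registration step from checkout")
print(idea_only.missing_components())
# ['data', 'lever', 'audience', 'goal', 'area', 'kpi', 'duration']
```

Seeing seven blanks next to a test idea is a quick, if blunt, reminder of how much thinking is still to be done before it deserves a slot in the roadmap.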

Taking this further

This hypothesis framework isn’t limited to A/B tests on your website – it can apply anywhere: to your advertising creative and channels, even to your SEO, product and pricing strategy.
Any change and any experience can be optimised – and to do that effectively requires a data-driven and controlled framework like this.

Don’t forget – you can download printable versions of the hypothesis framework here.

CRO is like poker

Conversion rate optimisation (CRO) and poker have a lot of similarities, and it’s more than just the opportunity to either make or lose a lot of money.


Anyone can play

Anyone can take a seat at a poker table and play a few hands. The game is relatively easy to pick up and there really isn’t any prerequisite knowledge needed apart from knowing how a deck of cards works.

The same can be said of CRO. There are plenty of tools out there that will allow you to start doing the basics of CRO in a couple of hours. Your free Google Analytics account can give you a pretty good understanding of where people are abandoning your site. Sign up for an Optimizely account and you can start running your first A/B tests as soon as you add the code to your pages.

The problem is, because it’s so easy to start doing something that feels like CRO, many companies think they’re doing CRO already so don’t seek help to do it better. After all, everyone starts playing on the assumption that they will win. But only the players willing to invest adequate time and even money into getting better will make consistent returns in the long run. That might mean reading up on the theory, looking at what others have done to be successful, or even getting professional help.

Anyone can win the odd hand

The reason people get addicted to poker is that from time to time they probably will win a big hand and make some money. The problem is that over the long run the relatively infrequent big wins will be cancelled out by the all-too-frequent losses.

The same is true of CRO. Anyone can run a test and it’s within the realms of possibility that you might just get a winner too, maybe even a big one at that. We know from experience that small changes to sites can have a big impact, so you certainly can stumble upon these impactful changes.

If you want to be making a sustained impact on your conversion rate over time though, you’ll need a CRO strategy in place that can deliver these big wins on a regular basis.

Over time, a data-driven strategy will deliver better results

In poker a beginner’s luck will run out. It doesn’t matter too much what happens hand to hand, it matters what happens over the long-run – over hundreds of hands. A successful poker player adopts strategies that give them statistically better odds of winning. Over time, this statistical advantage is what means they are still there at the final table, with the biggest stack of chips. They may throw a few big plays here and there, but the majority of play is about being smart and using the data available to make good decisions consistently.

In CRO each split-test we run is like a hand of poker for the poker player. Being successful at CRO is not necessarily about getting a big uplift in one test, nor is it about being successful with every test you run. Being successful at CRO is about using the data you have available to you to devise testing strategies that deliver continuous improvement over time. There may be the odd test along the road that does deliver a 20, 30, 40% uplift in conversion rate.

The mark of a good CRO professional, however, is not getting that 40% winner, it’s what they do after that 40% winner to iterate on it and go further. It’s how they learn and adapt when a test doesn’t deliver an uplift to turn the data from that losing hand into a winning hand next time.

Finally, you play your opponent, not the cards

This is a well known mantra of poker and it stems from the fact that you have little control over what cards you’re dealt so can’t rely on good cards to win hands. Instead, by gathering data on your opponent such as their play style – how they play hands in which they win and how they play hands in which they lose for example – you can devise strategies to beat them no matter what hand you’ve been dealt.

This is true in CRO, although I wouldn’t suggest that you think of your potential customers as your opponents necessarily.

You might not have much control over the hand you’re dealt in terms of the product you’re selling or the service you’re offering. What you can control is how you use what you’ve been dealt, and it’s essential to understand how your visitors think so that you can decide how best to influence them using what you have. Likewise, there is only so much that web analytics data can tell you about why visitors are abandoning your checkout. You need to understand the motivations and thought processes of visitors at each stage of your funnel to know how to make them take the action you want.

CRO and poker have the same appeal. The simplicity of the objective – getting people to buy or getting people to fold. The potential for great returns if you’re successful. The thrill of getting that big uplift in a test or winning that big hand. Both CRO and poker though aren’t easy, and both need a lot of time and effort invested to do well.

There are a lot more unsuccessful poker players than successful ones as a result, and I think the same is probably true in CRO. Hopefully this post has given you a good idea of what can make the difference.

Specialist teams or x-functional pods? A developer’s view

We are an agency made up of specialists who look for opportunities to improve our clients’ ROI through methodical research, testing and learning. We analyse user behaviour and expectations of a website in order to increase engagement levels and, consequently, conversions.

Testing is at the heart of everything we do, so we’re always trying to improve and find better ways of doing things. Typically, our company is split into three major ‘specialist teams’ – consultants, designers and developers.

Consultants: Their role is to perform in-depth research of a client’s website and gather relevant insights about the business. From this, test ideas are generated and wireframes created. They are also the main bridge between our clients and internal teams.

Designers: They feed into the wireframe stage by collaborating on ideas for how to implement the test concept. Once this stage is approved, they produce the final design file that is handed over to the developers.

Developers: These geeks have the ability to transform the final design file into code readable by browsers. This is the final stage of the test creation flow.

After this internal process, the test runs to a live audience through an A/B testing system. At the end, consultants analyse the final results and make recommendations for the client’s site.

Here is how the teams typically interact within the company:

As can be seen, developers come in at the very end of the process. After designers have completed the final file, they assign it to one of the developers available at that moment. This is great from a developer’s standpoint, because they have the opportunity to work on many different clients and retain a good working knowledge across all of them. The downside, however, is that work overload can be an issue. Different consultants have different deadlines for delivering tests, so at times congestion becomes unavoidable. Sometimes many tests come in to the development team simultaneously, and it is difficult to manage requests in order to deliver each test at the desired time.

Because of these issues, we had the idea of taking a member of each team and making them work more closely together. We have created a cross-functional team, a.k.a. a pod.

What exactly is a pod?

A pod is like a small startup inside the company. Instead of organizing your business in separate functional departments, you create teams that contain a member of each function. Let’s illustrate what we have done within our company:

Graph 2

Clear goals and collaboration

With the team working collectively on the same clients, it’s much easier to sync up schedules. Since we always have a priority list for our tasks, the team works towards those goals in order. For example, if a developer needs designer approval for a certain test, the designer will stop whatever they are doing to evaluate the developer’s work, because that is the current priority for the whole team.

Tidy schedule

Because there are clear goals, the project manager is able to build a clear schedule for everyone in the team. This helps the developer to know what work is coming soon to his stack. In this way, the developer can manage his time alongside his other tasks. It also allows the developer to arrange his projects the way he prefers, as long as he delivers his work by the expected deadlines.

Earlier technical evaluation

We have introduced a new format for the test idea/concept phase. Before the pod, the developer had little input at this stage. The developer is now an active member of the conceptual phase, bringing valuable know-how on potential implementation issues. Sometimes even a slightly different approach can save many hours of development and help the team deliver a test faster. For example, implementing native placeholders can cause cross-browser compatibility problems, so the developer might ask at this stage: ‘Is this really required for the test? Will this make a significant difference to conversions?’ Getting familiar with the test at the very beginning also gives the developer time to research and practise any techniques that will be required to implement it (e.g. getting familiar with new frameworks).

Faster test development

Since the developer has a clear pipeline he can start to develop the test before he actually receives the final design file from the designer. How is this possible? Well, before the designers start to work on the final Photoshop file, there is a wireframe stage. As soon as we get approval from the client on the wireframe, the developer can start to work at the same time as the designer prepares the final file. This is possible because the wireframe gives a clear indication of what the test is all about. With this visual info the developer is able to develop a big chunk of the HTML, CSS and JavaScript. Remember that from the test idea phase the developer already knows what functionality and goals the test is supposed to deliver. This allows the developer to finish around 70-80% of his work even before the designer delivers the file. With the final file, the developer just needs to make some tweaks to the code (e.g. spacing, colours, etc.). So far, this new process has allowed us to deliver tests 35% faster than before.

Quick decision-making

Because the members sit close to each other, as opposed to working in silos, it is easier to take a minute to discuss something briefly. Moreover, interrupting a teammate does not feel so intrusive: if you need something to finish the pod’s priority task, they are more open to being interrupted in order to help meet the team’s goals.


Because the pod is like a small startup within the company, it allows the team to change processes and try new ways of working. This can be very useful in finding more efficient ways of working which we can then share with the other pods.


As optimizers, a testing culture is a vital part of how we work. This means we also need to measure everything and be able to critically evaluate how things are doing. Here are the results we have observed so far by moving from specialist teams to a cross-functional pod approach:

  • 35% faster test delivery time from start to finish: By developing test ideas in parallel, as opposed to serially, we have seen a significant reduction in the total time elapsed from the inception of a test to the final launch.
  • 28% reduction in actual developer build time: By integrating the developer more closely in the design and consulting phases, devs have a much better idea of how to go about building the test at the point they start working on it, meaning the build time is dramatically reduced.
  • 66% reduction in bugs reported during QA: Consequently, developers are able to build tests more intelligently by anticipating any issues, and feeding in to the test development earlier on to avoid prospective clashes.
  • Happier team members: Although there are a few downsides to working in the pod, such as less variety in the sites we get to work on, the individual members of the pod are generally much happier with this new approach, because they are working as a team throughout the whole process. This means fewer internal conflicts and more efficient workflows.
  • More time to work on other projects: Because we have increased efficiency across the board, pod members have more time to spend working on other tasks, such as internal assignments and creative projects. The introduction of a project manager also means that consultants spend more time doing valuable conversion-related work and less admin, which is likely to be correlated with the uplift in team happiness!

While it is still early days for the pod, the initial results and general consensus are positive. As developers, we see far fewer conflicts and less back-and-forth with the design and consulting teams, and we have become much more connected to the conversion aspect of what we do. The developer becomes more of an expert on a smaller number of clients’ sites (as opposed to a generalist working across the whole spectrum). There are small downsides: for example, if a pod developer is needed to work on a different client’s site, they may initially be less familiar with its technical setup. On the other hand, the surplus time the developer gains from working in the pod can be used for more internal sharing and learning, which may be more valuable in the long term. The developer also has to adapt to many more meetings than they are typically used to (!), but the benefits of being more involved in the project overall make it worth our while.

Do you have anything to add? Questions or comments? Let us know in the comments below!