• Here at Conversion, we’re always looking for ways to create an unfair competitive advantage for our clients. Nothing allows us to do this more effectively than our Experiment Repository.

    As far as we know, our Experiment Repository is the largest, most robustly tagged collection of experiment data in the world. By putting this one-of-a-kind resource at our clients’ disposal, we’re able to give each of them a sizable edge over their competition.

    In this post, we’ll start by introducing our Experiment Repository and then walk through some of the most impactful ways we’re using it to generate previously untapped value for our clients.

    Within business experimentation, experiment repositories remain a relatively underexplored resource. Our hope is that by sharing some of the techniques we’ve developed to get the most out of our own repository, we’ll inspire and empower others to do the same with theirs.

    So, without further ado…

    Note: for those teams looking to build an experiment repository from scratch, we have another piece of content that will serve your interests better than this one (click here). The piece you’re reading now can be thought of as part two in a series: how to use your repository – once it’s already built – to drive results.

  • Our Experiment Repository

    Since opening our doors 15+ years ago, we’ve stored and tagged every experiment we’ve ever run.

    As the world’s largest experimentation agency, we now have a database of more than 20,000 experiments, run across countless websites, industries, verticals, company sizes, and maturity levels.

    What’s more, throughout this period of time, we’ve dedicated huge amounts of energy to developing some of the most advanced taxonomies our industry has to offer. These taxonomies allow us to slice up our data in novel ways to unearth patterns that would otherwise have remained buried beneath the noise.

    As a result of all this effort, we now have what we believe to be the largest, broadest, and most operationally useful experiment repository on the planet.

    High-level view of our experiment repository

    This puts us in a truly unique position.

    We have access to data that no other experimentation team has access to. Working out what to do with all of this – how to turn it to our clients’ advantage – has been a unique and ongoing challenge.

    After many years of trial, error, and iteration, we’ve now arrived at some extremely well-validated techniques for exploiting this resource in our clients’ favor.

    In fact, our Experiment Repository has become the single greatest source of value that we’re able to offer our clients – and as we trial increasingly innovative techniques and technologies, the value of the repository is only growing with each passing year.

    Throughout the remainder of this piece, we’re going to share with you some of the most effective techniques we’ve come up with so far.

  • How we use the repository to give our clients a competitive edge: 7 use cases

    1. Database research

    Most experimentation teams are limited to the same set of research methodologies – things like analytics, surveys, user research, etc.

    While we ourselves still use these kinds of methodologies extensively, we also have access to a completely novel research methodology of our own:

    Database research.

    Database research involves querying our experiment repository to unearth macro-trends that we use to develop better hypotheses and produce more successful experiments.

    Used in conjunction with data from other methodologies, the value of this information can hardly be overstated. For example:

    • Analytics may tell you where users are dropping out of your funnel.
    • User testing may tell you why they are dropping out of your funnel.
    • Database research tells you how past clients solved this problem – and therefore how you might be able to do so too.

    It allows us to enrich all of our research data with powerful contextual information, and it ultimately means we’re able to drive results more quickly and more consistently than we otherwise could.

    To give an example:

    Imagine that we’ve just started working with a new client. Without the database, we would be forced to proceed from a standing start, blindly testing the waters with early experiments and then trying to refine and iterate as we go.

    But with our database, we’re able to take insights from past experiment programs and apply them directly to our new client’s program. For example:

    • Which kinds of levers tend to be most – and least – effective within this industry, vertical, and company size?
    • Which kinds of psychological principles tend to be most – and least – powerful for users of this kind of website?
    • Whereabouts on these kinds of websites – e.g. on product pages or the basket – do tests tend to be most impactful?
    • What design patterns tend to perform best in different situations? Etc.

    By using the database to answer these kinds of questions, we can get a head start and begin driving results from day one of the program.
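
    To make this a little more concrete, here’s a rough sketch of what a query like this might look like against a flat export of a repository. The file name, column names (industry, company_size, lever, outcome, uplift_pct), and tag values below are illustrative placeholders, not our actual schema:

    ```python
    import pandas as pd

    # Hypothetical flat export of an experiment repository:
    # one row per experiment, with its tags and its outcome.
    experiments = pd.read_csv("experiment_repository.csv")

    # Narrow the data down to past programs that resemble the new client.
    similar = experiments[
        (experiments["industry"] == "ecommerce")
        & (experiments["company_size"] == "enterprise")
    ]

    # For each lever, estimate how often it wins and how large the typical uplift is.
    lever_summary = (
        similar.groupby("lever")
        .agg(
            tests=("experiment_id", "count"),
            win_rate=("outcome", lambda s: (s == "win").mean()),
            median_uplift=("uplift_pct", "median"),
        )
        .sort_values("win_rate", ascending=False)
    )

    print(lever_summary)
    ```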

    Moreover, the value of the database isn’t limited to the start of a program. We’re also able to use insights from our database to solve the problems of mature programs.

    Consider this example:

    Through survey responses, we’d discovered that one of our financial services clients had a trust problem: their website visitors did not know who they (our client) were and therefore did not find them to be a credible brand.

    Adding social proof in the form of reviews, ratings, and testimonials is often the first port of call for addressing issues of this kind, but before going with this tack, we decided to run some database research.

    We queried our repository to discover how past clients in the same industry had solved problems tied to the Trust lever. Interestingly, we discovered that Social Proof was actually an extremely ineffective lever in this specific niche. Financial services clients do not generally want to hear about how other clients have benefited from the service; instead, they want to feel that the service is highly exclusive and that it can offer them a unique edge.

    As one example of many, social proof had a profoundly negative effect when added to Motley Fool’s email capture modal.

    The Authority lever, on the other hand, which involves appealing to credible institutions and authority figures in an industry, tends to be much more effective in this particular niche – so we deployed this lever and were able to achieve a string of strong winners for the client.
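
    The comparison behind a decision like this boils down to a simple filter and aggregate. Here’s an indicative sketch; the column names and tag values (problem_lever, solution_lever, and so on) are hypothetical stand-ins rather than our real taxonomy:

    ```python
    import pandas as pd

    experiments = pd.read_csv("experiment_repository.csv")

    # Hypothetical tags: past financial-services tests that tackled a trust problem.
    trust_tests = experiments[
        (experiments["industry"] == "financial_services")
        & (experiments["problem_lever"] == "trust")
    ]

    # Compare the two candidate solution levers head to head.
    comparison = (
        trust_tests[trust_tests["solution_lever"].isin(["social_proof", "authority"])]
        .groupby("solution_lever")
        .agg(
            tests=("experiment_id", "count"),
            win_rate=("outcome", lambda s: (s == "win").mean()),
            avg_uplift=("uplift_pct", "mean"),
        )
    )

    print(comparison)
    ```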

    In this instance, our repository allowed us to avoid a potential blind alley of testing and meant we were able to apply a highly effective solution to the problem on our first attempt.

    2. Sharpen experiment executions

    A good experiment concept is made up of two parts:

    • The hypothesis – a prediction you want to validate
    • The execution – how you intend to validate it

    It’s possible to have an extremely strong, data-backed hypothesis but a poor execution. If this is the case, you may find that your test loses even though your hypothesis was actually correct.

    This is a real problem.

    Declaring a test a loser when its hypothesis was correct can result in vast sums of money being left on the table.

    Thankfully, this is a danger that our experiment repository is helping us mitigate.

    When developing the executions for our hypotheses, we subject our concepts to several ‘actions of rigor’ to ensure that they’re as strong and as thought-out as they possibly can be.

    Database research offers one such action of rigor. In this context, it involves querying our repository in various ways to understand how we might make our execution more effective.

    Consider this example:

    One of our clients had a single-page free-trial funnel.

    Various research methodologies had suggested that a multi-step funnel would be more effective, since it would elicit completion bias and thereby motivate the user to progress through the funnel.

    As part of this test, we knew we were going to need to design a new progress bar. To make this new progress bar as effective as possible, we filtered our repository by ‘component’ and ‘industry’ so that we could find all of our past tests that involved progress bars from the same or adjacent verticals.

    When we did this, a clear pattern emerged: low-detail progress bars have a much higher win-rate than high-detail progress bars.
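
    In repository terms, this kind of check is a simple filter and compare. The sketch below is purely illustrative; the component, vertical, and detail_level tags are stand-ins for our real taxonomy:

    ```python
    import pandas as pd

    experiments = pd.read_csv("experiment_repository.csv")

    # Hypothetical tags: all past tests that touched a progress bar,
    # restricted to the client's vertical and adjacent ones.
    progress_bar_tests = experiments[
        (experiments["component"] == "progress_bar")
        & (experiments["vertical"].isin(["saas", "subscription"]))
    ]

    # Compare win rates for low-detail vs high-detail executions.
    win_rate_by_detail = (
        progress_bar_tests.groupby("detail_level")["outcome"]
        .apply(lambda s: (s == "win").mean())
        .rename("win_rate")
    )

    print(win_rate_by_detail)
    ```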

    Our meta-components study showed that, in aggregate, low-detail progress bars perform better than high-detail progress bars

    We were able to use this insight to build a new, low-detail progress bar for the test – and the experiment ultimately resulted in a 7.5% increase in signups.

    Newly designed funnel with simplified progress bar delivers a CR uplift of 7.5%

    This is just one example of the many ways we’ve been able to use our repository to sharpen up our executions and generate as much impact for our clients as possible.

    3. Machine-learning assisted prioritization

    Every experimentation team has more test ideas than they can ever conceivably run. This creates the need for prioritizing some ideas – and deprioritizing others.

    Over the years, many serviceable prioritization tools have emerged, but they all fall short in at least one of two ways:

    1. Subjectivity – they rely too heavily on gut-feel and too little on cold hard data.
    2. One size fits all – they judge every test by the same set of criteria when criteria should be dynamic, based on the unique context of each test.

    To solve this problem, we decided to train an advanced machine learning model on all of the data in our database.

    Though only a first iteration, this model – dubbed Confidence AI – is now able to predict the results of A/B tests with ~63% accuracy. Based on standard industry win rates, this makes the model several times better at predicting A/B test results than the average practitioner.

    Confidence AI computes a confidence score for each experiment concept

    By embedding this tool into each of our clients’ Experimentation Operating Systems, we’re able to use Confidence AI to dynamically prioritize our test ideas as new results and insights come in.

    Ultimately, this means we can zero in on winning test ideas far more quickly, while deprioritizing avenues of testing that appear – based on the model – less likely to be fruitful.
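
    We won’t unpack the modelling details here, but the core idea is to learn, from the tagged metadata of past experiments, how likely a new concept is to win. As a heavily simplified, purely illustrative sketch (using a generic gradient-boosting classifier and made-up feature names, not the actual Confidence AI implementation), it might look something like this:

    ```python
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Illustrative stand-in for Confidence AI; not the actual model or features.
    experiments = pd.read_csv("experiment_repository.csv")

    # Hypothetical tagged attributes describing each past experiment concept.
    feature_columns = ["industry", "lever", "page_area", "component", "build_size"]
    X = pd.get_dummies(experiments[feature_columns])
    y = (experiments["outcome"] == "win").astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Train a simple classifier to estimate each concept's probability of winning.
    model = GradientBoostingClassifier()
    model.fit(X_train, y_train)

    print("Hold-out accuracy:", accuracy_score(y_test, model.predict(X_test)))

    # A "confidence score" for a new concept is its predicted probability of winning.
    new_concept = X_test.iloc[[0]]
    print("Confidence score:", model.predict_proba(new_concept)[0, 1])
    ```

    The predicted win probability is the kind of number that gets surfaced as a per-concept confidence score.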

    There are various nuances and niceties to the way we use Confidence AI in our work. If you’d like to learn more, click here.

    4. Optimizing our own methodology

    Our proprietary methodology here at Conversion is one of the primary reasons that leading brands like Microsoft, Whirlpool, and Adblock have chosen to work with us.

    In fact, when we apply for awards or enter into competitive pitches, our methodology invariably achieves the highest score possible.

    One of the reasons for this is our repository:

    Our repository allows us to gain a bird’s-eye view of what’s working with our methodology and what’s not. Over time, we’ve been able to use this data to question our assumptions, invalidate company lore, and refine how we approach experimentation.

    To give one example (of many):

    Many people in our industry hold the assumption that ‘the bigger the build, the bigger the uplift.’ On its face, this makes sense: experiments with bigger builds are generally assumed to involve more ambitious ideas, and ambitious ideas have the potential to move the needle in a big way.

    But here at Conversion, we believe in using data to put hypotheses to the test – so that’s what we did.

    We used our database to examine the relationship between build size (measured in dev time) and win-rate/uplift.
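
    Mechanically, an analysis like this just buckets each experiment by its recorded build effort and compares outcomes across buckets. A rough sketch, with hypothetical column names and cut-offs:

    ```python
    import pandas as pd

    experiments = pd.read_csv("experiment_repository.csv")

    # Bucket experiments by recorded development time (hypothetical column, in hours).
    experiments["build_bucket"] = pd.cut(
        experiments["dev_hours"],
        bins=[0, 8, 24, 80, float("inf")],
        labels=["tweak", "small", "medium", "large"],
    )

    # Compare win rate and average uplift across build-size buckets.
    by_build_size = (
        experiments.groupby("build_bucket", observed=True)
        .agg(
            tests=("experiment_id", "count"),
            win_rate=("outcome", lambda s: (s == "win").mean()),
            avg_uplift=("uplift_pct", "mean"),
        )
    )

    print(by_build_size)
    ```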

    To our surprise, we found that tweaks had the same average win-rate and uplift as large tests.

    Contrary to industry lore, data from our repository shows that experiments with longer build times actually perform worse than those with shorter build times

    This insight has tremendous practical significance. After all, if small tests are just as likely to win as large ones, why waste resources building larger tests?

    Since running this analysis, we now seek to validate our hypotheses using the smallest experiment possible – or what we call a Minimum Viable Experiment (MVE). This approach allows us to gather data at speed and ensures that when we do decide to invest in a big, resource-intensive experiment, its odds of winning are significantly higher.

    Our MVE-centered approach

    By running analyses of this kind and using them to challenge our assumptions, we’ve been able to develop a highly novel approach to experimentation that delivers outstanding results time after time.

    5. Unearth macro-trends within a program

    The points we’ve covered so far have mainly focused on the way we use our agency-wide repository to generate value for our clients – but we also create an individual repository for each of our clients.

    The final three points in this post relate primarily to these individual client repositories.

    All of the data within our main repository is filtered into individual client repositories, each containing only the tests and research insights we’ve unearthed for that specific client.

    One of the main advantages of these client repositories is that they allow us to cut up a client’s data and unearth patterns and trends that have accumulated throughout the course of our work together.

    This kind of analysis can be incredibly powerful.

    With one client, for example, we’d been running hundreds of tests per year for a couple of years. This meant they had several hundred tests in their repository.

    When we ran our analysis, we found that some sub-levers performed extremely well, some showed promise – and some performed poorly.

    We sorted each Lever into an Exploit, Explore, and Abandon category

    We split these sub-levers out into three groups:

    • Exploit – this has been extremely effective in the past; let’s do more of it
    • Explore – this looks promising but we don’t have enough data to be sure; let’s test the waters and see what comes up
    • Abandon – we’ve run lots of tests on this but it doesn’t seem to be effective on this website
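
    The thresholds that separate these groups are a judgment call made per program, but the sorting itself is mechanical. Here’s a simplified sketch, with made-up cut-offs and column names:

    ```python
    import pandas as pd

    client_repo = pd.read_csv("client_repository.csv")

    # Win rate and sample size per sub-lever for this specific client.
    sub_levers = (
        client_repo.groupby("sub_lever")
        .agg(
            tests=("experiment_id", "count"),
            win_rate=("outcome", lambda s: (s == "win").mean()),
        )
    )

    def categorize(row):
        # Made-up thresholds for illustration; in practice these are set per program.
        if row["tests"] < 5:
            return "Explore"   # not enough data yet: test the waters
        if row["win_rate"] >= 0.30:
            return "Exploit"   # consistently effective: do more of it
        return "Abandon"       # well tested, but not working on this website

    sub_levers["category"] = sub_levers.apply(categorize, axis=1)
    print(sub_levers.sort_values(["category", "win_rate"], ascending=[True, False]))
    ```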

    What’s more, our analysis also revealed that many of our client’s wins and biggest uplifts tended to cluster in specific areas of the website.

    Here’s an example of what this kind of analysis might look like.

    With this analysis in hand, we were then able to build out our client’s experiment roadmap, focusing the majority of our tests on the levers and areas of the website that our analysis revealed to be particularly promising.

    Ultimately, the following quarter’s results turned out to be the most impressive results this client had ever seen. In fact, in large part thanks to this analysis, we were able to hit our annual revenue target – which was itself the biggest target we had ever been set – by the end of March, i.e. with 9 months to spare.

    To read the full story, click here.

    6. Wider impact

    Experimentation isn’t just about driving winning tests or unearthing insights – it’s about using experiment results to inform key decisions. What use is a game-changing insight unearthed through experimentation if nobody in the organization hears about it?

    Unfortunately, many experimentation teams struggle with this final piece of the equation. Generating winners and unearthing insights is one thing; finding ways to make those results percolate through an organization, so that they can inform important decisions, is another.

    By providing a centralized database of insights that anyone in our client’s organization can access and filter according to their needs, our client repositories do a tremendous job of increasing the ‘impact radius’ of our programs. They grant every team – from design, product, and engineering to the C-suite – ready access to our findings, which means these findings have a greater chance of informing decisions across the entire organization.

    To give a recent example:

    One of our clients had brought in an external design agency to redesign large portions of its website. We gave this design agency access to our experiment repository, which allowed them to use past experiment findings and program-wide patterns to steer away from unsuccessful design patterns – and towards successful ones.

    From a conversion standpoint, this ultimately meant that the newly designed pages were much more effective than they probably would have been had the design agency operated without access to our insights.

    7. Gain buy-in for your program

    Tying in with the point above, one of the key advantages of having a centralized experiment repository is that it becomes a lot easier to monitor program-wide metrics, including win-rate, revenue, and ROI.

    This is obviously extremely important.

    Money speaks.

    If you can easily point to the bottom line impact of your program, it becomes much easier to evidence the program’s value, fight for budget, and grow your program.

    Our experiment repositories provide our clients with an easily accessible dashboard that updates as soon as new experiment results come in, providing an up-to-the-minute account of the program’s progress and success.
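
    Because every result lands in the same place, the headline numbers behind a dashboard like this reduce to a handful of aggregations. A minimal sketch, again with hypothetical column names:

    ```python
    import pandas as pd

    results = pd.read_csv("client_repository.csv")

    # Only count experiments that have concluded.
    concluded = results[results["outcome"].isin(["win", "loss", "inconclusive"])]

    win_rate = (concluded["outcome"] == "win").mean()

    # Hypothetical columns: projected annual revenue impact of each winner,
    # and the fully loaded cost of running each experiment.
    revenue_impact = concluded.loc[
        concluded["outcome"] == "win", "projected_annual_revenue"
    ].sum()
    program_cost = concluded["experiment_cost"].sum()
    roi = (revenue_impact - program_cost) / program_cost

    print(f"Win rate: {win_rate:.1%}")
    print(f"Projected annual revenue impact: ${revenue_impact:,.0f}")
    print(f"Program ROI: {roi:.0%}")
    ```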

    This ROI-tracking functionality has proven extremely important for many of our clients, allowing them to easily demonstrate ROI and ultimately generate leadership buy-in and enthusiasm for the program.