• As experimenters, we often overlook the distinction between a hypothesis and its execution. A hypothesis is the theory we aim to validate; an execution is the specific way we plan to validate it. It's entirely possible to have a robust, evidence-backed hypothesis and still test it poorly.

    This raises a question:

    What measures can we take to ensure our execution effectively validates our hypothesis?

    The ALARM protocol is not just another tool in the world of experimentation. It’s a comprehensive framework that allows you to scrutinize your experiment concepts from every angle, helping you test your hypotheses as effectively as possible.

    In this post, we’ll offer an overview of our ALARM protocol before showing how to apply it to your experiment concepts to ensure they are as strong as possible.

  • Contents

    What is the ALARM Protocol?
    Applying the ALARM Protocol
    A Journey Towards Excellence

  • What is the ALARM Protocol?

    Broadly speaking, an experiment will lose for one of two reasons.

    1. The hypothesis is incorrect – e.g. adding social proof to your product page was not an effective way of increasing trust.
    2. The execution is poor – e.g. maybe social proof was effective, but the specific way you executed your hypothesis was the problem.

    The ALARM protocol is a framework developed by our team of 20+ experimentation consultants to tackle the second item on this list: poor executions.

    By passing experiment concepts through our ALARM protocol, we can ensure that our executions are as strong as possible before they’re eventually built. This means that when we decide to invest time into an experiment, we can be confident that it is the most effective execution possible for that specific hypothesis.

    Generally speaking, we produce the best experiments by questioning ourselves: if we always run with the first idea we think of, we are undoubtedly missing opportunities. The ALARM protocol exists to structure that questioning.

    The ALARM Protocol

    ALARM – an acronym for Alternative executions, Loss factors, Audience and Area, Rigor, and MDE & MVE – is a structured approach to concept evaluation. Each component plays a crucial role, and understanding each one individually is key to applying the protocol effectively.

  • Applying the ALARM Protocol

    A: What ALTERNATIVE EXECUTIONS could be used to test this hypothesis? Why didn’t we pick those?

    The first step in the ALARM protocol prompts us to consider alternative executions that might test our hypothesis more effectively. By exploring different approaches, we can uncover hidden opportunities and mitigate risks associated with our chosen approach.

    Example

    To understand the importance of considering alternative executions, take a look at the following example from our own work:

    For the first experiment in this sequence, we added a comparison modal to the product category page so that users could compare the features of different products. We’d done our research, and we were confident in the concept, but when we ran the experiment, the result was flat.

    We wanted to understand the why behind this result, so we dug into our data, and here’s what we found:

    • Due to its low visibility—the modal only appeared after at least two products had been selected—only a very small proportion of users engaged with the new feature.
    • Users who engaged with the modal had a significantly higher conversion rate.

    Given that users who engaged with the modal were, in fact, converting more reliably, we hypothesized that increasing the modal's visibility would lift the overall conversion rate.
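
    The arithmetic behind this reasoning is worth making explicit. With hypothetical numbers (not the actual figures from this experiment), a quick sketch shows how low engagement dilutes an otherwise effective change:

    ```python
    # Illustrative only: hypothetical numbers, not the real experiment data.
    engagement_rate = 0.02       # share of users who opened the comparison modal
    lift_among_engaged = 0.30    # relative conversion lift among those who engaged

    # The site-wide lift is roughly the engaged lift scaled by how many
    # users actually experienced the change.
    overall_lift = engagement_rate * lift_among_engaged
    print(f"Expected site-wide lift: {overall_lift:.1%}")  # 0.6% - easily lost in noise
    ```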

    Therefore, in an iteration of the experiment, we chose to enhance the prominence of the comparison feature. The design remained the same as in the initial experiment, but the modal was now shown by default when the user landed on the category page, making it visible to all users regardless of whether they had selected items to compare. This change increased the visibility of the feature and ultimately resulted in an 11.7% improvement in revenue.

    The above example underscores the importance of exploring a wide range of executions. Had we applied the ALARM protocol from the start, we might have questioned sooner whether the first execution was bold enough and reached a winning result earlier.

    L: Write down at least four reasons that the test might LOSE. Should we adapt the execution to mitigate those risks? If not, why not?

    Once we've considered alternative executions, we need to identify the reasons a concept might fail and proactively mitigate those risks. By understanding and addressing pitfalls upfront, we increase the likelihood of success.
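
    If you track experiment concepts programmatically, this step can be codified as a simple pre-build check. The sketch below is our own hypothetical illustration (the class and function names are invented, not part of any existing tooling):

    ```python
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LossFactor:
        """One reason the test might lose, and what we plan to do about it."""
        description: str
        mitigable: bool
        mitigation: Optional[str] = None               # required if mitigable
        accepted_risk_rationale: Optional[str] = None  # required if not mitigable

    def review_loss_factors(factors: list[LossFactor]) -> None:
        """Fail fast if the concept hasn't been stress-tested enough."""
        if len(factors) < 4:
            raise ValueError("Write down at least four reasons the test might lose.")
        for f in factors:
            if f.mitigable and not f.mitigation:
                raise ValueError(f"Mitigable risk left unaddressed: {f.description}")
            if not f.mitigable and not f.accepted_risk_rationale:
                raise ValueError(f"Unjustified accepted risk: {f.description}")
    ```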

    Example

    In this example, an experiment displaying a customer review score, we identified two key loss factors.

    Firstly, the user might question whether the rating is based on a reliable number of reviewers. This risk is mitigable, and we should address it: disclosing the number of reviews contributing to the score should increase confidence that the reviews come from real customers.

    Secondly, the user might find Trustpilot more credible than Feefo, as it is a more prominent and well-known review site in the UK. However, we can't risk displaying the Trustpilot score: it is too low and may therefore have the opposite of the intended effect. Although Feefo is a smaller, lesser-known review site, its score is much better, and it has enough customer reviews to mitigate the first loss factor. This second risk cannot be mitigated; we are taking a considered risk in order to learn.

    A: Is there a better choice of AUDIENCE & AREA to maximize our chance of a winner?

    The next step in the ALARM protocol asks us to consider whether we would have a higher chance of success if we tested the concept on a different page of the site or with a wider audience.

    Is there a risk that the change is too early or too late in the journey? Will the execution shrink your audience? For example, if users are only exposed to the experiment change when clicking on a tooltip or scrolling down the page, you are shrinking your audience as not everyone will see the change.
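
    The cost of a shrunken audience compounds quickly: if only a fraction of users ever see the change, the measurable site-wide effect is diluted by that fraction, and the sample size needed to detect it grows roughly with the square of the dilution. A rough sketch with illustrative figures:

    ```python
    # Illustrative figures: a change seen by only some users dilutes the
    # measurable site-wide effect; required sample scales roughly with 1/effect^2.
    exposure_rate = 0.10          # only 10% of users scroll far enough to see it
    lift_among_exposed = 0.05     # 5% relative lift where the change is seen

    site_wide_lift = exposure_rate * lift_among_exposed  # 0.5% overall
    inflation = 1 / exposure_rate ** 2                   # vs. everyone seeing it
    print(f"Site-wide lift: {site_wide_lift:.2%}; "
          f"traffic needed vs. full exposure: ~{inflation:.0f}x")
    ```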

    Example

    Examining the impact of the selected area on experiment results is crucial. Typically, the chosen area directly influences the kind of audience that will be interacting with the experiment.

    In this experiment, conducted on a vehicle rental website, we tested placing step-by-step instructions for the booking process on the homepage, the earliest point in the user's journey. The test did not produce a significant uplift; while the outcome was unfavorable, it allowed us to learn and iterate.

    We then ran a second test, placing the step-by-step instructions on the location page, which yielded positive results despite sitting later in the user's journey. This shows the value of questioning your choice of page upfront; exploring multiple areas is vital to producing the highest-impact results.

    The iteration process above – prompted by the ALARM protocol – is especially valuable when an initial experiment produces no significant result. Instead of prematurely deeming the experiment a failure, the protocol led us to consider an alternative area. This determines the next course of action: if the concept succeeds in a new location, we have a winner; if it fails again, we consider a different approach altogether.

    R: Have we taken at least two actions of RIGOR to ensure the execution is as good as possible?

    Here at Conversion, we have several predefined methods to ensure our execution plans are robust and well-thought-out. For example, maybe we can use our experiment repository to see if we can use learnings from previous experiments on similar websites to inspire our concept. Or maybe we explore our library of psychological principles to see if one of them can be applied to our experiment.

    This step in the ALARM protocol is where we apply at least two of these actions of rigor to our concept. By conducting thorough research, gathering data-driven insights, and finding supporting psychological principles, we can refine our concepts and maximize their potential for success.

    Example

    When looking at rigor, one impactful way to strengthen your concept is to look for supporting psychological principles. Two principles we often use within our experiments are:

    • Social Proof: Including the number of customer reviews may lower the risk in the customer’s eyes. Although this doesn’t fully mitigate the perceived risk, it does reinforce that other customers have chosen and had a positive experience with this company.
    • Picture Superiority Effect: Displaying an image or icon, such as review stars, alongside the review count makes the positive reviews easier to take in at a glance. It removes the need to read, and star ratings are widely recognized as a signal of trustworthiness.

    M: Is your concept bold enough to hit your Minimum Detectable Effect (MDE)?

    The ALARM protocol’s final step is evaluating the proposed concept based on whether or not it is likely to be bold enough to hit our minimum detectable effect (MDE). For us here at Conversion, this is a slightly more nuanced issue than you might think:

    Our experiment database shows that experiments with a small build size are just as likely to win as those with a large build size. Our philosophy, therefore, is to attempt to validate our hypotheses with the smallest experiments possible, i.e. the minimum viable experiment (MVE).

    The balance we need to strike is an experiment small enough to validate our hypothesis with minimum effort, yet bold enough to hit our MDE. There are usually ways to increase an experiment's boldness without increasing its build size.
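
    To make 'bold enough to hit your MDE' concrete, here is a minimal sample-size sketch using a standard two-proportion power calculation and only the Python standard library (a generic approximation, not Conversion's internal tooling):

    ```python
    import math
    from statistics import NormalDist

    def sample_size_per_arm(baseline_rate: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
        """Approximate visitors needed per variation to detect a relative lift."""
        p1 = baseline_rate
        p2 = baseline_rate * (1 + relative_mde)
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
        z_power = NormalDist().inv_cdf(power)
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return math.ceil(variance * (z_alpha + z_power) ** 2 / (p2 - p1) ** 2)

    # A bolder concept (larger detectable lift) needs far less traffic:
    print(sample_size_per_arm(0.05, 0.02))  # 2% relative lift: ~750k per arm
    print(sample_size_per_arm(0.05, 0.10))  # 10% relative lift: ~31k per arm
    ```

    This quadratic relationship between effect size and sample size is why an execution that is even slightly bolder can dramatically shorten an experiment.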

    For instance, imagine an experiment where you want to add reassurance messages like “You are free to cancel anytime.” A less bold approach might overlook how to make this content ‘pop’ on the page. A bolder strategy, however, could simply involve placing the content more prominently or integrating it into a site-wide banner for increased visibility across multiple pages.

    Example

    Remember: for every idea you have, ask what the smallest thing you can test is that could prove your hypothesis correct.

    In this experiment, we were looking to optimize the value statement lever. To do this, we simply adjusted the copy of the 3-for-2 roundel, which resulted in a +3.55% uplift in transactions. The experiment was an MVE – it involved a simple copy change – but it was also bold enough to hit our minimum detectable effect: the roundel was displayed prominently across the site, where many people would see it.

  • A Journey Towards Excellence

    As shown above, the ALARM protocol should guide the depth of execution in an experiment, ensuring a thorough understanding of the data, lever, hypothesis, and execution strategy. By following these steps, we can navigate potential risks and optimize our approach for success.

    In the examples provided, we identified areas of potential issues and proposed mitigation strategies, highlighting the importance of addressing risks where possible and accepting calculated risks to facilitate learning.

    Integrating the ALARM protocol ensures that every concept is rigorously evaluated before proceeding. It is a structured framework for fostering innovation and ensuring that our concepts have the best possible chance of success.

    If you’re curious about how this works or have any questions, please contact us! We love talking about experimentation, and we’re always eager to share what we know!