On-site survey design: Collect voice of customer data like a pro

Brian Balfour, ex-VP of Growth at HubSpot, said “Math and Metrics Don’t Lie, But They Don’t Tell Us Everything”. I couldn’t agree more. While analytics tells us what happens on our website, qualitative data is crucial for understanding the why behind visitors’ decision-making. By knowing your customers’ pain points and the reasons why they love your product, you can stop guessing and hoping to win the odd hand. Instead, you can start addressing your visitors’ real problems, and we have yet to find a better way to grow a business sustainably.

To make good decisions though, you need to nail both collection and analysis of your user data. Your conclusions have to actually reflect your website audience’s problems. We’re used to looking at statistical significance with our test results, but when we’re gathering qualitative feedback, how do we know when we have enough data to draw a meaningful conclusion? The reality is that making sure that your data brings powerful insights is both an art and a science. Today I will explain strategies conversion champions use when analyzing qualitative open-ended data.

Introduction
What are on-site surveys anyways and why should you use them?
How many responses do I need?
Sample size. Does it even matter?
What is thematic saturation?
Minimum number of responses
Don’t rely on magic numbers, but look for saturation
Validate with a follow-up survey
Run another open-ended survey to examine a particular theme in more depth
Triangulate
Be pragmatic, not perfect
Key Takeaways

What are on-site surveys anyways and why should you use them?
On-site surveys are a great way to gather qualitative feedback from your customers.

In this article when I refer to on-site surveys, I mean small pop-ups that prompt a visitor to answer a certain question(s). Qualaroo and Hotjar are our favourite data collection tools.

In contrast to other methods of qualitative research, on-site surveys can be:
- Non-intrusive (they don’t significantly distract visitors from engaging with the website).
- Anonymous, allowing for higher “ecological” validity of responses. This means that customers tell you what they actually think without trying to conform to your expectations (which may happen in interviews).
- Easy to run. They don’t require extensive prior experience (as compared with something like interviews).
- Immediate. In comparison to panels & interviews, you can start collecting data instantly.
- Contextual. They can provide insights about your customer’s state of mind at a particular stage in your conversion funnel. This allows you to optimize for relevance!
How many responses do I need?

Often when companies run surveys, they aren’t sure how long to run them for. They may ask themselves: “What is the required sample size? Am I better off running a survey for a little bit longer? What % of my website audience should respond for the survey to be representative of their needs?”

I was asking these questions, too. When I studied for Avinash Kaushik’s web analytics certification, he suggested 5% of your overall traffic. At the time, I was looking at running surveys for some smaller websites and Avinash’s rule was applicable to only very large websites, so I could not use it.

Then, Peep Laja suggested having at least 100-200 responses as a minimum. I was not sure if I could apply this to any context though. Are 100 responses going to be as useful for a website with 10,000 monthly visitors as for a website with 1,000,000 daily visitors?

Sample size. Does it even matter?
The reality is that it depends, but most importantly you might be looking at it the wrong way. The primary factor we use in determining the number of required responses is the goal of the survey. At Conversion.com, we primarily use them for the following 2 goals:
1. Understanding the diversity of factors affecting user behavior (i.e. what factors motivate or stop visitors from taking a desired action)
2. Ranking and prioritizing these factors (in order to prioritize testing ideas)
The first goal is crucial at the start of every conversion optimization program (and this is the goal we will dive into in this article; for the other goal keep an eye on our future articles).

When pursuing this goal, we are trying to understand the diversity of factors that affect user behavior, and our purpose is not necessarily to make estimations about our website’s audience as a whole.

For example, we are not trying to answer the question of how many people like your product because of reason A or reason B. We simply want to understand what the potential reasons are.

We are more interested in gaining an in-depth understanding of people’s diverse subjective experiences and making meaning out of their responses, even if we are not sure if we can generalize these findings to the website’s audience as a whole. As Stephen Pavlovich puts it: “At this early stage, we’re not looking for a statistically valid breakdown of responses – we’re essentially looking for ideas and inspiration.”

This means that with on-site surveys that pursue goal #1, standard criteria for evaluating quality of your findings such as validity and reliability (think of confidence intervals and margins of error) are not applicable. Instead, you should use thematic saturation.
What is thematic saturation?
When analyzing raw data, we categorise responses into themes. Themes are patterns in the data that describe a particular reason for taking or not taking a certain action (or any other factors we are interested in understanding). In simple terms, thematic saturation is when new responses do not bring significant new information, i.e. you start seeing repetition in visitors’ responses and no new themes emerge.

In the context of conversion optimization, this means asking yourself 3 questions:
1. Have I accurately interpreted and grouped the raw data into themes? i.e. have I identified the customers’ real pain points and motivations for taking a certain action?
2. Do the responses that I assigned to each of the themes fully explain that theme? (or is there diversity that I have not fully accounted for, i.e. are there any important sub-themes?)
3. Do the new responses that I have gathered bring new, actionable insights to the table?
If you can answer “Yes”, “Yes” and “No” to the questions above, you are likely to have reached saturation and can stop the survey.

Example:

As you can see in this example, the newest responses did not bring any new surprises. They fell under existing themes. As there was no more diversity in the data, we stopped the survey.

Note how one simple concept of convenience can have several dimensions in your customers’ minds. This is why question 2 is so important. By understanding the differences in the way customers perceive your product’s benefits, you can design a more effective value proposition!
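If you code your responses in a spreadsheet or a script, question 3 can be made a little more concrete. The sketch below is only an illustration: it assumes each response has already been tagged by hand with one or more themes, and the function name, theme labels and batch size are all made up for the example. It counts how many previously unseen themes each new batch of responses adds.

```python
from typing import List, Set


def new_themes_per_batch(coded_responses: List[Set[str]], batch_size: int) -> List[int]:
    """Count how many previously unseen themes each batch of responses adds."""
    seen: Set[str] = set()
    new_counts: List[int] = []
    for start in range(0, len(coded_responses), batch_size):
        batch = coded_responses[start:start + batch_size]
        batch_themes = set().union(*batch)  # every theme mentioned in this batch
        new_counts.append(len(batch_themes - seen))
        seen |= batch_themes
    return new_counts


# Hypothetical, hand-coded responses (one set of themes per response).
coded = [
    {"price"},
    {"convenience"},
    {"price", "trust"},
    {"convenience"},
    {"trust"},
    {"price"},
]

print(new_themes_per_batch(coded, batch_size=2))  # [2, 1, 0]
```

A run of zeros at the end of the list suggests that new responses are falling under existing themes. It says nothing about questions 1 and 2, which still depend on your own judgement.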

Indeed, the answers to these questions are subjective and require experience. This is not because the method is ‘bad’, but because we are trying to explain human behavior and there will always be a degree of subjectivity involved. Don’t be too hard pressed by your quantitative colleagues – some of the most important breakthroughs in history were based on studies with a sample size of 1. Did you know that Freud’s revolutionary theory of psychoanalysis originally started with examination of fewer than 10 client cases?
Minimum number of responses

Does this then mean that you can get away with as few as 10 responses? In theory yes, as long as you gain an in-depth understanding of your customers. That said, it is common practice in traditional research to set a minimum number of responses before you start examining whether your data is saturated.

As a general rule, the team at Conversion.com looks for a minimum of 200 responses. So does Andre Morys from Web Arts. Peep Laja from ConversionXL responded that he currently uses 200-250 as a minimum. Other professionals, including Craig Sullivan and Brian Massey, say that they don’t use a minimum at all. The truth is you can use a minimum number as a guide, but ultimately it’s not the number that matters, but whether you have understood the diverse problems that your customers have.

When using minimums: count only usable responses and remove all the garbage

In one survey we ran, 35% of responses appeared to be unusable, ranging from responses like “your mum” to random strikes of digits on a keyboard. When assessing if you passed the minimum threshold, don’t just look at the number of responses your survey tool has gathered, but look at the number of usable “non-garbage” responses.
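A rough first pass at separating usable from garbage responses can be automated before you review the rest by hand. The sketch below is an assumption-laden illustration, not a description of any survey tool: the function names, the three-letter threshold and the junk phrase list are all hypothetical, and a human still needs to read what gets through.

```python
from typing import FrozenSet, List

MIN_USABLE = 200  # the general minimum mentioned above


def is_usable(response: str,
              junk_phrases: FrozenSet[str] = frozenset(),
              min_letters: int = 3) -> bool:
    """Crude filter: reject blanks, digit mashing and known joke answers."""
    text = response.strip().lower()
    if text in junk_phrases:
        return False
    # Require a few alphabetic characters so "84759302" or "" are dropped.
    return sum(ch.isalpha() for ch in text) >= min_letters


def usable_responses(responses: List[str],
                     junk_phrases: FrozenSet[str] = frozenset()) -> List[str]:
    return [r for r in responses if is_usable(r, junk_phrases)]


# Hypothetical raw export from a survey tool.
raw = [
    "Delivery was too slow",
    "84759302",
    "your mum",
    "",
    "Could not find the returns policy",
]

usable = usable_responses(raw, junk_phrases=frozenset({"your mum"}))
print(len(usable), "usable of", len(raw))            # 2 usable of 5
print("Minimum reached:", len(usable) >= MIN_USABLE)  # Minimum reached: False
```

The point is simply to compare the minimum against the usable count rather than the total your tool reports.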
Don’t rely on magic numbers, but look for saturation
As I have already said, don’t rely solely on best practices; always look for saturation. Each website is unique, and your ability to reach saturation depends on a number of factors, including:
- Your interpretative skills as a researcher (how quickly can you derive meaning from your visitors’ responses?), which in turn depends on your existing knowledge about customers and your familiarity with the industry. So, you are better off gathering more responses as long as they can help you to accurately interpret your audience’s responses.
- Have you asked the right questions in the first place? It is difficult to derive meaningful insights unless you are asking meaningful questions (if you don’t know what questions to ask, check out this article).
- Homogeneity/Heterogeneity of your audience. If your business is very niche and much of your audience shares similar characteristics, then you might be able to see strong patterns right from the start. This is less likely for a website with a very diverse audience.
How do I know if the 189th response won’t bring any new perspectives on the issues I am investigating?

The truth is you never know, in particular because every person is unique, but there are strategies we use to check our findings for saturation.
Strategy #1: Validate with a follow-up survey
This strategy has three steps:
1. Run an open-ended survey (survey 1, above)
2. Identify several themes
3. Re-run the survey in a multiple choice format to validate if the themes you identified were accurate (survey 2, above)
The first two steps are what you would normally do, and you might not get an incredibly high response rate because writing proper feedback is time-consuming. The third step compensates for this: instead of running an open-ended survey, you run it in a multiple-choice format. The key here is to include an “Other” option and ask for an open-ended response when it is chosen. This gives you a safety net: if people tend to choose “Other”, the themes you identified are probably missing something.

When is it best to use this approach? It’s particularly useful on smaller websites due to low response rates.
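Checking how often people pick “Other” is simple arithmetic, sketched below. The answer labels and the 15% warning threshold are assumptions made for illustration only; there is no threshold given here, so pick one that makes sense for your own site.

```python
from collections import Counter
from typing import List


def other_share(choices: List[str], other_label: str = "Other") -> float:
    """Fraction of multiple-choice answers that picked the 'Other' option."""
    if not choices:
        raise ValueError("No answers collected yet")
    return Counter(choices)[other_label] / len(choices)


# Hypothetical answers from the follow-up (multiple-choice) survey.
answers = ["Price"] * 4 + ["Delivery time"] * 3 + ["Trust"] * 2 + ["Other"] * 1

share = other_share(answers)
WARNING_THRESHOLD = 0.15  # assumed cut-off, not a rule from the article

print(f"{share:.0%} chose Other")  # 10% chose Other
if share > WARNING_THRESHOLD:
    print("Read the open-ended 'Other' answers: a theme may be missing.")
```

Whatever the share turns out to be, it is still worth reading the free-text answers behind “Other”, as they are where a missed theme would show up.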

Brent Bannon, PhD, ex growth manager at Facebook and founder of LearnRig, suggests two other critical reasons why you should use closed-ended questions as a follow-up:
1. item non-response [i.e. where a user skips or doesn’t provide a meaningful answer to a question] is much higher for open-ended questions than for closed-ended ones and people who respond to the open-ended question may differ systematically from your target population, so this response set will likely be more representative.
2. open-ended questions tend to solicit what is top-of-mind even more so than closed-ended questions so you don’t always get the most reasoned responses – this is pretty heavily influenced by memory processes (e.g. frequency and recency of exposure). Using a list of plausible motivations may get you more reliable data if you’re confident you’re not missing important/widespread motivations.
Brent Bannon

Founder of LearnRig
So, be cautious if you are asking people about something that happened a long time in the past.
Strategy #2: Run another open-ended survey to examine a particular theme in more depth
This strategy has three steps:
1. Run an open-ended survey (survey 1, above)
2. Identify several themes
3. Run another open-ended survey to examine a particular theme in more depth (survey 2, above)
Sometimes the responses you get might show you that there is a recurring theme, for example there is a problem with trust. However, respondents provide very limited detail about the problem, so although you identified a theme, you have not fully understood what the problem really is (saturation was not reached!). In that case, we would develop another open-ended survey to examine that particular theme because we know that additional responses can yield extra insights and explain the problem in more depth.

Craig Sullivan from Optimal Visit elaborates on that:

The trick with this work is to accept that the questions you ask may not be right first time. When I first started out, my mentor made me run surveys where it was clear that I’d asked the wrong question or not found the real answer. He kept tearing them apart until I’d learned to build them better and to iterate them. Asking good questions is a great start but these will always uncover more questions or need clarification. Good exploratory research involves uncovering more questions or solidifying the evidence you have.

It’s like shining a light in a circle – the more the area is lit, the more darkness (ignorance) you are in touch with. What you end up with is a better quality of ignorance – because NOW you actually know more precisely what you DO and DON’T know about your customers. That’s why iteration of research and AB testing is so vital – because you rarely end at a complete place of total knowledge.

Craig Sullivan

Founder of Optimal Visit

When is it best to use this approach? Whenever you have not fully explored a certain theme in sufficient depth and believe that it can lead to actionable insights.

Note: Be cautious if you’re thinking of doing this type of investigation on a theme of “price”. Self-interest bias can kick in and, as Stephen Pavlovich puts it: “It’s hard to rationalise your response to price. This is one instance where it’s preferable to test it rather than run a survey and then test it.”
Strategy #3: Triangulate
Triangulation is when you cross-check your findings from one method/source with findings from another method/source (full definition here).

For example, when working with a major London airport we cross-checked our findings from on-site surveys with real-life interviews of their customers (two different methods: surveys and interviews; two different sources: online and offline customers). This ensured a high level of understanding of what customers’ problems actually were. Interviews allowed flexibility to go in-depth, whilst surveys showed a broader picture.

Triangulation allows you to ensure you have correctly interpreted responses from your customers, and identified their real barriers and motivations, not some non-existent problems you thought your customers might have. Interviews can provide you with fuller and more detailed explanations, which in turn allow you to interpret your survey results more accurately. There is strong support in academic research for using triangulation to enhance understanding of the phenomenon under investigation.

When is it best to use this approach? Always. Cross-checking your survey findings with more in-depth data collection methods such as live chat conversations or interviews is always advisable as it provides you with more useful context to interpret your survey results.

Brian Massey from Conversion Sciences also emphasises the importance of cross-checking your data with analytics:
Onsite surveys have two roles in website optimization.
1. Answer a specific question, to support or eliminate a specific hypothesis.
2. Generate new hypotheses that don’t flow from the analytics.
In both cases, we want to corroborate the results with analytics. Self-reported survey data is skewed and often inaccurate. If our survey respondents report that search is important, yet we see that few visitors are searching, we may disregard these results. Behavioral data is more reliable than self-reported information.

Brian Massey

Co-founder of Conversion Sciences
Be pragmatic, not perfect

Finally, we need to be realistic: it is not just the overall quality of our findings that matters, but also the time and opportunity cost required to get them.

That’s why it can be useful to decide on a stopping rule for yourself. Stopping rules could look like these: “After I get 10 more responses and no new themes emerge, I will stop the survey” or “I will run the survey for 2 more days and if no new themes emerge, I will stop it”.
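The first of these stopping rules is easy to write down as a check you run each time you code new responses. This is a minimal sketch under the assumption that every response has been tagged by hand with a set of themes; the function name and the sample data are hypothetical, and the window of 10 simply mirrors the example rule.

```python
from typing import List, Set


def should_stop(coded_responses: List[Set[str]], window: int = 10) -> bool:
    """True if the last `window` responses introduced no theme that was
    absent from all earlier responses."""
    if len(coded_responses) <= window:
        return False  # not enough history to compare against
    earlier = set().union(*coded_responses[:-window])
    recent = set().union(*coded_responses[-window:])
    return recent <= earlier  # every recent theme was already known


# Hypothetical coded responses: three early ones, then ten more.
history = [{"price"}, {"trust"}, {"convenience"}]
latest = [{"price"}, {"trust"}] * 5  # ten responses, no new themes

print(should_stop(history + latest))                   # True
print(should_stop(history + latest[:-1] + [{"speed"}]))  # False
```

The time-based rule works the same way, with the window defined as the responses collected in the last two days rather than the last 10.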

After you pass the minimum threshold and you are sure that you correctly interpreted at least some of the real issues, you might be better off testing rather than perfecting your data.

Remember, conversion optimization is a cyclical process: we use qualitative data to inform our testing, and then we use the results from our tests to inform our next survey.
Key Takeaways
- Use on-site surveys to understand your users’ barriers and motivations for taking a certain action at a particular stage in your conversion funnel
- Thematic saturation should be your main quality criterion, not sample size, when trying to understand the diversity of factors that affect your visitors’ decision-making. But if you’re not sure or want to estimate beforehand, 200 responses is a good general rule (when applied to “non-garbage” responses).
- You can examine if you managed to reach saturation:
  - By running a follow-up survey in a multiple-choice format and examining if people tend to choose “Other” as an option
  - By running a follow-up survey in an open-ended format to better understand a particular theme (if there is ambiguity in the original data)
  - By cross-checking your survey findings with other data sources/collection methods
- Remember that results from tests that are backed up by data are the best source of learning about your customers. Treat your initial findings with caution and learn from how your users behave, not how they tell you they behave.