Split testing when you shouldn’t.
This is the subject of many of my rants and diary entries…
Yes, I have a diary. Don’t judge me. 😉
More than likely you’re going to run into an instance where you’re asked to split test — but really you shouldn’t be. Sometimes it looks like an impossible task because the page simply doesn’t have enough traffic.
Today I am sharing 8 methods I use when I need to split test on low traffic pages.
If you follow these strategies you can collect decent data (and keep from being the subject of my next diary entry).
But before we look at the methods for low traffic optimization, we need to answer this question…
How much traffic is enough traffic to test?
Actually… it’s less about the traffic and more about the number of converting actions.
Let me explain…
Conventional split testing wisdom is to disqualify pages that don’t hit the absolute minimum of 100 converting actions per variation.
Why 100 converting actions?
This gives you a baseline to work with that is attainable for low traffic sites. Over the last few years I’ve seen this number inflate, and yes, sometimes you will need 200-300 converting actions to make a major site-wide decision.
However, for our purposes, 100 converting actions should do just fine.
Notice I am saying ‘converting actions’ — this is important because YOU are the one who defines what counts as a conversion.
Are you measuring…
Top funnel metrics like:
- Engagement stats (click, time on site, bounce rate)
Mid funnel metrics like:
- Leads generated
Bottom funnel metrics like:
- Revenue per visitor (RPV)
You decide. We’ll talk a bit more about the metric hierarchy in the next section.
In order to qualify the test, you need to know your raw conversions for the month. I don’t like running a test longer than a month, but it’s safe to stretch to 6 weeks if you absolutely need to.
Why? Two reasons:
- The Null – Running a test into perpetuity ignores one of the three possible outcomes for a test: the null. You are assuming your test will only have a lift or loss if you just run it until you see a change.
- Time – More practically, if you run a test for longer than 6 weeks your data could be tainted by time. As the year goes by there are seasonal changes, influxes in traffic based on campaigns starting and stopping, and customer preference changes. These changes can impact your long-term tests and taint any learnings you gather.
Okay back to the minimum raw conversions. If you know your monthly conversions, break them down by day and then follow this chart:
Let’s break down this chart.
Let’s look at running a test with 2 variants over a 7-day period. The 29 means that both variations need to get a minimum of 29 converting actions per day.
If you don’t get 29 converting actions per variation per day, move to the next week. Continue this process until you get to a point where you are getting that many converting actions. If you can’t do this within 42 days, this page should not be tested.
Since we are talking about low traffic pages, I wouldn’t suggest having more than 2 total variations. I shared the conversion minimum for tests with multiple variants to showcase how the number of variants will impact test time.
Say the page you want to test doesn’t have 29 sales per variation per day, but has 29 add-to-cart clicks per variation per day. From a sales perspective you wouldn’t be able to run this test within a 7-day period, but if your test metric is the add-to-cart micro conversion you could run this test within a 7-day period.
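The qualification math above can be sketched in a few lines. A minimal sketch, assuming the flat 100-converting-actions-per-variation baseline from earlier (the chart in the post bakes in its own, larger per-day minimums, so treat this as a rough first pass):

```python
def days_to_qualify(daily_conversions_per_variation, target=100, max_days=42):
    """How many days until each variation reaches the minimum number of
    converting actions? Returns None if the page can't qualify in time."""
    if daily_conversions_per_variation <= 0:
        return None
    days = -(-target // daily_conversions_per_variation)  # ceiling division
    return days if days <= max_days else None

# 29 add-to-cart clicks per variation per day reaches 100 in 4 days:
print(days_to_qualify(29))  # 4
# 2 sales per variation per day would take 50 days -- past the 42-day cutoff:
print(days_to_qualify(2))   # None
```

Swap in whatever converting action you chose earlier; the math doesn’t care whether it’s a sale, a lead, or a CTA click.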
If you’ve determined your page doesn’t qualify for a traditional split test, it’s time to employ one of these 8 methods.
Let’s begin with…
Method 1 – Use Micro-conversions as Test Success Indicators
Whenever I run a test I always measure multiple success metrics.
For our acquisition campaigns I look specifically at the following metrics:
All of these are mid funnel to bottom funnel metrics. Leads generated are something we can easily quantify and attribute to our bottom line, much like Tripwire sales and Digital Marketer Lab sales.
That said, I get a heck of a lot more leads generated than I do sales of both lab and the campaign’s Tripwire. So when I am running a test on a low traffic campaign (yes, we have those) I have to know when to call my test.
If I based this off of DM Lab sales, this test wouldn’t wrap up for months, but if I move up to leads I could close this beast in 2 weeks.
See for yourself:
This is the report for the purchased Tripwire. In this campaign I didn’t even bother measuring DM Lab sales because I KNEW there was no way they would be immediately impacted by this campaign.
In this example there is no statistical difference between the two variants at the Tripwire purchase level. I can’t get any conclusive data because I just don’t have the raw conversions.
Notice how the test data normalized and the two variants on the chart are now parallel. Let’s take a look at my add to cart rate now:
The chart looks the same, especially during the normalization period. However, we see that there is not enough data to call this test.
So let’s look at the leads:
In this case I have a lot of leads generated, ~1000 for each variation. The normalization of the numbers looks a lot like the add to cart rate and the purchase rate.
Based on the lead data, I can come to the conclusion that there is no statistical difference between the control and version A. When there is a 20% confidence rate it is pretty clear that neither variation is performing better than the other and the recognized change is likely due to other variables.
Instead of running this test for several weeks, which is something I’d have to do if I only measured the ‘Purchased EP’ metric, I can safely call this test for what it is: a null.
Protip: when you run into a null test you can implement either variation. It comes down to personal preference here.
Micro-conversions as indicators help you call tests in a shorter period of time. If you didn’t have the number of raw leads, then you could have looked at landing page call-to-action (CTA) clicks instead.
Be warned, in most cases you can’t attribute micro-conversion data to the bottom line. This is okay! Just be realistic with your reporting so your boss/client doesn’t expect a massive lift that was derived from landing page clicks!
Method 2 – Sacrifice Accuracy for Speed
We just covered how micro-conversions can’t accurately be attributed to the metrics your boss/client loves, e.g., sales metrics.
That is one method of sacrificing accuracy for speed, and here’s another:
Lower your confidence rate.
I know I might be getting some judging eyes here, but understand that I don’t recommend this unless it’s a necessity.
The industry standard confidence rate is 95%.
TRANSLATION: There is still a 5% chance that the lift you see is actually a false positive. So, at the industry standard you still stand a 1/20 chance that your test is providing you no real lift.
This comes down to an organizational decision, but where are you willing to draw the line? Are you comfortable with a 90% confidence rate (a 1/10 chance for a false positive)?
If so, run tests to that rate – it will shave off a significant amount of time.
Here’s an example of the sample size necessary for a test that saw a 20% lift at a 95% confidence rate:
You would need to have a sample size of 7,682 to run this test.
Now let’s cut this down to a 90% confidence rate:
The sample size needed for a 90% confidence rate is 6,052. That is a 21% decrease when compared to the 95% threshold.
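You can reproduce the shape of this trade-off with a standard two-proportion sample-size formula. A sketch, assuming a 5% baseline conversion rate and 80% statistical power (both are my assumptions; the post doesn’t state the inputs behind its 7,682 and 6,052 figures, so the absolute numbers below differ, but the roughly 21% reduction holds):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variation(baseline_rate, relative_lift,
                              confidence=0.95, power=0.80):
    """Normal-approximation sample size for a two-variant, two-sided test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

n95 = sample_size_per_variation(0.05, 0.20, confidence=0.95)
n90 = sample_size_per_variation(0.05, 0.20, confidence=0.90)
print(n95, n90)  # dropping to 90% confidence cuts the sample by about 21%
```

Whatever your actual baseline rate, the relative savings from lowering the confidence rate stay in this ballpark.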
When you lower your confidence rate you do leave yourself open to a higher risk of error, but you do make test completion more attainable.
Remember, this all comes down to your company’s internal preferences. If you aren’t comfortable going lower than 95%, then don’t!
But before we move on…
Most optimizers only talk about the confidence rate, but that isn’t the only factor to keep in mind. You also need to look at the conversion range, which comes from the statistical power of your test.
The confidence rate only tells you how likely it is that the observed change is real; it tells you nothing about the size of that change. The conversion range tells you what to actually expect!
In this example Visual Website Optimizer (VWO) gives you the observed conversion rate and then outlines the expected conversion range. Ideally, when running a test you don’t want your variation ranges to overlap.
In the case of our control here, we see that our expected conversion rate is anywhere from 35.83% to 42.41%. This range is fairly tight, but we still have to run this test for another week to tighten the range further. More conversions mean a tighter range.
Here’s an example of a very loose range:
Obviously this test should not be called yet because there is only one conversion total. However, look at the conversion range. From this data we could potentially see a 2.24% loss or a 22.24% gain. Clearly we don’t want to gamble on that conversion range.
Just know for your reporting’s sake that the conversion range is a crucial component when analyzing the efficacy of your test.
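If your tool doesn’t surface a conversion range, you can approximate one yourself: it’s just a confidence interval around the observed rate. A rough normal-approximation sketch (VWO’s exact method may differ, and the visitor counts below are hypothetical, chosen to land near the control example above):

```python
from math import sqrt
from statistics import NormalDist

def conversion_range(conversions, visitors, confidence=0.95):
    """Normal-approximation confidence interval for a conversion rate."""
    rate = conversions / visitors
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(rate * (1 - rate) / visitors)
    return max(rate - margin, 0.0), min(rate + margin, 1.0)

low, high = conversion_range(320, 818)      # a range like the control above
print(f"{low:.2%} to {high:.2%}")
# Ten times the data at the same rate gives a much tighter range:
low10, high10 = conversion_range(3200, 8180)
print(f"{low10:.2%} to {high10:.2%}")
```

The margin shrinks with the square root of the sample size, which is why the range tightens the longer a test runs.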
Method 3 – Test on Low-Performing, High-Impact Pages
You never want to test a page simply because it’s your worst performing page.
Instead, test a poorly performing page that is important to your overall business goals and needs to be fixed.
(RELATED: Get 6 more split testing best practices here!)
In general your bounce rate is a good indicator of a page that is broken. The problem with bounce rate reporting is that it covers all of the outliers too. For example: a page with a single visit that has a 100% bounce rate. It takes a lot of data skimming to find the meaningful information.
Or does it? Here’s what you can do in Google Analytics to find high bounce pages that actually matter to your users…
First login to Google Analytics.
Go to Behavior > Site Content > All Pages.
Click Bounce Rate.
The default sort here isn’t very useful. It’s showing pages with a single view and a bounce. To make this useful you need to change the Sort Type drop-down from ‘Default’ to ‘Weighted’:
Now you have a list of pages that get a lot of traffic, but also have a high bounce rate. These are your major opportunity pages to start testing.
Method 4 – Go for Big Changes
You’ve seen the case studies that show a minor change leading to a huge lift in conversions.
In this example the version on the right increased clicks on the download link by 37.3% (Source: VWO).
I hate to break it to you — but this is a rarity.
In most cases these ‘small changes’ either aren’t that small, e.g., they are on key elements such as headlines or CTAs, or these tests weren’t run properly.
In most cases bigger changes yield bigger results (both positive and negative). If you really want to make a difference, you need to make major changes that people will notice and care about.
Why go for the big lift? Well a bigger lift means a smaller sample size! Check it out.
Here’s the sample size required for a page with a 10% lift at a 95% confidence rate:
You need a sample size of 29,502 people to complete this test. That’s probably more than you were expecting. So what if you have a page where you expect a bigger lift?
Let’s say 30%.
In this case we only need a sample size of 3,548, a decrease in sample size of 87.97%.
This is why you go after big changes! The bigger the lift between variants, the smaller the necessary sample size and the faster your test completes.
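Required sample size scales roughly with the inverse square of the expected lift, so tripling the lift cuts the sample by close to 9x. A sketch with an assumed 5% baseline rate and 80% power (the post’s 29,502 and 3,548 figures come from unstated inputs, so the absolute numbers differ, but the roughly 88% reduction matches):

```python
from math import ceil
from statistics import NormalDist

def sample_size(baseline, relative_lift, confidence=0.95, power=0.80):
    """Normal-approximation sample size per variation (two-sided test)."""
    p1, p2 = baseline, baseline * (1 + relative_lift)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2) + NormalDist().inv_cdf(power)
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2)

n_10 = sample_size(0.05, 0.10)  # expecting a 10% lift
n_30 = sample_size(0.05, 0.30)  # expecting a 30% lift
print(n_10, n_30)  # the 30% lift needs roughly 88% fewer visitors
```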
Method 5 – Run a Minimum Lift Calculation
If you’ve run into a questionable page, I recommend running this quick calculation to see if your test is viable.
All you need is your unique traffic, current conversion rate, and preferred confidence rate.
For this example:
- Unique Traffic: 10,000
- Conversion Rate: 20%
- Confidence Rate: 95%
The second variation would have to have a 22% conversion rate, which is a lift of 10%. If I don’t think I can get a 10% lift on this page, then I won’t run this test.
This is a great method for qualifying your tests. Do it.
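Here’s a hedged version of that calculation in code. It assumes an even traffic split across two variations and 80% power (neither is stated in the post, which is why it lands at roughly 11% rather than exactly 10%):

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_lift(traffic, conversion_rate,
                            confidence=0.95, power=0.80, variations=2):
    """Smallest relative lift a test can reliably detect, using a normal
    approximation with traffic split evenly across variations."""
    n = traffic / variations
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2) + NormalDist().inv_cdf(power)
    absolute = z * sqrt(2 * conversion_rate * (1 - conversion_rate) / n)
    return absolute / conversion_rate

lift = minimum_detectable_lift(10_000, 0.20)
print(f"Minimum detectable lift: {lift:.0%}")  # ~11% with these assumptions
```

If the minimum detectable lift comes out higher than anything you believe your change can produce, skip the test.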
Method 6 – Don’t Run the Test. Do Something Else that Matters.
When you start tinkering with your test input and output you will reduce accuracy.
I’ve shown several cases where you must disqualify the test based on test timeline (more than 6 weeks? Forget it), sample size, and expected lift.
Being skilled at disqualifying tests is a valuable asset.
All tests have an associated real cost and an opportunity cost. Don’t spend time and resources on something that will give you bad data or shouldn’t be tested and don’t waste time and effort creating a test when you could have created a better one.
I’ve said it several times, just because you can’t test doesn’t mean you can’t optimize. Testing is just a tool in your tool belt!
These last two methods allow you to optimize when you can’t test…
Method 7 – Personalization
Emerging personalization tech companies have begun to provide powerful, affordable, and easy-to-use tools.
Real-time personalization provides a different experience for different visitors. This turns your website into a dynamic sales machine instead of a static billboard.
At Digital Marketer we use on-site retargeting to do a bootstrapped version of personalization. Just using behavior based segments and exit pop triggers we were able to get 2,689 more leads in just two weeks.
Personalization is such a huge topic I’d need to dedicate an entire post to it. To whet your appetite, I’d recommend looking at some of the tools out there:
For a complete personalization package it can get pretty pricey, though the price has substantially come down from when Adobe Target was the only player. In the next 12 months I expect competition will drive the price down to make it more affordable to the people who can really benefit from personalization.
Method 8 – Qualitative Insights
If you have low traffic you are lacking the quantitative insights (the numbers) that make testing powerful.
So instead of crunching numbers, let’s look at how our customers actually use our site!
I love qualitative data, and it is still underutilized.
This survey was given to online testing professionals and not only do less than half of them use some sort of qualitative data, the usage has actually dropped since 2014.
This is a major opportunity area and will help you increase conversions on pages you may not be able to test.
There are several different types of qualitative data to use, but the king of qualitative data is the session recording.
Session recordings are just that – a recording of a visitor’s session on your site. This data takes a while to dig through, but it can give you some incredible insights.
We use a tool called HotJar to record sessions and they look like this…
Other types of qualitative data include:
- User surveys
- Usability Studies
- Top Customer Service Questions
- Top Sales Questions
You can get a lot of optimization tips from data sources you already have. Use this qualitative data to create better user experiences to increase conversions when you can’t test.
You’re ready! Next time you need to run a test on a low traffic site, use these strategies to gather some good data!
Have a question?
Ask the DM team and 9,036 other members in the DM Engage Facebook Group!
Not a DM Lab Member? Learn more here.