It’s great to sell more stuff. It’s what we at Quantcast focus on helping advertisers do. But how does an advertiser differentiate between correlation and causation?
Did my digital marketing efforts actually help me sell more, or would I have sold more anyway?
In the digital advertising space, it often seems that the more data advertisers have access to, the more questions they have. A performance metric like click-through rate is not a good proxy for impact on sales, and cost per acquisition and return on ad spend don’t always tell the entire story. Absent a golden metric, digital marketers turn to Incrementality Testing to quantify the value of their media dollars in driving sales.
In this post, we will discuss:
- What Incrementality Testing is
- Benchmarking for success
- Gaming the System
- How to conduct an Incrementality Test
WHAT IS INCREMENTALITY TESTING?
Incrementality Testing, also referred to as “Uplift Modeling”, “Placebo Effect” or “Incremental Sales Lift”, is a test that measures the impact of a single variable on an individual user’s behavior. For digital display marketing, it is most commonly used to measure the impact of a branded digital ad (Exposed group) against a Public Service Announcement (PSA) ad (Control group). The lift is measured as the percent difference between the two.
This incremental lift indicates the impact of a particular (digital) advertising tactic on sales – the holy grail of advertising. It is possible to calculate, but incrementality testing is expensive (budget is spent on PSA placebo ads) and subject to many pitfalls unless executed carefully.
AB versus Incrementality Testing
AB testing is an umbrella term that encompasses Incrementality Testing. In the world of online advertising, AB tests can be used to test creative treatments, email subject lines, calls to action (CTAs), or website pages. In comparison, Incrementality Testing focuses on the lift of key purchase indicators, as measured by conversion rate (CVR).
Incrementality Testing can have several objectives for advertisers:
- Validate digital media’s impact versus organic conversions
- Validate view conversions
- Validate a single vendor’s targeting tactic
- Validate & compare incremental sales across programmatic vendors (a.k.a. head-to-head test)
Perhaps one of the largest misconceptions about incrementality testing is that it can be used to assign view-through credit. Many advertisers assume that a 30% incrementality lift means 30% credit for view conversions is appropriate while assigning 100% credit for conversions that occurred after a click. This extrapolation suffers from a fatal logical flaw – it assumes that a click conversion is 100% due to the advertising. Post-click conversions from either Display or Search are not 100% incremental – many of those post-click conversions would also have happened regardless of digital ads seen.
No matter what your objective for the test, the same calculation is typically used:

Incremental Lift (%) = (Branded Ad CVR* − PSA Ad CVR) / PSA Ad CVR × 100

*CVR = Conversion Rate
**Results are given as a percentage lift at an 80–95% statistical confidence level
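The lift calculation above can be sketched as a small helper function; the CVRs below are hypothetical numbers for illustration:

```python
def incremental_lift(exposed_cvr: float, control_cvr: float) -> float:
    """Percentage lift of the exposed (branded ad) group over the
    control (PSA) group. CVRs are conversions divided by users reached."""
    return (exposed_cvr - control_cvr) / control_cvr * 100

# Exposed group converts at 0.30%, PSA control group at 0.25%
print(incremental_lift(0.0030, 0.0025))  # ≈ 20% lift
```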
BENCHMARKING FOR SUCCESS
Everyone loves benchmarks. And why wouldn’t they? Benchmarks are an easy way to compare your results against an average, and gauge what you consider good or bad. However, in the case of Incrementality Testing, it isn’t so easy.
Across six tests conducted by Quantcast, a vendor’s incremental prospecting conversion rate lift varied from 12% to 177%.
How did this happen? The truth is that even when holding all variables constant, there are still multiple factors and potential setup pitfalls that can impact lift and determine what lift is valuable to a marketer.
Factor 1: Marketer’s brand equity
In the world of marketers, it pays to be popular. Literally. Brands that are more easily recognized will naturally garner more attention. But in the case of Incrementality Testing, this means that the conversion lift for a well-known brand will be less than for a brand that has never done digital advertising and launches a high-impression-volume campaign for the first time. That doesn’t mean the established brand’s 5% lift is less meaningful than the new marketer’s 20% lift. Success is subjective, so marketers need to analyze their results with future potential in mind, considering economies of scale in the lifetime value of new customers, and overall revenue driven by each.
The savviest digital advertisers we work with are often delighted with incrementality as low as 1%. The lift, if proven to be statistically significant, may not be dramatic, but in that small percentage lift in sales often lies a dramatic impact on profits. On the other hand, we sometimes hear newer brands complain about double-digit incremental lift. Every marketer should understand the lifetime value of a new customer, net of operating costs, and compare it to the marketing investment. No marketer should undertake an incrementality test without knowing what lift would satisfy them, and without some sense of what incremental lift is likely in their vertical and for their degree of organic brand awareness. Without expectations set ahead of time, confusion reigns once the results are calculated.
Factor 2: Control group creative
Not all public service announcements (PSAs) are created equal. A PSA promoting Pet Rescue with an adorable dog on the banner may attract more organic interest than one promoting Stroke Awareness with no images. Since human nature is unpredictable, even when trying to control for every variable, there will always be some level of unpredictability.
Factor 3: Outside media
If you are running media outside your test with other programmatic vendors, digital channel partners (social, search), or traditional offline partners (TV, billboards, radio, print), how are you ensuring the control group is not exposed to your brand through these other marketing efforts? The point of a control group is to be unbiased. If your control group is exposed to other media in uneven or unknown ways, you risk data contamination. The expectation that advertisers will turn off all other media for a test is unrealistic, but it should play into interpreting results.
Factor 4: Seasonality
If Black Friday and Cyber Monday tell us anything, it’s that people shop far more during the holidays than at other times of the year. An advertiser who measures a campaign’s incremental lift in November and then compares it to results from February should not expect the numbers to look similar. For this reason, it’s important to avoid testing during peak seasons (which vary by brand or advertiser) and to keep the testing timeframe consistent (e.g., if you start your test on a Monday, end it on a Sunday to account for changes in consumer behavior across the week).
GAMING THE SYSTEM
Over the course of 10 years, Quantcast has consulted on and implemented more than 300 incrementality tests, and along the way we have seen many, many ways to muddy data intentionally. Vendors “game the system” to reach their desired lift, skewing the results.
The biggest way vendors do this is by using an artificially low baseline. If vendor A drove 40% lift and vendor B drove 10% lift, which one has driven a better marketing result with greater incremental impact? Most advertisers assume that the answer is obvious – vendor A is clearly superior. But the answer is not at all clear unless you know the quality of the baseline control group. What if the natural conversion rate of the control group used by vendor B is 10x higher than for vendor A?
Take a look at the lift formula we provided earlier. Incremental lift percentage is higher if the PSA Ad Conversion Rate is lower. Lift is calculated over a baseline. If two vendors have two different baselines, their lift numbers can’t be compared.
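To make the baseline problem concrete, here is a minimal sketch with hypothetical numbers. Vendor B reports a quarter of vendor A’s lift, but its control baseline is 10x higher, so it actually drives more incremental conversions per million impressions:

```python
def lift_pct(exposed_cvr: float, control_cvr: float) -> float:
    return (exposed_cvr - control_cvr) / control_cvr * 100

# Hypothetical vendors with very different control baselines
vendor_a = {"control_cvr": 0.0005, "exposed_cvr": 0.0007}  # ~40% lift
vendor_b = {"control_cvr": 0.0050, "exposed_cvr": 0.0055}  # ~10% lift

impressions = 1_000_000
for name, v in [("A", vendor_a), ("B", vendor_b)]:
    lift = lift_pct(v["exposed_cvr"], v["control_cvr"])
    incremental = (v["exposed_cvr"] - v["control_cvr"]) * impressions
    print(f"Vendor {name}: {lift:.0f}% lift, "
          f"{incremental:.0f} incremental conversions per 1M impressions")
```

Vendor A’s 40% lift yields roughly 200 incremental conversions per million impressions; vendor B’s 10% lift yields roughly 500. The lift numbers alone point in the wrong direction.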
To make this more intuitive, let’s compare the height of two gentlemen, Steph and Kevin.
Steph is 100% taller than everyone else in the room he’s currently in
Kevin is only 5% taller than everyone else in the room that he’s in
Who’s taller? If you’re following along, you know that you don’t yet have the information to answer that question.
What if Steph (Curry) is playing with his daughter at home? She’s 3 feet tall and he’s 6 feet.
What if Kevin (Durant) is practicing at the Golden State Warriors practice facility? The average height of the Warriors is 6 feet, 7 inches. Kevin is 6 feet, 11 inches.
Kevin is taller than Steph (by almost a foot), but that is unclear until the baseline is known.
In a strict incremental lift test, your control audience should be identical to your exposed audience, but to inflate lift, some vendors may target an inferior control group. They do this by lowering inventory bid price, leveraging less ideal media channels, or excluding retargeting on your control group.
Two ways to solve:
- Conduct the test in house with a single, common baseline control group across vendors
- Quantcast can help you with dual control
At a minimum: if you are not implementing your tests in house, ask your programmatic partner tough questions about how the control group is targeted, and probe for any setup choices that could skew results in their favor.
HOW TO CONDUCT AN INCREMENTALITY TEST
Before executing an Incrementality Test, it is crucial that you have a clear objective that you can action on based on your results. This includes formulating a sound hypothesis, defining your desired outcome, aligning your setup with the specific metrics you want to test, and building an action plan once your test is complete.
Advertisers often request to test a marketing tactic in isolation, but their underlying hypothesis concerns that tactic’s impact on their entire path to conversion, or a comparison against other vendors. In a scenario like this, the advertiser is actually trying to solve for attribution, and a lift test will not return actionable data against that hypothesis. The only way to truly measure programmatic impact versus other marketing tactics and channels is through an attribution vendor. Otherwise, how will you know that your control and/or exposed groups were not influenced by other marketing tactics outside of the test?
Setting clear objectives allows your partner to uncover any issues up front, ensuring an Incrementality Test is the right approach and, if it is, that it is set up correctly.
When conducting any type of hypothesis test, if your sample size is too small, there’s a good chance you won’t reach statistical significance, as measured through confidence levels. A confidence level is the threshold at which you can conclude the results are unlikely to be due to random chance and that, if the test were repeated, it would return similar results. For Incrementality Testing, 80% is low but acceptable, 95%+ is ideal, and 100% isn’t possible unless you surveyed an entire population with zero error or bias. Only analyze your test results once your control and exposed data sets reach confidence levels above 80%. If you aren’t sure confidence levels have been met, there are many free online tools to calculate statistical significance and sample size. Also, allow the full lookback window to elapse after your campaign has completed before analyzing your data, to capture latent conversions.
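As a rough sanity check (not a substitute for a proper power analysis or one of the online calculators mentioned above), the confidence level for a control/exposed CVR difference can be estimated with a pooled two-proportion z-test; the conversion counts below are hypothetical:

```python
from math import sqrt, erf

def confidence_level(conv_c: int, n_c: int, conv_e: int, n_e: int) -> float:
    """Two-sided confidence that the exposed and control CVRs differ,
    via a pooled two-proportion z-test (normal approximation)."""
    p_c, p_e = conv_c / n_c, conv_e / n_e
    p_pool = (conv_c + conv_e) / (n_c + n_e)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_e))
    z = abs(p_e - p_c) / se
    # Standard normal CDF via the error function; confidence = 1 - p-value
    phi = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * phi - 1

# 250 conversions from 100k control users vs. 320 from 100k exposed users
print(f"{confidence_level(250, 100_000, 320, 100_000):.1%}")  # above 95%
```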
Critical to any test which employs a control/exposed methodology is a proper ‘cookie hash’. This is the segmentation of a cookie pool into two distinct, mutually exclusive groups. The control group is only shown the public service announcement ad and the exposed group is only shown the branded ad. Any cookie overlap leads to data contamination and nullifies the results of a test.
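A deterministic hash is one way to keep the two groups mutually exclusive. Here is a minimal sketch (the 50/50 split and the choice of MD5 are illustrative assumptions, not a prescribed implementation):

```python
import hashlib

def assign_group(cookie_id: str, control_share: float = 0.5) -> str:
    """Deterministically bucket a cookie into mutually exclusive groups.
    The same cookie always lands in the same group, so no user can be
    shown both the PSA and the branded ad and contaminate the test."""
    digest = hashlib.md5(cookie_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable 0-99 bucket per cookie
    return "control" if bucket < control_share * 100 else "exposed"

assert assign_group("abc123") == assign_group("abc123")  # stable assignment
```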
Incrementality Testing through your Ad Server
Conducting an Incrementality Test in your own ad server allows you to tightly control all parameters, ensuring you maintain variable consistency. DoubleClick Campaign Manager (DCM) offers a feature called Audience Segmentation, which lets you easily set up a cookie hash not only for each variable tested, but also across multiple vendors. We have outlined the process for DCM here, but feel free to reach out to your Quantcast Account Manager for consultation and support.
When considering your lift result, factor in the following:
- Compare like vendors. When comparing multi-vendor results, make sure the partners have similar targeting tactics to avoid overinvesting in a partner that “looks better” simply based on performance and not long-term growth potential.
- Use a ratio metric. Calculate lift using a ratio metric such as conversion rate, click-through rate, or percentage of new users rather than an absolute metric. This equalizes for differences in group size.
- Use view conversions. Only use metrics from view conversions, since PSA ads do not generate click conversions.
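To see why a ratio metric matters, consider a hypothetical test where the exposed group is four times the size of the control group:

```python
# Hypothetical test with unevenly sized groups
control = {"users": 100_000, "conversions": 250}
exposed = {"users": 400_000, "conversions": 1_200}

# Absolute conversion counts exaggerate the difference...
print(exposed["conversions"] - control["conversions"])  # 950 more conversions

# ...but a ratio metric (CVR) equalizes for group size
cvr_c = control["conversions"] / control["users"]  # 0.25%
cvr_e = exposed["conversions"] / exposed["users"]  # 0.30%
print(f"lift: {(cvr_e - cvr_c) / cvr_c:.0%}")      # a 20% lift
```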
Consult with your vendor and define the optimal outcome before the test begins, as you may uncover either unrealistic expectations or strange setup nuances that could completely change the results of the test.
The truth is, digital media does help sell more stuff, and Incrementality Testing can be a very powerful tool for measuring this sales impact. If you’re set on finding the answer, it’s important to approach these tests methodically, asking yourself the following questions:
- What is my test hypothesis? How will I action on these results if my hypothesis is proven true? If proven false?
- Do I have realistic expectations on expected brand lift based on my brand equity?
- Are some of my vendors “gaming the system” by using an artificially low baseline?
- Have I considered all factors which may skew my results, including the control group creative, comparing like vendors, outside media, seasonality, and confidence levels?
- Can I rely on my data?
- Is my test hypothesis better controlled by implementing through my own ad server?
Ultimately, data for the sake of data is not a good use of your time or marketing budget. Let’s make sure those dollars are driving your bottom line.