When Quantcast customers are designing their ad campaigns, they can choose from trillions of different combinations of audience traits, inventory attributes, and campaign goals. Ara, the AI and machine learning engine that powers the Quantcast Platform, can reach specific audiences across hundreds of metro areas, thousands of domains, and millions of user interests, to name just a few dimensions. With so many possible dimensions, it can be challenging for our customers to immediately understand the scope of the audience they are trying to reach, and how their campaign’s performance may change with their selected campaign configurations. To make this easier, pre-campaign forecasting provides our customers with an estimate of key performance indicators, like how many impressions their ad sets will likely deliver and what their audience reach will be with a given configuration. At Quantcast, we leverage our technical expertise and modeling techniques to construct a robust pre-campaign forecasting pipeline that can return a forecast based on trillions of data points within a few seconds.
What exactly are we trying to forecast?
While building an ad set, our customers are primarily interested in:
- potential reach (how many browsers match the ad set’s desired audience?)
- impressions (how many impressions would be shown given the selected budget and other constraints?)
- reach (how many unique browsers would the ad set reach given the selected budget and other constraints?)
When forecasting how an ad set with a customer’s given campaign constraints will deliver, we want to align as closely as possible with how Ara would optimize that ad set’s performance in real time. These campaign constraints can be separated into binary and probabilistic constraints, with respect to how Ara meets them:
- Binary constraints: At bid time, based on the attributes of the bid opportunity provided by the ad exchange, Ara can determine with certainty whether or not a bid opportunity comes from a relevant zip code, site domain, or device (for example). Ara then filters all incoming bid opportunities down to those that satisfy every binary constraint.
- Probabilistic constraints: For some constraints, such as demographic group and viewability rate, we cannot determine with 100% certainty whether the bid opportunity satisfies the ad set constraint based on its exchange-provided attributes. Instead, Ara predicts the probability that the given constraint is met, for example an 80% likelihood that the bid opportunity will be viewable and a 90% likelihood that it corresponds to a web user between 25 and 30 years old. Ara then uses those likelihoods, along with various control signals and multi-goal optimization, to maximize the primary key performance indicator (KPI) of an ad campaign (e.g., conversions, reach) while meeting those probabilistic constraints (e.g., 75% of all impressions delivered were viewable).
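To make the distinction concrete, here is a minimal Python sketch under stated assumptions. The `BidOpportunity` fields, the `p_viewable` score, and the sample values are all hypothetical and are not Ara's actual data model. Binary constraints act as a hard filter, while a probabilistic constraint is only ever evaluated in expectation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BidOpportunity:
    """Hypothetical bid opportunity; field names are illustrative only."""
    zip_code: str
    domain: str
    device: str
    p_viewable: float  # assumed model output: probability the impression is viewable


def passes_binary_constraints(opp, allowed_zips, allowed_domains, allowed_devices):
    """Binary constraints are hard filters: every one of them must hold."""
    return (
        opp.zip_code in allowed_zips
        and opp.domain in allowed_domains
        and opp.device in allowed_devices
    )


def expected_viewable_rate(opps):
    """Expected share of viewable impressions if every opportunity were won."""
    return sum(o.p_viewable for o in opps) / len(opps)


# Made-up sample data.
opportunities = [
    BidOpportunity("84101", "cnn.com", "mobile", 0.80),
    BidOpportunity("84101", "bbc.com", "desktop", 0.60),
    BidOpportunity("10001", "cnn.com", "mobile", 0.90),
]

eligible = [
    o for o in opportunities
    if passes_binary_constraints(
        o, {"84101"}, {"cnn.com", "bbc.com"}, {"mobile", "desktop"}
    )
]
rate = expected_viewable_rate(eligible)
print(len(eligible), round(rate, 2))
```

In this toy example the third opportunity is dropped by the zip code filter, and the viewability constraint is checked only as an average over the remaining opportunities.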
Forecasting potential reach
When forecasting potential reach, we only consider binary constraints, as Ara predicts probabilities for probabilistic constraints at bid time. These binary constraints can be represented using Boolean logic conditions, such as (“from Utah” OR “from Ohio”) AND (“interested in tacos” OR “interested in burritos”) AND (“on cnn.com” OR “on bbc.com”). Previous approaches have attempted to forecast potential reach under binary constraints by imposing conditional independence assumptions between categories of constraints or by approximating the distribution of potential reach with tree-based models. Fortunately, at Quantcast we can take a direct approach with very few assumptions. Using our Kamke database, we can compute intersections and unions across billions of bid opportunities from the past week to project the number of bid opportunities or users that satisfy a Boolean expression like the one above. Kamke can return this estimate within a second.
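The union-and-intersection idea can be illustrated with plain Python sets. This is only a toy sketch: Kamke's actual interface and storage format are not described here, and the `browsers_by_trait` index and browser IDs below are invented for illustration. The constraint is expressed as a list of OR-groups that are ANDed together.

```python
def potential_reach(browsers_by_trait, or_groups):
    """Count browsers matching an AND of OR-groups.

    browsers_by_trait: dict mapping a trait name to a set of browser IDs.
    or_groups: list of lists; traits inside a group are ORed (set union),
               and the groups themselves are ANDed (set intersection).
    """
    matched = None
    for group in or_groups:
        union = set()
        for trait in group:
            union |= browsers_by_trait.get(trait, set())
        matched = union if matched is None else matched & union
    return len(matched) if matched is not None else 0


# Invented toy index of browser IDs per trait.
browsers_by_trait = {
    "from Utah": {1, 2, 3},
    "from Ohio": {4, 5},
    "interested in tacos": {1, 4, 6},
    "interested in burritos": {2, 7},
    "on cnn.com": {1, 2, 5},
    "on bbc.com": {4, 8},
}

constraint = [
    ["from Utah", "from Ohio"],
    ["interested in tacos", "interested in burritos"],
    ["on cnn.com", "on bbc.com"],
]

reach = potential_reach(browsers_by_trait, constraint)
print(reach)  # browsers 1, 2 and 4 satisfy all three groups, so this prints 3
```

Because the sets are combined directly, no independence assumption between geography, interests, and domains is needed.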
Forecasting impressions and reach
The dynamics and models that comprise Ara’s multi-goal optimization controller for bidding are in continuous development. To avoid training against a moving target, we decouple and simplify our pre-campaign forecasting pipeline by treating Ara’s controller as a black box that will deliver a number of impressions given a daily budget and set of probabilistic constraints. Concretely, we fit a model to the following relationship: F(budget, constraint_0, constraint_1, …, constraint_n) → impressions.
The campaign goal also has a significant impact on this relationship (e.g., video-view optimized campaigns usually produce higher costs per impression than conversion-optimized campaigns), so we partition our training data by campaign goal and train a separate model for each goal.
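As a rough sketch of partitioning by goal, the example below fits one small model per campaign goal. The model family is not specified above, so a log-linear least-squares fit of impressions on budget is used purely as a stand-in. The constraint features are omitted for brevity, and the history rows are made up.

```python
import math
from collections import defaultdict


def fit_log_linear(rows):
    """Least-squares fit of log(impressions) = intercept + slope * log(budget)."""
    xs = [math.log(budget) for budget, _ in rows]
    ys = [math.log(impressions) for _, impressions in rows]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def train_per_goal(history):
    """Partition (goal, budget, impressions) rows by goal and fit one model each."""
    by_goal = defaultdict(list)
    for goal, budget, impressions in history:
        by_goal[goal].append((budget, impressions))
    return {goal: fit_log_linear(rows) for goal, rows in by_goal.items()}


def forecast_impressions(models, goal, budget):
    """Look up the goal-specific model and evaluate it at the given budget."""
    intercept, slope = models[goal]
    return math.exp(intercept + slope * math.log(budget))


# Made-up history: video-view campaigns buy fewer impressions per dollar.
history = [
    ("conversions", 100, 50_000), ("conversions", 200, 100_000),
    ("conversions", 400, 200_000),
    ("video_views", 100, 20_000), ("video_views", 200, 40_000),
    ("video_views", 400, 80_000),
]

models = train_per_goal(history)
conv_forecast = forecast_impressions(models, "conversions", 300)
video_forecast = forecast_impressions(models, "video_views", 300)
print(round(conv_forecast), round(video_forecast))
```

With the same budget, the two goal-specific models return different impression forecasts, which is the effect that pooling all goals into a single model would blur.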
To account for the probabilistic constraints, we project the constraints into a richer feature space by estimating the resulting reduction in impressions over a continuous range for that constraint (e.g., a viewability rate of 0%-100%). For demographics, each demographic category (age, gender, education level, etc.) is projected into its own space and its possible values are translated into a continuous range, based on the empirical distribution of that category’s groups (per country). Training sets for these models are taken from campaigns optimized to maximize viewability rates and demographic compositions. We then combine these to compute our final impression forecast as: F(budget, constraint_0, constraint_1, …, constraint_n, p1(viewability), p2(age), …) → impressions.
Finally, to forecast reach, we predict the ad set’s frequency (impressions per unique browser) and then divide our impressions forecast by that frequency. For ad sets without frequency goals, we derive the frequency from the ad set’s explicitly set frequency cap or, if no cap is set, from a global frequency computed over all campaigns without frequency caps. For ad sets with frequency goals, using historical data compiled from campaigns optimized towards frequency goals, we similarly train a model that estimates the change in reach induced by the frequency goal: G(impressions, frequency_goal) → frequency, where impressions / frequency → reach.
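The final step is simple arithmetic, sketched below for the case without a frequency goal. How the frequency is derived from a frequency cap is not detailed above, so this example assumes, for illustration only, that the cap is used directly and that a global average is the fallback. The function names and numbers are hypothetical.

```python
def frequency_without_goal(frequency_cap, global_frequency):
    """Pick a frequency for an ad set with no frequency goal.

    Assumption for illustration: use the cap as-is when one is set,
    otherwise fall back to the global average frequency.
    """
    return frequency_cap if frequency_cap is not None else global_frequency


def forecast_reach(impressions, frequency):
    """reach = impressions / frequency (impressions per unique browser)."""
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return impressions / frequency


# Made-up numbers: 120,000 forecast impressions.
capped = forecast_reach(120_000, frequency_without_goal(3.0, 4.0))
uncapped = forecast_reach(120_000, frequency_without_goal(None, 4.0))
print(capped, uncapped)  # 40000.0 30000.0
```

For a fixed impressions forecast, a lower frequency spreads the same impressions across more unique browsers and therefore yields a higher reach.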