Advertising on the internet is big and fast. Big due to the volume of requests thrown around; fast because of how quickly responses are generated and propagated. To handle the scale, you need to plan and evolve your architecture and application accordingly or face being priced out of the market. In this blog post, I’ll discuss one small part of how we at Quantcast do this and deliver superior campaign outcomes for our customers. 

Considerations when building, evolving, and optimizing a large distributed system

At the architectural level, you often find yourself thinking about how to shard and distribute things (for us, these things are bids, bid strategies, and users). Ideally, you end up with something that scales better than linearly with the number of bids processed and bid strategies considered. Most likely, the final scaling equation (think big O notation for data center planning) includes both variables and evolves over time as major architectural changes are made.

At the application (bidder) level, things are the same… but different. You tend to be thinking more along the lines of: How many bids per second per central processing unit (CPU) core can we achieve? How can we identify and remove any bottlenecks preventing this number from scaling? Do we have unnecessary locks, context switches, algorithmic complexity, or greedy background tasks we can optimize? You then develop, instrument, and profile your way through solving a bunch of interesting problems and hopefully set up the means to identify future degradations as the application evolves.

Today, I want to focus on a small part of application optimization, specifically a simple(ish) trick that we–and many other systems–use to allow a very large number of decisions to be made insanely fast. Bitmaps and bitwise operations can help keep this kind of computational complexity in check.

What is a bitmap?

Bitmaps are a representation of some domain-specific information in a set of one or more bits. It’s totally up to you, the designer, what each bit represents. The point is they are small and stored as a contiguous string of 1’s and 0’s, where each bit’s state represents something dependent on its position in the map.

We can make up an example to demonstrate and build on: in this case, I’m going to create a bitmap of my 64 most esteemed friends’ preferences on food. To start, we’ll ask one simple Boolean question: Do they eat meat? 1 for yes, 0 for no.

Before we create our bitmap, we need something I’ll call a ‘universe.’ This is the actual rich domain information. The most important thing to consider here is the ordering, as it is this ordering that will decide the meaning of our bitmaps. If the ordering is ever changed, all of our bitmaps become invalid and need recalculation.

So, say our ordered universe contains Lukasz, Gurel, Claire, Ben …

Our bitmap containing meat preference (1=yes, 0=no) could look like 1011… 

Gurel is the only non-meat eater; we can tell due to his position in our universe.

That is it. Fairly simple, really. We’ve been able to store some simple Boolean information about a larger set of data in a very compact space.
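
To make this concrete, here is a minimal sketch in Java of how such a bitmap could be built, using a plain long as a 64-bit map (the class and variable names are purely illustrative, not production code):

```java
import java.util.List;

public class FriendBitmaps {
    // The ordered 'universe': position in this list decides what each bit means.
    // Only the first four of the 64 friends are shown here.
    static final List<String> UNIVERSE = List.of("Lukasz", "Gurel", "Claire", "Ben");

    public static void main(String[] args) {
        // Answers to "do they eat meat?", in universe order.
        boolean[] eatsMeatAnswers = {true, false, true, true};

        // Build the bitmap so the leftmost friend sits in the highest bit we use,
        // matching the left-to-right 1011 notation above.
        long eatsMeat = 0L;
        for (int i = 0; i < eatsMeatAnswers.length; i++) {
            if (eatsMeatAnswers[i]) {
                eatsMeat |= 1L << (eatsMeatAnswers.length - 1 - i);
            }
        }

        System.out.println(Long.toBinaryString(eatsMeat)); // prints 1011
    }
}
```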

However, it isn’t space compaction that makes this interesting. Memory nowadays is very cheap and most of our challenges in the Quantcast Platform are CPU-bound. What we’re interested in is how to ask a very large number of Boolean questions over a very large set of elements very quickly, and this is where bitwise operations come in handy.

Why do we care about bitwise operations?

Bitwise operations allow a CPU to perform AND, OR, XOR, NOT, and TEST operations in parallel on a group of bits called a word (typically 64 bits nowadays); with a single CPU operation, we can compare up to 64 bits from two different words.

As a result, I can calculate who eats meat with a single CPU operation (if our data is already in a register) by performing a set intersection using AND, comparing the complete universe of my friends to my ‘do you eat meat’ bitmap: 1111 & 1011 = 1011. This answers the question many orders of magnitude faster than looping through a list of my friends, pulling out each one’s ‘do you eat meat’ Boolean, comparing it to true, and so on.

I imagine you’re starting to see where this can go. Say we add some more bitmaps:

  • Who likes Mexican food?
    • 1111  (everyone)
  • Who is usually up for drinks after work?
    • 1110  (not Ben)
  • We add a set of bitmaps to determine their engineering level:
    • L4 – 1000 … (Lukasz)
    • L5 – 0011 … (Ben, Claire)
    • L6 – 0100 … (Gurel)

Now, in a handful of CPU instructions, we can very quickly ask a series of Boolean questions to determine something we might find useful. For example, let’s ask: who eats meat AND likes Mexican food AND likely wants a drink after work AND is NOT too senior (less than L6, say)?

Working through the bitwise math, 1011 & 1111 & 1110 & NOT(0100) = 1010. A quick reference back to our universe tells me I should ask Lukasz and Claire if they want to go for Mexican food and a margarita after work.
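
Staying with the illustrative Java sketch from earlier (bitmaps as plain longs, hypothetical names), the whole question collapses into a few bitwise instructions:

```java
public class LunchQuery {
    public static void main(String[] args) {
        // Universe order: Lukasz, Gurel, Claire, Ben (leftmost bit = Lukasz).
        String[] universe = {"Lukasz", "Gurel", "Claire", "Ben"};

        long eatsMeat     = 0b1011; // Gurel is the 0
        long likesMexican = 0b1111; // everyone
        long upForDrinks  = 0b1110; // not Ben
        long levelL6      = 0b0100; // Gurel

        // Meat AND Mexican AND drinks AND NOT L6: a handful of CPU instructions.
        long candidates = eatsMeat & likesMexican & upForDrinks & ~levelL6;

        // 1011 & 1111 & 1110 & 1011 = 1010, i.e. Lukasz and Claire.
        for (int i = 0; i < universe.length; i++) {
            long bit = 1L << (universe.length - 1 - i); // map a position back to its bit
            if ((candidates & bit) != 0) {
                System.out.println(universe[i]);
            }
        }
    }
}
```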

How do we use this at Quantcast?

We do a lot of filtering. We’re constantly asking questions like: Which bid strategies are valid for this domain? Which bid strategies want to block URLs containing this keyword? Which creatives support format X? Instead of exhaustively looping through elements, we break the complex web down into a series of filters, which contain precomputed bitmaps and make access to the correct set of bitmaps as efficient as possible (i.e., avoid the loops).

To do this, we first need to think of a way to break what we’re trying to calculate into a series of Boolean questions that can be formed into bitmaps. Then we have to find a data structure that gives us access to those bitmaps in a timely fashion. Typically, that would be a hashmap, trie, tree, multimap, or any data structure allowing us to scale sublinearly with the number of bid strategies, domains, creatives, etc. we process.
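
As a rough illustration of that second step (the class, method, and type choices below are assumptions for the sketch, not our production code), a keyword filter might precompute one bitmap per keyword and answer queries with a single hash lookup:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch: maps a keyword to a precomputed bitmap of the bid
 * strategies that want to block URLs containing it. Bit positions follow a
 * fixed ordering of bid strategies (the 'universe').
 */
public class KeywordBlockFilter {
    private final Map<String, BitSet> blockedByKeyword = new HashMap<>();
    private final int universeSize;

    public KeywordBlockFilter(int universeSize) {
        this.universeSize = universeSize;
    }

    /** Precompute: record that the bid strategy at 'strategyIndex' blocks 'keyword'. */
    public void addBlock(String keyword, int strategyIndex) {
        blockedByKeyword
            .computeIfAbsent(keyword, k -> new BitSet(universeSize))
            .set(strategyIndex);
    }

    /** Query time: one hash lookup, no loop over bid strategies. */
    public BitSet strategiesBlocking(String keyword) {
        return blockedByKeyword.getOrDefault(keyword, new BitSet(universeSize));
    }
}
```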

An example of a filter returning only display-type bid strategies:
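
(The real ConstrainedSet and OrderedFiniteUniverse are proprietary and richer than what fits in a blog post, so the snippet below is a heavily simplified stand-in; the signatures are illustrative assumptions, not our actual API.)

```java
import java.util.BitSet;
import java.util.List;

/** Simplified stand-in: a fixed ordering of bid strategies; positions define bit meaning. */
final class OrderedFiniteUniverse<T> {
    private final List<T> elements;

    OrderedFiniteUniverse(List<T> elements) {
        this.elements = List.copyOf(elements);
    }

    int size() {
        return elements.size();
    }
}

/** Simplified stand-in: a BitSet wrapper tied to a universe, exposing set operations. */
final class ConstrainedSet<T> {
    private final OrderedFiniteUniverse<T> universe;
    private final BitSet bits;

    ConstrainedSet(OrderedFiniteUniverse<T> universe, BitSet bits) {
        this.universe = universe;
        this.bits = bits;
    }

    /** Set intersection, implemented as a bitwise AND over the underlying words. */
    ConstrainedSet<T> intersection(ConstrainedSet<T> other) {
        BitSet result = (BitSet) bits.clone();
        result.and(other.bits);
        return new ConstrainedSet<>(universe, result);
    }
}

/** A filter holding a precomputed bitmap of the display-type bid strategies. */
final class DisplayFormatFilter<T> {
    private final ConstrainedSet<T> displayStrategies; // precomputed up front

    DisplayFormatFilter(ConstrainedSet<T> displayStrategies) {
        this.displayStrategies = displayStrategies;
    }

    /** Narrow a candidate set to display-type strategies with a single intersection. */
    ConstrainedSet<T> select(ConstrainedSet<T> candidates) {
        return candidates.intersection(displayStrategies);
    }
}
```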

As the snippet suggests, we make extensive use of a proprietary data structure called ConstrainedSet. It is basically a BitSet wrapper built to interact with another class called OrderedFiniteUniverse, which contains our actual data. We do a few other tricks: when the ConstrainedSet is very sparse over a very large universe, we use an integer index array instead of an actual bitmap to reduce the overall number of operations. The wrapper also provides the set operations you might reasonably expect, such as union(), intersection(), and difference(), each of which wraps the lower-level bitwise operations needed to implement standard set semantics.

Have we considered open source alternatives?

Yes, we have! Roaring Bitmaps is the most obvious open source alternative and is used in a few other Quantcast projects (such as Kamke–check out Jackson’s blog post on this topic, if you haven’t already). We’ve even done some basic performance analysis comparing the two. For our specific use case, we found that ConstrainedSet outperformed the open source solution at the 95th percentile. So, for the moment at least, we have stuck with it.

Learn More

If you missed our engineering blog post on how our engineers and modeling scientists leverage our unique technical systems and machine learning techniques to construct a robust pre-campaign forecasting pipeline for customers, click here to read “Ara, Tell Me What My Campaign Forecast Looks Like For Today.” And if you’re interested in learning even more about what our engineers do, we’re hiring! Join us.