Jim Morrison is a cyclist and software engineer. He has a B.Math in Computer Science from the University of Waterloo and before Quantcast was the tech lead of the GFE team at Google. As senior manager of Edge Services at Quantcast, Jim leads the teams that build and maintain the low latency, high throughput serving systems for Measure and Advertise.
At Quantcast, speed is everything. The vast majority of requests for both Measure and Advertise are served in under 16 milliseconds and we consider any request that is served in over 64 milliseconds to be a failure that needs to be addressed. Requests for Measure are pixel requests that measure web traffic for our publisher partners. Requests for Advertise are RTB auctions where we find the best ad for the bid request. Each of these requests is handled by a series of services, working together, to produce a final decision that allows us to take action for a given user. Part of this workflow is looking up anonymized user data from the many data sets created for both our Measure and Advertise products. These data sets are easy to create using Quantcast’s powerful offline computation cluster, but using that data in millions of random lookups per second at very low latency is a hard problem.
We needed a service that could serve the data from the large offline datasets for our online products. This service is Keebler. Keebler is Quantcast’s distributed cookie data store and is responsible for serving data for all of Quantcast’s anonymous user IDs. Keebler scales to millions of reads per second and hundreds of thousands of writes per second. Datasets served by Keebler contain billions of keys and reach many terabytes in size. Keebler gets offline data from our computation system through a scalable, distributed, geographical topology aware, peer-to-peer data transfer system (we’ll cover that in a future blog post).
Keebler’s architecture is purposefully simple. A Keebler cluster runs on machines with SSDs. The datasets served by the Keebler cluster are split across the machines. Each machine serves a subset of the datasets (a set of shards) and, to ensure data availability, each shard has multiple replicas spread across the cluster. When Keebler starts, it registers itself in Apache Zookeeper with the shards that it owns. Other services discover instances of Keebler through Apache Zookeeper and cache the mapping of shards to Keebler instances. Clients use this mapping to select the correct Keebler to query for a given user. With multiple replicas and hundreds of shards Keebler is able to scale to hundreds of machines and tolerate machine or rack failures.
Each user falls into exactly one shard and all data we have for that user is available in that shard, thus a request for data on a user goes to a single Keebler instance. When it receives a request, Keebler looks up the data being requested in its multiple offline datasets, merges all the results together, and returns the data back to the client service that made the original request. The merged data is then cached in an in-memory LRU cache so that it can avoid the cost of future IO operations (SSD lookups) and CPU consumption (merging the results). Keebler’s LRU hit rate is an impressive 95%.
With each Keebler registering its shard set in Apache Zookeeper and each request being served by a single instance, Keeblers are completely isolated and independent from each other. No Keebler instance tracks or communicates with any other Keebler instance.
On Disk Format
The data files that Keebler reads are split into 3 different components. The first is a json file. The json file contains metadata about the other two files. The second file is an index file. The index file is an array of offsets into the data file. The last file is the actual data. The data is a simple serialized map where each entry contains the fixed size key, length of data, and data. The size of the key is described in the json metadata file. To find the data for a given key Keebler takes the following steps:
- Take the prefix bits of the key and use that as the index to the data file index array
- Read the data between the offset in the index array to the next offset from the index array.
- Do a linear scan over the values until the correct key is found. A key is not found if the scan hits the end of the buffer read from step 2.
Waiting for batch jobs is too slow
Data from batch jobs is great since it is complete, very durable and easy to replicate. However, it is slow. Data from batch jobs can take hours and waiting hours for data means we lose out on opportunities to show the most relevant ads. Thus we’ve added online storage to Keebler as well. The online storage for Keebler is a Redis cluster. Keebler writes recent data to Redis in real-time. The write to Redis is done as a read-modify-write. The contents from Redis are merged with the contents from a real-time update using the same merge function as used to merge datasets from disk. Next the lookup request is extended to read from Redis. The contents from Redis are merged with the contents from disk, again using the same merge function, and the merged contents are results for the lookup request.
Serving massive amounts of data with speed and simplicity is a challenge. At Quantcast one of the ways we’re able to maintain the speed of processing requests for Measure and Advertise at scale is with the help of Keebler. The resulting processes help make us one of the top data processors in the world with an unparalleled insight into online behavior.