Interested in joining Quantcast?

The Scale & Transport team enables Quantcast engineering to process petascale data cost-effectively and ergonomically.

Our team owns Quantcast’s petascale data processing (20-40 PB processed daily), its data transport system (~100 TB transferred daily), and its real-time event collection system (250K requests/second).

We also own the QFS open source project.

See What Our Engineers Work On

Nabil Zaman

Nabil developed a novel set data structure that efficiently tracks large quantities of (mostly) sequential values. Instead of storing each entry individually, the structure groups entries into closed intervals: a new insertion either extends an existing interval or creates a new one.

By tagging the messages that pass through our large-scale data streams with sequential IDs, we can use that data structure to monitor and improve the integrity of our data. We identify data loss in real time by counting the gaps in the sequence, and we eliminate duplication errors altogether.
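The idea can be illustrated with a minimal Python sketch. This is not Quantcast's implementation: the class name `IntervalSet`, its methods, and the sorted-list layout are all assumptions chosen for brevity. It stores sorted, disjoint closed intervals, reports duplicates on insertion, and counts the IDs missing between the smallest and largest ID seen.

```python
import bisect


class IntervalSet:
    """Set of integers stored as sorted, disjoint, closed intervals [lo, hi].

    Hypothetical illustration only; names and layout are assumptions.
    """

    def __init__(self):
        self._los = []  # interval starts, sorted ascending
        self._his = []  # interval ends, parallel to _los

    def add(self, x):
        """Insert x; return False if x was already present (a duplicate)."""
        i = bisect.bisect_right(self._los, x)  # first interval starting after x
        if i > 0 and x <= self._his[i - 1]:
            return False  # x lies inside an existing interval
        joins_left = i > 0 and self._his[i - 1] == x - 1
        joins_right = i < len(self._los) and self._los[i] == x + 1
        if joins_left and joins_right:
            # x bridges two neighbouring intervals: merge them into one
            self._his[i - 1] = self._his[i]
            del self._los[i], self._his[i]
        elif joins_left:
            self._his[i - 1] = x  # extend the left neighbour upward
        elif joins_right:
            self._los[i] = x  # extend the right neighbour downward
        else:
            self._los.insert(i, x)  # start a new single-value interval
            self._his.insert(i, x)
        return True

    def intervals(self):
        """Return the stored intervals as a list of (lo, hi) tuples."""
        return list(zip(self._los, self._his))

    def missing_count(self):
        """Number of IDs absent between the smallest and largest ID seen."""
        return sum(lo - prev_hi - 1
                   for prev_hi, lo in zip(self._his, self._los[1:]))


ids = IntervalSet()
duplicates = 0
for msg_id in [1, 2, 3, 5, 6, 3, 9, 4]:
    if not ids.add(msg_id):
        duplicates += 1  # a repeated ID can be dropped here

print(ids.intervals())      # [(1, 6), (9, 9)]
print(ids.missing_count())  # 2 (IDs 7 and 8 have not arrived)
print(duplicates)           # 1 (ID 3 was seen twice)
```

When IDs arrive mostly in order, the number of intervals stays small even as the number of tracked IDs grows, which is what makes this representation compact. The list-based layout here makes inserting a new interval linear in the number of intervals; a production structure would likely use a balanced tree or a similar index.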

Mehmet Can Kurt

Mehmet improved our cluster resource allocation algorithm so that jobs that have waited too long for resources have their priorities adjusted on the fly and are moved to the front of the queue.

This helped us address starvation issues in the Quantflow MapReduce cluster and ensures that SLAs in our stacks are not missed even when the cluster is under heavy demand.
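The general technique of raising the priority of long-waiting jobs is often called priority aging. The Python sketch below illustrates it under stated assumptions; it is not Quantflow's scheduler. The `Job` fields, the `MAX_WAIT_SECONDS` threshold, and the rule that a starved job outranks every base priority are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int        # higher value is served first (assumption)
    submitted_at: float  # submission time in seconds


MAX_WAIT_SECONDS = 600.0  # hypothetical starvation threshold


def effective_priority(job, now, boost):
    """Base priority, raised to `boost` once the job has waited too long."""
    waited = now - job.submitted_at
    return boost if waited > MAX_WAIT_SECONDS else job.priority


def schedule_order(jobs, now):
    """Order in which pending jobs would be offered resources at time `now`."""
    if not jobs:
        return []
    # The boost outranks every base priority, so starved jobs go first.
    boost = max(job.priority for job in jobs) + 1
    # Ties (including several starved jobs) go to the oldest submission.
    return sorted(
        jobs,
        key=lambda job: (-effective_priority(job, now, boost), job.submitted_at),
    )


pending = [
    Job("report", priority=1, submitted_at=0.0),
    Job("adhoc", priority=5, submitted_at=500.0),
    Job("etl", priority=9, submitted_at=650.0),
]

# At t=700 "report" has waited 700 s, past the threshold, so it jumps ahead.
print([job.name for job in schedule_order(pending, now=700.0)])
# ['report', 'etl', 'adhoc']
```

Because the priority is recomputed every time the queue is ordered, no job's wait can grow without bound behind a steady stream of higher-priority submissions.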

Here are some crazy numbers

20-40 PB

processed per day by our custom MapReduce system.

15 PB

stored in our distributed file system (QFS).

250K

requests/second handled by our real-time event collection system.

8M

requests/second handled by our distributed key-value store.

100 TB

transferred by our data transport system daily.