Quantcast was processing big data before big data was cool. Behind our audience measurement and ad delivery systems lies a mountain of data, and over the years we’ve developed a vast trove of technology and expertise for handling it.
So we’re excited to announce that we’re sharing an important piece of that technology with the community. As of today, the Quantcast File System (QFS) is available to other Hadoop users (and the world in general) as an open source project.
Does the world need another file system? Yes, it does. If you’re using Hadoop’s HDFS, you’re probably storing three copies of all your data for fault tolerance. And you’re buying extra disk drives to store those copies, and servers to hold the drives, and racks for the servers, and power for the racks, and cooling to counterbalance the power. If you process enough data, those costs can total five, six, or even seven figures per month.
QFS can help. Rather than triple replication, it uses Reed-Solomon encoding, the same error-correction technique used since the 1980s in many technologies including CDs, DVDs, DSL, and, more recently, the Mars rovers. Reed-Solomon provides even better fault tolerance, at a cost of only 50% additional storage space. In other words, where HDFS needs 3x the disk space, QFS needs only 1.5x. It halves your costs for disks and everything else needed to keep them spinning.
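As a back-of-the-envelope check on those numbers (assuming the 6-data-stripe, 3-parity-stripe Reed-Solomon layout that QFS documents as its default), the overhead works out like this:

```python
def storage_overhead(data_stripes, parity_stripes):
    """Physical bytes stored per logical byte of data."""
    return (data_stripes + parity_stripes) / data_stripes

# HDFS-style triple replication: one copy of the data plus two extras.
replication = storage_overhead(1, 2)   # 3.0x raw disk per logical byte

# Reed-Solomon (6, 3): six data stripes plus three parity stripes.
rs_6_3 = storage_overhead(6, 3)        # 1.5x raw disk per logical byte

# RS(6,3) can reconstruct the file after losing ANY 3 of the 9 stripes,
# while triple replication survives losing only 2 of the 3 copies.
print(replication, rs_6_3)
```

That 3.0x-versus-1.5x ratio is where the "halves your costs" claim comes from, and losing three stripes instead of two copies is the "even better fault tolerance."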
Did we mention QFS makes Hadoop jobs run faster, too? It does. Writing goes faster because jobs have to write only half as much physical data. Reading goes faster because QFS does all reads in parallel across multiple drives, making better use of drives that would otherwise be idle.
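The read-path idea can be sketched abstractly. This is a toy illustration, not QFS's actual client code: `read_stripe` is a hypothetical stand-in for a per-drive disk or network fetch, and the point is simply that stripes living on different drives can be fetched concurrently and reassembled in order.

```python
from concurrent.futures import ThreadPoolExecutor

def read_stripe(drive_id, stripe_index):
    """Hypothetical per-drive fetch; stands in for real disk/network I/O."""
    return f"stripe-{stripe_index}-from-drive-{drive_id}"

def parallel_read(stripe_locations):
    """Fetch every stripe concurrently, then reassemble them in order."""
    with ThreadPoolExecutor(max_workers=len(stripe_locations)) as pool:
        futures = [pool.submit(read_stripe, drive, i)
                   for i, drive in enumerate(stripe_locations)]
        return [f.result() for f in futures]

# Six data stripes spread across six drives, read in parallel:
print(parallel_read([0, 1, 2, 3, 4, 5]))
```

With a replicated file, a sequential read typically streams from one replica on one drive; with a striped file, every drive holding a stripe contributes bandwidth at once.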
Finally, it really works. It has been live at Quantcast for four years while we’ve been steadily improving it. A year ago we went all in and switched all our MapReduce processing over to it. Four exabytes of I/O later, we’re confident it’s solid and ready for other organizations’ mission-critical workloads.
Posted by Jim Kelly, Vice President of Research & Development