in Company News

Quantcast File System on Amazon S3

By Michael Ovsiannikov, Principal Software Engineer, Kevin Stinson, Senior Software Engineer, and Mehmet Can Kurt, Software Engineer

Quantcast File System (QFS) is a high-performance, fault-tolerant, distributed file system developed to support MapReduce processing, or other applications reading and writing large files sequentially. QFS is written in C++ and is plugin compatible with Hadoop MapReduce, and offers several efficiency improvements relative to HDFS.

Quantcast is pleased to announce the general availability of Quantcast File System (QFS) version 1.2. This version of QFS comes with many bug fixes, improvements, and new features. It has been tested internally under production load for the last few months, and we are confident that this next release is ready for use in production systems. For a full list of changes, check out the release notes on GitHub.

Many companies today leverage cloud storage infrastructure such as Amazon S3 for their storage needs. Amazon S3 is a cost effective solution for storing large amounts of data. Quantcast has updated QFS to support Amazon S3 as a backend store for its chunk storage, combining dynamic storage scaling capabilities of S3 with reliability and endurance of QFS. This integration brings some major improvements to how Amazon S3 can be used by organizations looking to store large amounts of data. In this post, we are going to go through some of the major architectural changes that QFS on Amazon S3 brings.

Existing File Systems on Amazon S3
Amazon S3 is an object store. While you can create a directory structure within Amazon S3, there are various other features (e.g. command line) missing which make using Amazon S3 as a filesystem harder than necessary. There are some existing distributed filesystems that allow storing data in Amazon S3 while addressing some of the missing functionality. These efforts can be mainly grouped as block-based and object-based. Block-based filesystems store files in Amazon S3 as blocks, just like HDFS and QFS, whereas object-based filesystems store each file in a single object. Apache Hadoop ecosystem provides implementations in both flavours. For instance, Hadoop S3 is a block-based filesystem which requires dedicating an entire S3 bucket. Although the files do not have an upper size limit and can be renamed, they are not operable with other S3 tools and the filesystem is to be deprecated soon. In contrast, Hadoop S3n and S3a filesystems are object-based with each file being mapped to exactly one object in S3. Neither S3n nor S3a have atomic renaming support and they both have limitations for file size. Amazon EMRFS is another filesystem based on HDFS. Falling into object-based category, it uses a proprietary S3 client and only available in Amazon EMR clusters. Besides these distributed filesystems, s3fs-fuse employs FUSE and presents an Amazon S3 backed local filesystem.

Quantcast File System on Amazon S3
Capabilities: QFS implements a proper filesystem on top of Amazon S3 while introducing some new features, as well as bringing along all of its existing features. These include:

  • Hierarchical file system support, including unix style hierarchical file permission.
  • Atomic namespace operations like rename, create exclusive, permission and ownership modifications.
  • No file name limitations in the sense that there is no requirement to use a special filename prefix in order to maintain uniformity within an S3 bucket.
  • No practical file size limit.
  • Seamless integration with existing QFS, HDFS and Hadoop environments.
  • Optional use of industry – standard authentication mechanisms: Kerberos, TLS/SSL X509, and TLS / SSL communication encryption.

Architectural Changes: We next outline some of the major architectural changes we made on QFS for Amazon S3 support.

  • Chunkservers as S3 access proxies (AP): In QFS on S3 instances, each chunkserver functions as an access proxy (AP). An access proxy is basically used to retrieve the necessary AWS credentials from QFS metaserver and provide QFS clients access to the underlying S3 bucket. Thereafter, read/write calls issued by QFS clients on an object store file is served by the access proxy, which uses the host memory as its buffering area. The diagram below illustrates this key difference using a read operation as an example. In a regular QFS instance, in order to read a Reed-Solomon encoded file, a QFS client process reads chunks from six different chunkservers, each of which in turn reads the corresponding chunk data from its own local storage (to simplify the illustration, we only show the first three chunkservers in the diagram). On the other hand, if the file is an object store file, the QFS client by default uses the access proxy on the same host, which reads the corresponding object store block from Amazon S3 and forwards the buffered data to the client. This is depicted in the right part of the diagram.

QFS Blog S3_diagram

  • No Reed-Solomon encoding for object store files: As you may already know, in addition to replication, QFS uses error correction codes, in particular Reed-Solomon, to significantly reduce the disk space required compared to plain replication while, at the same time, providing better reliability. This has shown significant cost savings for organizations running their own infrastructure. Reed-Solomon encoding is disabled for QFS object store files, since Amazon S3 already provides very high levels of durability for stored data behind the scene.

Performance Comparison with EMRFS
Next, we compare the performance of QFS to EMRFS, an existing filesystem by Amazon that allows storing files in Amazon S3. EMRFS is an implementation of HDFS and currently is only available in Amazon EMR clusters.

Experiment Setup: To do the comparison, we first created an EMR cluster using EMR version 5.0.0. The cluster consists of one master instance (m3.medium) and multiple core instances (c4.4xlarge) allocated from AWS EC2 instance pool. For more information on the EC2 instances we used, please visit here.

EMRFS is automatically enabled in an EMR cluster. We used the out-of-the-box settings without any additional tuning and without enabling the consistent view feature. EMRFS doesn’t need a dedicated metaserver; read/write requests as well as other filesystem operations (ls, mkdir, rmdir, etc) are carried out by talking to the S3 end directly. On the other hand, for QFS metaserver, we used a dedicated EC2 instance (r3.8xlarge). For a fair comparison, we deployed all of the QFS chunkservers, aka access proxies, on the allocated EMR core instances.

Results for Writes/Reads: First, we compare the performance of two Hadoop jobs, which write and read back 100 GB data, using QFS and EMRFS as the underlying filesystem. In both, corresponding I/O operations are performed in the map phase and the reduce phase is skipped. Each job uses 4 EMR cores instances to run the mappers. We varied the number of mappers per instance between 4 and 16, which gave us different configurations with total number of mappers ranging from 16 to 64. The graph below shows the end-to-end execution times (including task scheduling times by Hadoop as well), so the lower the value the better the performance is. We repeated each configuration 20 times, and reported the average time.

QFSBlogImage2

The results show that QFS is at least as fast as EMRFS. For both filesystems, increasing number of mappers does not necessarily increase the performance. For QFS, this is expected. With large enough files, no encryption between client and S3 access proxy and sufficient write behind threshold (for related QFS client parameters please visit here), the number of mappers on a single instance would not have a significant impact on write performance, i.e. a small set of writers can provide the same amount of data to the local access proxy, which is responsible of buffering the data and writing it to S3 end by complying with S3 data transfer rate limits. Overall, QFS provides a 4% average improvement to the Hadoop job over EMRFS when writing 100GB data to S3. Next, we make the same comparison for reads.

QFSBlogImage3

For reads, the performance difference between QFS and EMRFS is more visible with QFS being 20% faster than EMRFS on average. In addition to performance difference, we also observed significant variation in EMRFS runs (with a value of 0.5 for standard deviation to mean ratio), while QFS performed more consistently throughout the different runs (standard deviation to mean ratio being only 0.06)

Other filesystem operations: Next, we compare the performance of QFS and EMRFS when we issue a variety of other filesystem operations that don’t involve actual I/O. For this test, we allocated 16 instances. However, instead of running a hadoop job, this time we ran 32 filesystem clients on each instance at the same time. Each client creates a balanced directory structure of 64 directories each with 64 subdirectories. Considering all clients running on all instances, the entire test creates ~2.1 million directories and issues again ~2.1 million mkdir, stat, readdir and rmdir operations against each filesystem.The figure below shows the number of completed operations per second for each operation against each filesystem. In the figure, we include the performance of QFS C++ and QFS Java clients both, in addition to EMRFS clients for which the only option is Java.

QFS Blog -mstress_test

The results show that using QFS Java clients one can perform 14x more mkdir, 4x more stat, 3x more readdir and 4x more rmdir operations per second, respectively, compared to EMRFS. The performance difference between two filesystems becomes even more substantial when employing QFS C++ clients for the test, since the entire Java stack calls and associated overheads are avoided.

How to setup QFS on Amazon S3
To get you started with the new S3 feature, we updated QFS github wiki with the related documentation. For a guide for how to enable and test the S3 feature on a QFS instance as well as for some useful tips, please visit here. Also, the entire list of S3 related configuration parameters can be found in the annotated chunkserver and metaserver configuration files.

Conclusion
QFS 1.2 comes with many bug fixes, improvements, and features. Quantcast is proud to release the next version of our massively scalable and massively performant filesystem to the open source community. Please check out the project on GitHub. Feel free to send an email to qfs-devel@googlegroups.com as well if you have any questions about the project. Quantcast employees regularly monitor this group and are here to answer your questions. We’d also like to invite everyone to Quantcast’s public issue tracking system, which we use for discussions on the latest QFS features, bugs and questions.

See the full video of our presentation on QFS 1.2 at Amazon re:Invent 2016 below:

Interested in working on QFS and other projects at Quantcast? Check out our open positions here.

Quantcast

Quantcast