Cluster Interfaces is responsible for all of the entry points into Quantcast’s big data services. We develop and operate a SQL-On-MapReduce platform along with a data catalog. Quantcast’s SQL-On-MapReduce platform is very similar to Hive and gives the whole company easy access to our core data assets.
The team also develops a dashboard through which data pipelines are scheduled and executed. The dashboard provides a holistic view into the company’s data pipelines, including their dependency trees, capacity needs, and execution times. It also allows users to define data retention policies, monitoring, and alerting. The dashboard is similar to the open-source project Azkaban.
Quantcast is looking for a person who can lead and develop our own SQL-On-MapReduce solution. It leverages Facebook’s Presto framework and Quantcast’s custom MapReduce to process ~80 PB per month. In addition to driving our SQL-On-MapReduce language development, you will work on our data catalog (similar to HCatalog) and our pipeline scheduler (similar to Azkaban) to make SQL-On-MapReduce scripts easy to use and deploy.
Responsibilities
Research and make decisions on the future path of the platform
Design and implement features, and optimize and debug SQL-On-MapReduce jobs
Mentor and grow team members
Advise users across multiple teams
Improve SQL-On-MapReduce as both an ad-hoc query platform and an alternative language for writing MapReduce jobs
Work closely with cluster and operations teams
Work with various languages and really big data
Participate in a light on-call rotation
Contribute not only to the team roadmap, but also to the vision of big data services at Quantcast
