Scrapinghub is looking for a senior systems engineer to join the team. This role is responsible for the successful operation and scaling of the infrastructure and software that power crawls of over 2 billion pages a month.
Founded by the creators of Scrapy, Scrapinghub helps companies turn web content into useful data with a cloud-based web crawling platform, off-the-shelf datasets, and turn-key web scraping services.
Join us in making the world a better place for web crawlers, alongside talented engineers working remotely from over 30 countries.
Your key responsibilities will be to:
Write tools and scripts to provide automation and self-service solutions for ourselves and other teams
Design new systems to support production services
Creatively solve scaling challenges in a rapidly expanding cloud environment
Help improve monitoring and identify key performance metrics
Conduct proactive R&D: discover and implement new tools, emerging technologies, etc.
Design, implement, and maintain disaster recovery
Troubleshoot and resolve server and network issues
A few examples of things you’ll do:
Migrate Cloudera Distribution for Hadoop (CDH) from version 4 to version 5, along with the 50+ TB of data stored in it, with minimal downtime
Build and optimize an Elasticsearch + Logstash + Kibana stack so our development team can monitor and analyze production system usage (see the query sketch after this list)
Design and implement a continuous integration and deployment system based on Docker, Mesos, and an automatically configured HTTP load balancer that reroutes traffic when application containers die (see the health-check sketch below)
Automate server setup to scale to 300+ servers across cloud providers and bare metal, ready to replace hardware at any time without a service outage (see the bootstrap sketch below)
Set up and optimize highly available, multi-master MySQL and RabbitMQ clusters (see the replication check below)
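
To give a flavor of the monitoring work: here is a minimal sketch of querying an Elasticsearch cluster for log volume by severity. The endpoint, index pattern, and "level" field are illustrative assumptions, not our actual schema.

    import requests

    # Endpoint and index pattern are illustrative; a real deployment would
    # point at the cluster's actual address and Logstash's configured indices.
    ES_URL = "http://localhost:9200/logstash-*/_search"

    # Count log events per severity level over the last hour.
    query = {
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": "now-1h"}}},
        "aggs": {"by_level": {"terms": {"field": "level"}}},  # assumed field
    }

    resp = requests.post(ES_URL, json=query, timeout=10)
    resp.raise_for_status()

    for bucket in resp.json()["aggregations"]["by_level"]["buckets"]:
        print(bucket["key"], bucket["doc_count"])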
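For the load balancer item, one common approach is a small health-check loop that rewrites the backend list and reloads the proxy. The backend addresses, config path, and reload command below are placeholders, not a description of our production setup.

    import subprocess

    import requests

    # Container endpoints would come from the Mesos API or a service-discovery
    # store in practice; a static list keeps the sketch short.
    BACKENDS = ["http://10.0.0.11:8080", "http://10.0.0.12:8080"]

    def healthy(url):
        """Treat an HTTP 200 from /health as alive; anything else as dead."""
        try:
            return requests.get(url + "/health", timeout=2).status_code == 200
        except requests.RequestException:
            return False

    alive = [b for b in BACKENDS if healthy(b)]

    # Write only live backends into the proxy config and reload it. The config
    # path and reload command are placeholders for whatever balancer is used.
    with open("/etc/haproxy/backends.cfg", "w") as f:
        for i, backend in enumerate(alive):
            f.write("    server app%d %s check\n" % (i, backend.split("//", 1)[1]))

    subprocess.check_call(["systemctl", "reload", "haproxy"])

In production this kind of loop would typically react to Mesos task events or a service registry rather than polling on a timer, but the reroute-on-death idea is the same.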
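For large-scale server setup, a sketch using Fabric (a Python SSH library) to converge each machine to a known state. The hostnames and local config file are hypothetical; the point is that setup is scripted and repeatable, so replacing hardware means re-running the bootstrap.

    from fabric import Connection  # third-party: pip install fabric

    # Hypothetical inventory; with 300+ machines this would be generated from
    # the cloud provider's API or an inventory database, not hand-maintained.
    HOSTS = ["node%03d.example.com" % i for i in range(1, 301)]

    def bootstrap(host):
        """Idempotent base setup so a replacement machine converges to the same state."""
        c = Connection(host)
        c.run("sudo apt-get update -q && sudo apt-get install -qy docker.io")
        c.put("configs/daemon.json", "/tmp/daemon.json")  # assumed local config file
        c.run("sudo mv /tmp/daemon.json /etc/docker/daemon.json")
        c.run("sudo systemctl restart docker")

    for host in HOSTS:
        bootstrap(host)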
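And for the MySQL cluster, a sketch of a replication health check using PyMySQL. Node addresses and credentials are placeholders; in a multi-master pair each node is also a replica of its peer, so SHOW SLAVE STATUS should report on every node.

    import pymysql  # third-party: pip install pymysql

    # Placeholder nodes and credentials; real values would come from
    # configuration management or a secrets store.
    NODES = ["db1.example.com", "db2.example.com"]

    for host in NODES:
        conn = pymysql.connect(host=host, user="monitor", password="...",
                               cursorclass=pymysql.cursors.DictCursor)
        try:
            with conn.cursor() as cur:
                cur.execute("SHOW SLAVE STATUS")
                status = cur.fetchone()
                if status is None:
                    print("%s: replication not configured" % host)
                    continue
                running = (status["Slave_IO_Running"] == "Yes"
                           and status["Slave_SQL_Running"] == "Yes")
                print("%s: running=%s lag=%ss"
                      % (host, running, status["Seconds_Behind_Master"]))
        finally:
            conn.close()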

