Scrapinghub is looking for a senior systems engineer to join the team as Head of Sysadmin. This role will be responsible for the successful operation and scaling of the infrastructure and software that power crawls of over 2 billion pages a month.
Our infrastructure stack includes Ubuntu, Python, Django, MySQL, HBase, Docker, LXC and AWS, along with our own technologies such as Scrapy, Crawlera and Hubstorage.
Founded by the creators of Scrapy, Scrapinghub helps companies turn web content into useful data with a cloud-based web crawling platform, off-the-shelf datasets, and turn-key web scraping services.
Join us in making the world a better place for web crawler developers and data scientists, alongside talented engineers working remotely from more than 30 countries.
Your key responsibilities will be to:
Oversee design, deployment and management of our global infrastructure
Help identify, debug and fix problems arising on Scrapinghub’s platform, working with both sysadmin and platform team members
Organize the sysadmin team’s work and delegate tasks according to members’ skill sets
Help onboard new members (by writing guides and mentoring them directly)
Write tools and scripts to provide automation and self-service solutions for ourselves and other teams
Design new systems to support production services
Creatively solve scaling challenges in a rapidly expanding cloud environment
Help improve monitoring and identify key performance metrics
Do proactive R&D: discover and implement new tools and emerging technologies
Design, implement and maintain disaster recovery
Troubleshoot and resolve server and network issues
A few examples of things you’ll do:
Migrate Cloudera Distribution for Hadoop (CDH) from version 4 to version 5, along with the 50+ TB of data stored inside it, with minimal downtime
Build and optimize an Elasticsearch+Logstash+Kibana stack for our development team to monitor and analyze production system usage
Design and implement a continuous integration and deployment system based on Docker, Mesos and an automatically configured HTTP load balancer able to reroute traffic in case application containers die
Automate server setup to scale to 300+ servers on cloud providers and bare metal, and be ready to replace hardware at any time without a service outage
Set up and optimize highly available multi-master MySQL and RabbitMQ clusters

