At Elastic, we have a simple goal: to solve the world’s data problems with products that delight and inspire. As the company behind the popular open source projects Elasticsearch, Kibana, Logstash, and Beats, we help people around the world do great things with their data. From stock quotes to real-time Twitter streams, Apache logs to WordPress blogs, our products are extending what’s possible with data, delivering on the promise that good things come from connecting the dots. The Elastic family unites employees across 30+ countries into one coherent team, while the broader community spans more than 100 countries.
Thanks to our ongoing expansion, we have the opportunity to grow our Site Reliability team. We’re part of the Elastic Cloud engineering team, focused on solving Cloud operations problems and keeping the SaaS online, and we aren’t afraid to get our hands dirty. We are the first line of consumers for Elastic’s products, and our experience helps influence the direction of the stack. While most organizations have a single Elastic Stack deployment or a handful of them, here you’ll be responsible for identifying, troubleshooting, and reporting platform problems to product engineers (or fixing the code yourself) to ensure that the thousands of Elasticsearch clusters we manage provide a stable and reliable service. We’re looking for people who are just as passionate about troubleshooting issues with distributed systems as they are about automating, coding, and collaborating to solve problems.
Responsibilities
You will report and solve problems within the Elastic Cloud infrastructure services and collaborate with product engineers on issues
You will participate in SRE software engineering, writing code that continually reduces human intervention in operational tasks and automates processes
You will monitor the Elastic Cloud platform and Cloud infrastructure, responding to incidents, correcting and improving systems to prevent future incidents, and planning capacity
You will manage Cloud provider infrastructure, system deployments and product releases
You will be involved in resolving Elastic Cloud customer support issues
You will demonstrate and promote best practices for teams using Cloud platforms
You will participate in 24×365 on-call schedules
Experience
You are either an experienced sysadmin with professional skills in Linux, preferably on distributed systems at scale, and a demonstrable interest in using software engineering to solve operational problems; or a software engineer with real interest, and ideally some experience, in Linux systems, networking, monitoring and automation.
You have at least three years of experience using a public Cloud such as AWS, GCP, Azure, SoftLayer, or OpenStack
You are comfortable writing software to automate API-driven tasks at scale. Our SREs use Python and Go regularly but are also encouraged to contribute to the product codebase in Java, Scala, and Python.
You have used Ansible, Puppet, Chef, or another config management suite, know where it’s broken, and are open to trying new alternatives
Key Skills
Healthy knowledge of Linux (have compiled your own kernel at some point, know how to trace syscalls, understand TCP, care about the difference between sysvinit/runit/systemd, etc.)
Relentless desire to automate and build software tools
Desire to track work in Git, driven by a GitHub workflow of issues and pull requests
Love open source development and have contributed to a project somewhere (it doesn’t have to be ours), whether through mailing lists, patches, documentation, etc.
Enjoy working remotely and the communication it requires
Love a diverse environment, working with men and women all over the world
Additional Information
Competitive pay and benefits
Stock options
Catered lunches, snacks, and beverages in most offices
An environment in which you can balance great work with a great life
Passionate people building great products
Employees with a wide variety of interests
Distributed-first company with employees in over 30 countries, spread across 18 time zones, and speaking over 30 languages!

