Campaign Monitor is seeking a Site Reliability Engineer / Site Operations Engineer to join our growing SRE team: someone who will work on automating and scaling our systems to keep pace with our growth. We send over 2 billion emails every month, and our infrastructure needs to scale accordingly so we can deliver the best user experience possible. If you don’t have previous experience in an SRE role, but you’re a really solid Software Engineer with a touch of DevOps and you’re keen to learn, we’re totally cool with that as well.
Who are you?
You’re smart, personable and friendly, and you communicate clearly and respectfully. You live and breathe problem solving for mission-critical services and are passionate about learning about the challenges and trends within Site Reliability.
What you’ll be doing
Solve problems relating to mission-critical services and build automation to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions.
Facilitate root cause analysis sessions and communicate the findings back to the product teams.
Own the end-to-end availability and performance of mission-critical services.
Create visibility on how we perform against our SLA through active monitoring and reporting.
Design, write and deliver software to improve the availability, scalability, latency, and efficiency of Campaign Monitor’s services.
Influence and create new designs, architectures, standards and methods for large-scale distributed systems.
Engage in service capacity planning and demand forecasting, software performance analysis and system tuning.
Take part in periodic on-call duties using a follow-the-sun model.
Measure everything, report on interesting events and alert on critical issues.
Create and update documentation.
Work with other teams to build, test and roll out systems.

