Site Reliability Engineer

last updated July 13, 2022 19:48 UTC

HQ: Remote

OFF: Berlin, Berlin, Germany
Full-Time
Full-Stack Programming

POSITION SUMMARY

We are looking for a motivated and talented Site Reliability Engineer to join our remote European team and help us monitor, develop, and scale the Cordial platform. Our goal is to give our clients a delightful day-to-day experience with the platform and to build trust that expected jobs and background processes will run without issue. You will work with our DevOps and Product teams to ensure that bugs are squashed, performance is optimized, and blind spots are revealed through comprehensive monitoring. This position is fully remote with no physical Cordial office located in Portugal.

YOU WILL

Use your knowledge of Web, App, Network, Server, Storage, and Security technologies to administer, monitor, and troubleshoot application and network components in our cloud-based environment

Actively contribute to Infrastructure Design and Implementation discussions

Provide production support for the Product Development teams

Participate in an on-call rotation

Work with the team to develop and deploy monitoring and alerting architecture, and implement monitoring/logging solutions

Troubleshoot complex issues in a timely manner as necessary to maintain the performance and stability of our Production Application environment

Help build out SLOs, and document and monitor SLAs

ABOUT YOU

3+ years of UNIX/Linux systems and network administration (DNS, IPsec, VPN, load balancing, process tracing)

Experience with AWS (we use EC2, EKS)

Experience with monitoring, logging and alerting tools

Previous experience in an SRE and/or DevOps role

Software development experience

Experience with Docker/containers & Kubernetes

Comfortable working in a globally distributed team across time zones

Strong teamwork and communication skills

A genuine desire to learn new technologies and grow

Fluent in verbal and written English

BONUS

Experience with MongoDB

Experience deploying and/or maintaining Kubernetes/EKS clusters

Experience with Prometheus/Grafana/Datadog

Experience implementing SLOs, reliability targets, error budgets
