Senior Site Reliability Engineer – Data (REMOTE)

Last updated April 24, 2026 12:36 UTC

HQ: Beaverton, Oregon

Full-Time
DevOps and Sysadmin

The Discogs Platform team focuses on several objectives: building and supporting performant, cost-effective, reliable infrastructure; providing developer experience tooling and mentorship; and creating "golden paths" for organization-wide standards and velocity. As a key member of the Platform team, the Senior Site Reliability Engineer – Data will work closely with other Discogs engineering squads to develop and optimize scalable, well-planned relational database architectures, drive best practices and stability for our use of Kafka and change data capture (CDC), and contribute to the Platform team’s operations.

Location

This is a remote position open to candidates located in OR, WA, CA, CO, TX, or IL.

Compensation

Starting Base Salary Range: $130,000 – $140,000 yearly

What You’ll Accomplish

Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions of this role.

Stewarding Discogs’ data stores as a key subject matter expert
Leading efforts on the reliability and design patterns of our Kafka and Kafka Connect implementations
Establishing data contracts and clear communication standards between CDC producers and consumers
Working closely with engineering squads to refactor and re-architect MySQL database schemas and indexing for long-term scalability, performance, and cost effectiveness
Mentoring engineering squads on Platform best practices for MySQL, Kafka, and other software development lifecycle areas
Writing documentation and runbooks that contribute to the engineering organization’s knowledge base
Working in a containerized, orchestrated environment
Contributing to the Platform team’s disciplines of site reliability and operations, supporting both our squads and Platform’s central infrastructure
Participating in the on-call rotation, responding to incidents, and troubleshooting data and other operational issues

What You’ll Contribute

Minimum Education and Experience

A bachelor’s degree in Computer Science or a similar field, or equivalent relevant work experience.
5+ years of experience working with Kafka and relational database management systems (RDBMS).
6+ years of experience in Ops, DevOps, Site Reliability, Platform, or other systems roles.

Required Skills & Abilities:

Relational database schema design, query performance optimization, and administration (MySQL, Percona Server, AWS RDS)
Kafka: Cluster administration (Strimzi), Kafka Connect (Debezium, JDBC)
CI/CD (GitHub Actions)
GitOps (ArgoCD)
Kubernetes (EKS, Kustomize, Karpenter, administration, application manifests)
AWS and cloud development (VPC, EKS, RDS, S3)
Observability (Datadog, Sentry)
Scripting (Shell, Python)
Track record of collaboration and mentorship
Excellent written communication and documentation skills
Continuous learning
A sense of ownership and a proactive approach to solving large problems

Preferred:

Infrastructure-as-code (Terraform)
Elasticsearch (ECK administration, scaling, performance)
Python (SQLAlchemy, FastAPI)
GraphQL (schema design, Apollo federation)
REST APIs
HashiCorp Vault
Redis
Memcached
NoSQL databases
Data lakes/warehouses
Data Governance
Data Security

The Platform team covers a wide range of technical topics and we’d love to hear about your skills beyond this list!

To apply for this job, please visit the application page