Site stats you will improve:
728+ Nvidia P100/T4 GPUs
32k+ physical cores across 24 carrier hotels, with 6 Tbps capacity
10k+ concurrent live video broadcasts
400k+ concurrent live video streams
26B+ weekly web requests
95% of web requests completed in 59ms-72ms
2M database queries per minute, average response 3.5ms
300k+ commands/sec across Redis clusters
What you will do:
Performance analysis to identify sources of instability using data from APM and distributed telemetry tools
Analysis of complex systems to identify operational surprises and minimize downtime
Software engineering and patching to incrementally improve performance, scalability, and reliability
Infrastructure modifications in both a data center metal environment with advanced routing/switching and in the public cloud
Predictive failure analysis and disaster planning
Author new tools and automation to streamline the DevOps pipeline
Collaborate with Frontend/Backend engineering, QA, DevSecOps, and Data teams
Database and key-value store administration and configuration with a focus on uptime and performance
Incident response and postmortem reports
What you bring:
STEM degree and relevant experience as a Site Reliability Engineer
Exceptional problem-solving skills
High proficiency in at least one language such as C, C++, Java, Python, or Go
High proficiency in a Unix/Linux environment, with excellent knowledge of internals (e.g., filesystems, system calls)
Networking knowledge (e.g., routing, switching, TCP stack) for both metal and cloud (VPC, Security Groups) environments
Experience in database administration and configuration
Experience with DevOps tools such as Ansible, Docker, and Kubernetes
On-call availability to respond to monitoring and alerting for core website functions as needed
Experience in growing data center teams (nice to have)
What you will receive:
A strong team of A-players
A robust engineering culture
Opportunity to make an impact on a highly popular product
Freedom to bring ideas to the table and make technical decisions
Support and guidance from a highly professional and knowledgeable team
Flexible working environment
Recruiting Process
We value a sense of urgency and aim to run a smooth and transparent recruiting process. These are the stages:
Phone screen with a recruiter
Interview with CTO and Director of IT
Team interview
We reserve the right to add selection stages to the process depending on the specific skills of each candidate.
Perks & Benefits:
Health and life insurance with dental and vision plans, 100% employer-sponsored for employees and dependents
401k matching
Paid holidays, vacation and sick days
Corporate Udemy account and professional development assistance
