Platform.sh is a groundbreaking hosting and development tool for web applications. We’re a European VC-backed startup with a host of blue-chip enterprise clients and a string of awards and grants (including €2m from the EU Horizon 2020 program).
To reinforce our technical prowess, we are looking to grow our operations team. If you’re looking for an exciting, high-growth opportunity with an award-winning, cutting-edge company, this could be just the job for you.
For its PaaS solution, https://platform.sh is looking for an Operations and Service Reliability Engineer with a taste for Python and Go, great Linux system understanding, and a real hunger for the challenges of building robust, distributed systems.
Platform.sh is a PaaS shrouded in a lot of black magic (we can consistently clone a whole running cluster, with its state, databases, and indexes, in a matter of seconds). We want to get this down to hundreds of milliseconds. Interested? There is more:
Our external API is pure Hypermedia REST + OAuth on top of Pyramid. It mechanizes the Git layer and needs more features.
From the same manifest, we can consistently generate a Docker container, an LXC one, or VM disk images (AWS, Azure, OpenStack). We want more targets.
We probably have the highest container density in the industry. We need to get it higher.
We support any Python, Ruby, NodeJS, PHP, Java, and .NET. Time to roll out Elixir, of course (and Rust. We need Rust).
We need to have more auto-healing on the high-availability clusters. We need more performance out of our multi-protocol SSH proxy. We need work on our Ceph implementation. We need to get the Debian package generation streamlined and faster. We need… great ideas on how to make Platform.sh even better.
Directly reporting to our VP of Infrastructure and in close interaction with our Engineering and Customer Support teams, you will be responsible for:
cloud operations: configuring clusters, deploying stuff, following up on alerts, and helping customer support debug issues
creating systems, tools & processes that will enhance our support and operations efficiency
improving service quality, discipline, and reliability throughout the lifecycle
monitoring operating objectives, and streamlining and automating interventions
continuous learning from Operations experience, modeled as software
supporting our data protection officer and compliance team with information requests, pen testing, disaster recovery, and related activities
executing our security incident management process
working with the appropriate teams to deploy and operate security tools and solutions
ensuring all systems, security applications, and services in the environment are securely configured and managed through security platforms and tools appropriate to the operating system
ensuring optimal operation of all security solutions and tools
automating all of the above so you can instead drink margaritas (or non-alcoholic beverages, of course)
Must have:
The ideal candidate:
has proven successful experience in an operations role
has demonstrated the ability to successfully manage cloud-based infrastructure for a fast growing organization
has experience with containerization technologies
has had exposure to cloud services (AWS)
understands how an OS works, knows networking, how git works, and the constraints of a distributed system
has Puppet experience
is proficient in Python (Golang a plus)
has an understanding of:
Patch and Vulnerability Management process
Principle of Least Privilege
Incident response
Identity and Access Management
iptables
WAFs
Nice to have:
knowledge of Magento Ecommerce, Symfony, Drupal, eZ Platform, or Typo3
relational database skills
public speaking experience
ability to kick ass at chess or beat Zork without using a map
proficiency in Rust grants you bonus points
CISSP (preferred), Security+, GCED, GICSP, GCIH, SSCP, or CASP Certification or similar
audit and compliance experience
can bravely take on new challenges like a Gryffindor, analyze problems like a Ravenclaw, protect our infrastructure and client data like a Slytherin, and talk with clients like a Hufflepuff
Note: We don’t like stress, so we build everything to be robust and resilient, but stuff does break. This is a role with on-call duties. If pager duty fills you with dread, well, this might not be a fit.

