EMS Software is looking for a Cloud Operations Engineer to help us transform our product from an on-premises solution into a hybrid offering with a pure SaaS presence.
You will be at the center of a vital growth initiative.
You’ll join a company that serves 2,500 great organizations like Accenture, Deloitte, Goldman Sachs, Harvard and Yale University. Millions of people at our customers use our software to manage events; reserve spaces to meet, work and study; and analyze and optimize their use of real estate.
We’re looking for an engineer with a development background who has some operational experience and expertise spanning high availability systems in both lower and production environments, a DevOps mentality of continuously improving the system, and a firm grasp of automation and cloud architectures. You must have extensive experience supporting applications developed in at least 3 of the following: .NET, Java, JavaScript, Python, Node, Go or Ruby. You should also be passionate about solving problems and developing creative solutions that leverage automation.
What You’ll Do
Design, provision, configure and maintain the platform operations to handle the scale of running several application stacks in the cloud that will be consumed worldwide
Automate the deployment and maintenance of cloud platform technologies
Oversee production operations, log management, data warehouse, and database operations, including management of Splunk services
Ensure all monitoring systems (IT, development, service management, Apdex) are in place
Enforce consistency of monitoring, reporting, and alarming systems
Help drive process improvements for service management, including: outage/incident management, rollbacks and reporting
Research emerging virtualization techniques and advise management
Perform capacity management, load and scalability planning
Ensure compliance with deployment and operations documentation
Assist management in development and optimization of operational cost models
Design cloud infrastructure for high reliability and availability
Build strategic and tactical plans for continued improvement of cloud architecture and operations
Assist in the establishment of 24×7 performance monitoring and response protocols
Provide on-call support outside of normal work hours/days
About You
You’re driven, humble, and autonomous
You’re a quick study, a strong communicator, and you’re able to adapt to a fast-paced environment
You have a working knowledge of Agile development practices (e.g., Scrum, TDD)
You are a developer, or have the mindset of one, but are intrigued by the operational aspects of hosting developed solutions
You are devoted to automation
You’re an expert in Windows (IIS, SQL Server) and Linux
You have at least 1 year of hands-on production experience with Amazon Web Services (AWS), Google Cloud or Microsoft Azure. This includes:
Configuration of VPCs, with VPN to corporate network
Experience setting up, maintaining and monitoring global production environments, QA and staging environments, with a strong understanding of the differing needs of such environments
At least 6 months of experience in a professional production environment
At least 6 months of experience managing networking infrastructure and monitoring at the application level
Performance optimization experience, including: troubleshooting and resolving network and server latency issues; performing hardware evaluation/selection tasks; performance vs cost vs time analysis
At least 1 year of experience with automation or scripting tools (e.g., Go, Python, Shell, PowerShell)
At least 6 months of experience with Ansible and Jenkins
You’re detail-oriented, with excellent documentation skills, and you’re someone who can successfully manage multiple priorities
You have troubleshooting skills that range from diagnosing hardware/software issues to large-scale failures within a complex infrastructure
Other Things We Hope You Have
Bachelor’s degree in Computer Science or equivalent work experience
Experience with Mongo, MS SQL Server, Splunk, Grafana, Terraform and Prometheus
Experience working with Docker, Kubernetes and Go
Hands-on experience with performance, load and security penetration testing
Hands-on experience with building out and maintaining a continuous integration and delivery pipeline
The Team
You will be part of a 6-person team of 4 Operational Engineers, a Director of Cloud Operations, and a Technical Product Owner.
The larger team consists of 13 Developers, 10 Quality Engineers, 4 Product Owners, and 3 UX Designers. We have an open and collaborative environment where everyone works together to deliver what is needed, from product features to operations needs (e.g., health checks).
We value open and direct communication, taking calculated risks that will push us forward, and investing in our people.
Our Stack
We have current Production and Continuous Integration footprints in Google Cloud (primary), AWS, and Azure
Our front-end applications leverage React and React Native, Redux, Node, C#, and Knockout
Our APIs are built with Golang, .NET and .NET Core
Our backend consists of MS SQL Server
We have a well-developed CI pipeline that allows us to deploy and stand up customers on demand
We leverage Ansible heavily, Splunk (JSON logs) is our lifeblood, and we enjoy operational efficiency and accessibility through Hubot and StackStorm
