As a member of the newly formed Cloud/DevOps group, you will help continue to define our transformation towards an enterprise SaaS solution, hosting numerous top-tier customers. We’re looking for an operations engineer with a development background who has operational experience and expertise spanning high-availability systems in both lower and production environments, a DevOps mentality of continuously improving the system, and a firm grasp of automation and cloud architectures. You must have extensive experience supporting applications developed in .NET, Java, and JavaScript. You should also be passionate about solving problems and developing creative solutions leveraging automation.
Your First Three Months
In your first month, as your familiarity with the product grows, your responsibilities and influence will grow as well. You, along with your team, will be responsible for supporting the product team’s operational needs in our upper environments. Further, you will collaborate with other members of the Development and QA team in established patterns and continue to hone your skills as you start to formulate ways to push the design, architecture and implementation of our CI pipelines (lower and upper environments) to their next phase.
Within two months, you will fill in the gaps to have a well-tested, low-latency and highly available environment for our product’s operational needs. Working with the development team, you will start to close the gaps in creating and supporting a truly scalable product offering. You will be highly influential in the formation of the rest of our operations team as you help hire our next operations engineer. Your team will be responsible for supporting production environments.
Within three months, you will help drive changes to the operational and development roadmap as we continue onboarding new and existing customers into our hosted production environments in 2020.
What You’ll Do
Design, provision, configure and maintain the operations platform to handle the scale of running several application stacks in the cloud that will be consumed by thousands of customers nationwide and our internal Product Team. Responsibilities will include:
Automating the deployment and maintenance of cloud platform technologies in both upper and lower environments
Implementing and overseeing log management, data warehouse, and database operations, including management of Logging/Audit services
Ensuring all monitoring systems (infrastructure- and application-level) are in place, and reporting on availability
Designing and implementing strategies around disaster recovery and security for all sub-systems in infrastructure (e.g., web servers, database, queues, storage, network)
Aiding in improving the overall product through development-task-specific automation in the lower pipeline
Integrate static analysis tools in build pipeline (security, code quality, etc.)
Add database deployment capability to release pipeline (automate schema changes across all databases)
Incorporate test automation into build pipeline
Separate code from configuration in build/release pipeline
Researching and implementing emerging virtualization techniques and advising management around improved scalability
Building strategic and tactical plans for continued improvement of cloud architecture and operations
Performing capacity management, load and scalability planning
Helping drive process improvements for service management, including: outage/incident management, rollbacks and reporting
Assisting management in development and optimization of operational cost models
Assisting in the establishment of 24×7 performance monitoring, reporting and response protocols
With the help of your team and the development group, you will provide on-call support outside of normal work hours/days
About You
You’re driven, humble, and autonomous
You’re a quick study, a strong communicator, and you’re able to adapt to fast-paced environments
You have a working knowledge of Agile development practices (e.g., Scrum, TDD)
You are (or have the mindset of) a developer, but are intrigued by the operational aspects of hosting developed solutions
You’re devoted to automation
You have 3-5 years of hands-on production experience with Amazon Web Services (AWS), Google Cloud or Microsoft Azure, including:
Configuration of VPCs, with VPN to corporate network
Experience setting up, maintaining and monitoring global production environments, QA and staging environments, with a strong understanding of the differing needs of such environments
2+ years of experience in a professional production environment
2+ years of experience managing networking infrastructure and monitoring at the application level
Performance optimization experience, including troubleshooting and resolving network and server latency issues, performing hardware evaluation/selection tasks, performance vs. cost vs. time analysis
At least 1 year of experience with automation or scripting tools (e.g., Go, Python, Shell, PowerShell)
2+ years of experience with Ansible, Jenkins or other comparable tools
You’re detail-oriented, with excellent documentation skills, and you’re someone who can successfully manage multiple priorities
Troubleshooting skills that range from diagnosing hardware/software issues to large scale failures within a complex infrastructure
Other Things We Hope You Have
Bachelor’s Degree in Computer Science or equivalent work experience
Experience with relational databases (such as Oracle and Aurora), Splunk (or other log aggregation tools), Grafana, Terraform and Prometheus
Extensive production experience with MS Azure
Experience working with Docker and Kubernetes, and hands-on experience with performance, load and security penetration testing
Hands-on experience with building out and maintaining a continuous integration and delivery pipeline
Our Team
You will be a member of what will ultimately be a three-person team of Cloud Ops Engineers. You will report directly to our VP of Cloud Ops & Security, but will collaborate extensively with our Principal Cloud Ops Engineer, Director of Development and the rest of our Development Team.
We have an open and collaborative environment where everyone works together to deliver what is needed, from product features to operations needs (e.g., health checks).
We value open and direct communication, taking calculated risks that will push us forward, and investing in our people.
Our Stack
We have current Production and Continuous Integration footprints in Azure and AWS
Our front-end applications leverage .NET, Vue.js and Java
Our APIs are built with .NET and Java
Our backend comprises MS SQL Server, Oracle and AWS Aurora
We currently have a CI pipeline that we are looking to take to the next level to help with our growth in customers and employee base
