Coupa makes margins multiply through its community-generated AI and industry-leading total spend management platform for businesses large and small. Coupa AI is informed by trillions of dollars of direct and indirect spend data across a global network of 10M+ buyers and suppliers. We empower you with the ability to predict, prescribe, and automate smarter, more profitable business decisions to improve operating margins.
Why join Coupa?
🔹 Pioneering Technology: At Coupa, we’re at the forefront of innovation, leveraging the latest technology to empower our customers with greater efficiency and visibility in their spend.
🔹 Collaborative Culture: We value collaboration and teamwork, and our culture is driven by transparency, openness, and a shared commitment to excellence.
🔹 Global Impact: Join a company where your work has a global, measurable impact on our clients, the business, and each other.
Learn more on the Life at Coupa blog and hear from our employees about their experiences working at Coupa.
The Impact of a Sr. Director, Cloud Infrastructure & Platform Quality at Coupa:
We are seeking a Sr. Director, Cloud Infrastructure & Platform Quality to lead and evolve our cloud-scale quality and resilience strategy across infrastructure, platform services, and production operations. This role is responsible for scaling and running a world-class testing organization focused on reliability, scalability, performance, and fault tolerance across complex, distributed cloud environments.
You will partner closely with Engineering, Cloud Operations, SRE, Security, and Product teams to ensure our platforms are resilient by design and continuously validated under real-world conditions. This leader will bring deep expertise in cloud providers, orchestration platforms, chaos engineering, and large-scale system validation, combined with proven experience running and scaling high-performing teams.
What You’ll Do:
- Cloud & Infrastructure Quality Strategy
- Define and own the end-to-end testing strategy for cloud infrastructure, platform services, and production environments.
- Establish standards and best practices for infrastructure testing, resilience validation, and operational readiness.
- Drive a shift-left and shift-right quality model, embedding testing throughout the lifecycle, from design through production.
- Chaos Engineering & Resilience
- Lead the design and execution of chaos testing and fault-injection programs to validate system behavior under failure scenarios.
- Partner with SRE and Cloud Ops teams to continuously test availability, failover, disaster recovery, and incident response.
- Use real incident learnings to improve test coverage and prevention mechanisms.
- Cloud Platforms & Orchestration
- Bring a deep understanding of major cloud providers (AWS, Azure, GCP) and their managed services.
- Oversee testing strategies for containerized and orchestrated environments (Kubernetes, service meshes, distributed systems).
- Validate scalability, performance, and reliability of multi-tier, multi-region architectures.
- Cross-Functional Leadership
- Act as a trusted partner to Engineering, Cloud Operations, SRE, Security, and Product leaders.
- Influence architecture and design decisions with a quality and reliability mindset.
- Drive alignment across teams on reliability goals, SLAs, and operational metrics.
- Team Leadership & Growth
- Lead and scale a high-impact infrastructure and cloud testing organization.
- Coach and develop senior managers and technical leaders.
- Foster a culture of ownership, innovation, and continuous learning within the team.
- Metrics & Continuous Improvement
- Define and track key metrics around system reliability, incident trends, test effectiveness, and operational risk reduction.
- Leverage automation, AI, and analytics to continuously improve test efficiency and signal quality.
- Drive measurable improvements in platform stability and customer experience.
What You Will Bring to Coupa:
- 12+ years of experience in engineering, cloud infrastructure, testing, or reliability engineering, with significant leadership experience.
- Proven track record leading large, distributed technical teams.
- Deep hands-on knowledge of cloud platforms (AWS, Azure, GCP) and modern infrastructure architectures.
- Strong experience with distributed systems, orchestration frameworks, and container platforms.
- Demonstrated expertise in chaos engineering, resilience testing, and large-scale system validation.
- Experience partnering with Engineering, SRE, and Cloud Operations teams at scale.
- Excellent communication skills and ability to influence at executive and senior-leader levels.
- Preferred Qualifications
- Experience in SaaS, cloud-native, or large-scale enterprise platforms.
- Familiarity with observability, monitoring, and incident management tooling.
- Exposure to AI-driven testing, automation, or reliability analytics.
- Experience operating in high-availability, mission-critical environments.
Coupa complies with relevant laws and regulations regarding equal opportunity and offers a welcoming and inclusive work environment. Decisions related to hiring, compensation, training, or evaluating performance are made fairly, and we provide equal employment opportunities to all qualified candidates and employees.
Please be advised that inquiries or resumes from recruiters will not be accepted.
By submitting your application, you acknowledge that you have read Coupa’s Privacy Policy and understand that Coupa receives/collects your application, including your personal data, for the purposes of managing Coupa’s ongoing recruitment and placement activities, including for employment purposes in the event of a successful application and for notification of future job opportunities if you did not succeed the first time. You will find more details about how your application is processed, the purposes of processing, and how long we retain your application in our Privacy Policy.
To apply for this job, please visit jobs.lever.co

