AHEAD

HQ: Hybrid

As a Senior Engineer (L3) specializing in Defect Management & DevOps, you will play a critical role in driving operational excellence, ensuring defect-free delivery pipelines, and strengthening reliability across cloud-native platforms. You will collaborate closely with engineering, QA, SRE, and product teams to manage end-to-end defect processes, streamline automation, and enhance service observability. The role demands deep analytical capability, strong DevOps experience, and the ability to influence cross-functional improvements through data-driven insights and advanced troubleshooting.
You will act as a subject matter expert (SME) in DevOps and GCP/AWS, overseeing end-to-end release processes, governance, and delivery pipelines. This role requires leadership, deep technical knowledge, and excellent communication skills.
Core Responsibilities

    • Serve as the Subject Matter Expert (SME) for cloud platforms, primarily AWS (GCP exposure is a plus), providing guidance on cloud best practices, architectural decisions, and solution design.
    • Support customers with core Managed Services technologies, including Cloud, Automation, Terraform, CI/CD, and containerization.
    • Design, implement, and optimize cloud-native and DevOps solutions aligned with customer and organizational objectives.
    • Lead technical discussions, demos, and customer engagements while effectively communicating complex technical concepts to both technical and non-technical stakeholders.
    • Assist with team-building activities such as interviewing, onboarding, and aligning technical resources.
    • Provide technical leadership, coaching, and mentorship to junior team members.
    • Maintain strong project and situational awareness to ensure deliverables meet timelines and organizational expectations.
    • Develop high-quality documentation including architectures, workflows, runbooks, and other written deliverables.
    • Act as a technical expert in internal knowledge-sharing initiatives and external client interactions.
    • Influence cloud governance, operational policies, best practices, and process improvements across teams and customer environments.
    • Ensure precision, accuracy, and strong attention to detail across all tasks and deliverables.
Defect Management Responsibilities

    • Act as the SME for Defect Management processes, governance, tooling, and reporting.
    • Own and manage the full defect lifecycle, including logging, triage, prioritization, RCA, corrective actions, and closure.
    • Partner with Development, QA, SRE, and Product teams to ensure timely resolution of high-impact issues.
    • Establish and maintain defect dashboards, KPIs, and trend analytics to drive quality and process improvements.
    • Develop standardized runbooks, escalation workflows, and operational procedures for defect handling.
    • Lead cross-team Root Cause Analysis (RCA) investigations and drive Corrective and Preventive Actions (CAPA) implementations.
    • Improve operational readiness through enhanced monitoring, alerting, and structured incident-to-defect workflows.
    • Provide guidance on CI/CD optimization, automation strategies, infrastructure stability, and reliability engineering.
    • Mentor junior engineers in DevOps principles, tooling, defect analysis techniques, and troubleshooting best practices.
Requirements

    • Defect Management Expertise
        • Full ownership of the defect lifecycle, ensuring SLA adherence.
        • Deep understanding of SDLC, change management, and ITIL best practices.
        • Ability to analyze defect patterns, severity trends, root causes, and long-term systemic issues.
        • Conduct structured RCA using 5 Whys, Fishbone, and Fault Tree Analysis.
        • Define and enforce severity, categorization, and prioritization standards.
        • Create dashboards and quality metrics to drive continuous improvement.
    • Tools & Skills
        • Strong expertise in JIRA workflows, automation rules, dashboards, and reporting.
        • Ability to visualize defect trends and quality metrics effectively.
    • Observability, Monitoring & SIEM Tools
        • Hands-on experience with Dynatrace, Datadog, Prometheus, Grafana, CloudWatch, or similar tooling.
        • Skilled in APM analysis, log correlation, anomaly detection, service mapping, and performance troubleshooting.
        • Build and maintain dashboards and alert frameworks.
        • Integrate monitoring insights with DevOps and operational workflows.
        • Exposure to SIEM event analysis for operational and security correlation.
Core DevOps Responsibilities

    • Build, enhance, and support CI/CD pipelines across multiple environments using AWS CodePipeline, CodeBuild, CodeDeploy, and Git-based workflows.
    • Collaborate on automation initiatives using Terraform, CloudFormation, AWS CDK, or equivalent IaC tools to standardize and streamline deployments.
    • Deploy and manage AWS cloud-native services including EKS, ECS, Lambda, API Gateway, S3, IAM, and supporting architectures.
    • Work with containers and orchestration platforms such as Kubernetes, EKS, ECS, and AKS (where required).
    • Implement deployment best practices such as blue/green, rolling updates, and automated rollback strategies to ensure safe, repeatable releases.
    • Troubleshoot complex deployment issues, environment drift, infrastructure failures, performance bottlenecks, and service-level degradations.
    • Implement and maintain observability using CloudWatch, Prometheus, Grafana, Datadog, Dynatrace, or equivalent monitoring stacks.
    • Ensure AWS workloads adhere to resiliency, compliance, security, and operational excellence guidelines.
    • Strong hands-on, production-grade DevOps experience in AWS (primary cloud).
    • Deep expertise in Kubernetes, containerized workloads, microservices, autoscaling, and cloud networking.
    • Advanced troubleshooting across AWS services, distributed systems, CI/CD pipelines, and API-driven workflows.
    • Knowledge of AWS cost optimization, tagging, FinOps alignment, and resource lifecycle governance.
    • Exposure to building or maintaining CI/CD pipelines within GCP ecosystems (Cloud Build, GKE, Artifact Registry, etc.).
    • Ability to work with GCP cloud-native services where required, ensuring consistency across hybrid/multi-cloud deployments.
    • Familiarity with GCP IAM, VPC architecture, and core compute/storage/networking components is a plus.
General Qualifications

    • Strong communication, leadership, and mentoring capabilities.
    • 6–10+ years of experience in DevOps, SRE, QA Engineering, or Cloud Operations.
    • Expert-level AWS knowledge (GCP exposure would be a plus).
    • Strong command of IaC tools such as Terraform, CloudFormation, CDK.
    • Experience with CI/CD systems: Jenkins, GitLab CI, AWS CodePipeline.
    • Proficiency with Docker, Kubernetes, and container orchestration.
    • Experience with monitoring technologies: Datadog, Grafana, Prometheus.
    • Experience with JIRA workflows and project tracking.
    • Ability to excel in dynamic, fast-paced environments.
Expectations

    • Demonstrate deep expertise across DevOps, cloud platforms, automation, and engineering practices.
    • Balance hands-on delivery with leadership responsibilities and strategic initiatives.
    • Continuously assess, refine, and enhance processes, documentation, and operational workflows.
    • Adapt effectively to evolving customer requirements, project priorities, and technology landscapes.
    • Engage confidently with senior stakeholders, providing clear communication and technical guidance.
    • Lead scoping, planning, and methodology definition for major technical initiatives and transformations.
    • Contribute to the development of new engineering standards, frameworks, and best practices across teams.
    • Take senior-level ownership of critical defects, escalations, and operational issues, driving them to resolution.
    • Influence and drive cross-team improvements in tooling, quality, automation, and operational efficiency.
    • Ensure prevention mechanisms, automation guardrails, and reliability practices are embedded early in delivery cycles.
    • Lead initiatives focused on defect prevention, observability enhancements, and overall DevOps maturity uplift.
    • Participate in on-call rotations and provide Tier-3 technical expertise for complex issues.
    • Continuously propose, design, and implement enhancements across tooling, automation, and operational frameworks.
Apply info ->

To apply for this job, please visit jobs.lever.co
