PwC India
Website: pwc.in
Job details:
Experience: 5-12 years
Role Overview
As a Lead SRE (Site Reliability Engineer), you will be the bridge between software engineering and systems operations. You will apply an engineering mindset to system administration, with a focus on optimizing high-availability Azure environments through automation. You will own the error budget and ensure that our SLOs (Service Level Objectives) are met through robust IaC (Infrastructure as Code) and CI/CD practices.
Key Responsibilities
- Reliability Engineering: Design and implement self-healing infrastructure on Azure to keep mission-critical services at 99.9% uptime or better.
- Infrastructure as Code (IaC): Lead the development of standardized, reusable Terraform modules to ensure consistent environment provisioning and prevent configuration drift.
- Platform Orchestration: Manage and optimize Azure Kubernetes Service (AKS) clusters, using Helm to manage complex application lifecycles and deployments.
- Automation & CI/CD: Architect end-to-end delivery pipelines using Azure DevOps (YAML and Classic pipelines) and GitHub Actions, prioritizing security-as-code and automated testing.
- Incident Response & Post-mortems: Lead the on-call rotation strategy and conduct blameless post-mortems to identify root causes and prevent recurrence.
- Observability: Implement comprehensive monitoring, logging, and alerting (using Azure Monitor, Log Analytics, or Prometheus/Grafana) to track SLIs (Service Level Indicators).