SRE Program Manager

Accenture

Location: Pune City, Maharashtra, India
Job type: Full-time

Required skills

Agile
compliance
Confluence
cross-functional
Datadog
DevOps
Jira
Kubernetes
microservices
operational metrics
PMP
project management
SRE
uptime

About the role

Website: accenture.com
Job details:
Program Manager – Site Reliability Engineering (Cloud Native Platform Team)

Role Summary

The Program Manager will lead day-to-day operations of the Site Reliability Engineering (SRE) team, ensuring alignment with organizational goals for reliability, scalability, and operational excellence. This role requires a strong technical background in SRE practices and proven program management expertise to drive cross-functional initiatives, optimize processes, and deliver measurable business and operational value.

Key Responsibilities

Operational Leadership
Drive adoption of SRE best practices such as error budgets, SLIs/SLOs, and automation to reduce toil.
Ensure compliance with security, privacy, and regulatory standards in all reliability initiatives.
Program Management
Define program scope, objectives, and success criteria for reliability initiatives.
Develop and maintain quarterly roadmaps for SRE projects in collaboration with platform engineering teams.
Track progress, risks, and dependencies across multiple projects using tools such as Jira and Confluence.
Facilitate communication between SRE, development, and leadership teams to ensure transparency and alignment.
Performance Measurement
Establish and monitor KPIs for reliability and operational efficiency.
Prepare executive dashboards and reports to translate technical metrics into business impact narratives.
Lead continuous improvement initiatives based on data-driven insights.
Stakeholder Engagement
Act as the primary liaison between SRE and other teams (Product, Engineering, and Delivery-SOC).
Influence decision-making at all levels through clear communication and structured reporting.

Performance Measurement Parameters

Incident Metrics:

Mean Time to Detect (MTTD)
Mean Time to Respond
Mean Time to Recovery (MTTR)
Incident Frequency and Severity

Change Management:

Change Failure Rate
Change Success Rate

Reliability Metrics:

System Uptime / Availability
Service Level Objective Achievement Percentage

Operational Efficiency:

Automation Rate
On-call Burden Reduction

Measurement Matrix for Leadership Presentation

Use a dashboard approach combining:

Golden Signals: Latency, traffic, errors, and saturation.

SLO Trends: Monthly and quarterly SLO achievement.

Incident Heatmaps: Highlighting root causes and resolution times.

Business Impact Metrics: Cost savings, risk reduction, and ROI from reliability improvements.

Tools: Datadog

Experience Requirements

Technical Background:

Prior hands-on experience as a Site Reliability Engineer or in DevOps roles.
Strong understanding of cloud-native architectures (Kubernetes, microservices, distributed systems).

Program Management Expertise:

5+ years in program or technical project management.
Proven ability to manage cross-functional initiatives in fast-paced environments.
Familiarity with Agile methodologies and tools (Jira, Confluence).

Leadership & Communication:

Experience presenting technical and operational metrics to executive leadership.
Strong stakeholder management and negotiation skills.

Certifications: PMP, SAFe, or SRE Foundation (SAFe preferred).
