Strategy
• Act as the L3 escalation point for critical production incidents (P4 and above).
• Perform deep root cause analysis of complex application and integration issues.
• Deliver permanent defect fixes, including code-level changes where required.
• Coordinate with Development, Infrastructure, Database, and Middleware teams for end-to-end resolution.
• Support release validation and ensure production stability post-deployment.
Business
• Ensure high availability of banking applications and systems for business-as-usual (BAU) operations.
• Minimize customer impact and revenue loss due to production incidents.
• Provide executive-level reporting on production health, risk exposure, and system performance.
• Collaborate with Business, Product, and Technology teams to align reliability goals with growth initiatives.
• Support critical business cycles such as end-of-day/beginning-of-day (EOD/BOD) processing, month-end, and regulatory reporting timelines.
Processes
• Own end-to-end Incident, Problem, and Change Management processes.
• Lead Major Incident Management and crisis war rooms for critical issues.
• Ensure effective Root Cause Analysis (RCA) with measurable preventive actions.
• Continuously improve MTTR, change success rate, and incident reduction metrics.
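The metrics named in the last bullet can be made concrete with a small calculation. The following is only a minimal sketch under stated assumptions: MTTR is taken here to mean "mean time to resolve" (the role description does not say which expansion is intended), a change counts as successful when it is flagged True, and the function names and sample data are hypothetical, not drawn from any real tooling.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents):
    """Average (resolved - opened) over a list of (opened, resolved) pairs.

    Assumption: MTTR means "mean time to resolve"; teams that track
    time to restore or repair would substitute their own timestamps.
    """
    if not incidents:
        return timedelta(0)
    total = sum((resolved - opened for opened, resolved in incidents), timedelta(0))
    return total / len(incidents)


def change_success_rate(changes):
    """Fraction of changes flagged successful (True) in a list of booleans.

    Assumption: "successful" means implemented without rollback or a
    resulting incident; the exact definition is organization-specific.
    """
    if not changes:
        return 0.0
    return sum(changes) / len(changes)


# Hypothetical sample data for illustration only.
incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 0)),    # 60 minutes
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 30)),  # 30 minutes
]
mttr = mean_time_to_resolve(incidents)
rate = change_success_rate([True, True, False, True])
```

Tracking both figures per month or per release makes it possible to state improvement targets as numbers rather than intentions.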
People & Talent
• Build and mentor high-performing SRE and Production Support teams.
• Define competency frameworks and skill development roadmaps.
• Promote a culture of ownership, accountability, and continuous improvement.
• Lead succession planning and talent retention strategies.
• Encourage cross-skilling in cloud, automation, and cybersecurity domains.