Job details:
Job Title: Incident Manager – Fault Management
Function: Incident, Problem & Change Management
Department: NOC Operations / Command Centre / Service Operations
Experience: 6–10 Years
Employment Type: Full-Time
Shift: 24x7 Rotational (as per business requirement)
Location: Gurgaon / Mumbai / Pune (or as per project needs)
Job Summary
We are looking for an experienced Incident Manager to lead fault management operations, ensuring rapid restoration of services and minimal business impact. The role focuses on Incident, Problem, and Change Management, acting as a central point of coordination during major incidents and ensuring compliance with ITIL processes, SLAs, and governance standards.
The ideal candidate will have strong operational leadership, stakeholder communication, and escalation management skills within complex IT / Telecom environments.
Key Responsibilities
Incident Management
- Own and manage P1/P2/P3 incidents end-to-end in line with ITIL standards.
- Act as Incident Commander during major incidents, leading bridge calls and coordinating technical teams.
- Ensure timely incident detection, logging, categorization, prioritization, and resolution.
- Drive restoration efforts and ensure adherence to SLAs, OLAs, and KPIs.
- Provide regular incident status updates to customers, management, and stakeholders.
- Ensure proper incident documentation, closure notes, and audit readiness.
Fault Management & Monitoring
- Oversee proactive fault detection through NOC monitoring tools.
- Ensure alarms and alerts are correlated, triaged, and assigned appropriately.
- Coordinate with L2/L3 engineering teams for fault isolation and resolution.
- Identify recurring faults and initiate preventive actions.
Problem Management
- Lead Root Cause Analysis (RCA) for recurring and major incidents.
- Facilitate Post-Incident Reviews (PIRs) and track corrective and preventive actions (CAPA).
- Maintain problem records and trend analysis to reduce repeat incidents.
- Work closely with engineering and vendors to drive permanent fixes.
Change Management
- Govern changes to production environments to minimize risk.
- Review and validate Change Requests (CRs), MOPs, rollback plans, and impact assessments.
- Participate in CAB (Change Advisory Board) meetings.
- Ensure changes are executed as per approved windows with pre/post validation.
- Track change-related incidents and drive improvement actions.
Stakeholder & Vendor Coordination
- Act as a single point of contact during service-impacting events.
- Coordinate with internal teams, service providers, OEMs, and vendors.
- Manage customer communication during outages and critical events.
- Escalate issues appropriately to senior management when required.
Governance, Reporting & Continuous Improvement
- Prepare and publish incident, problem, and change management reports (daily/weekly/monthly).
- Monitor and improve operational KPIs and SLA performance.
- Drive process improvements aligned with ITIL best practices.
- Maintain SOPs, runbooks, escalation matrices, and communication templates.
- Support audits, compliance reviews, and regulatory requirements (if applicable).
Required Skills & Competencies
Technical & Process Skills
- Strong expertise in ITIL Incident, Problem, and Change Management.
- Experience working in NOC / Command Centre / Telecom / Enterprise IT Operations.
- Good understanding of infrastructure domains:
  - Network (LAN/WAN/SD-WAN)
  - Security (Firewalls, SOC coordination)
  - Data Center / Cloud (basic understanding)
- Familiarity with monitoring tools (SolarWinds, Netcool, Splunk, PRTG, etc.).
- Hands-on experience with ITSM tools such as ServiceNow, Remedy, Helix, Jira.
Soft Skills
- Strong leadership and decision-making abilities during high-pressure situations.
- Excellent verbal and written communication skills.
- Strong stakeholder and customer management capability.
- Analytical mindset with attention to detail.
- Ability to work independently and in cross-functional teams.
Education & Certifications
- Bachelor’s degree in Engineering, IT, Computer Science, or a related field.
- ITIL Foundation (mandatory); ITIL Intermediate/Expert is a plus.
- PMP / PRINCE2 / Agile certifications are advantageous.
Experience
- 6–10 years of experience in Incident / Problem / Change Management roles.
- Prior experience handling Major Incidents in 24x7 operations environments.
- Experience in Telecom, BFSI, Managed Services, or Large Enterprise IT preferred.
Key Performance Indicators (KPIs)
- Incident response and resolution times.
- SLA and availability compliance.
- Reduction in repeat incidents.
- Quality and timeliness of RCA reports.
- Change success rate and reduction in change-related incidents.
Skills: Change Management, RCA, IT, Incident Handling, Fault Management, Teams, Operations, ITIL