Senior Engineer – Integrated Command Center (ICC)

LPL Financial Global Capability Center

full-time

Required skills

AWS
communication skills
EC2
incident response
Linux
middleware
production support
Root Cause Analysis
SRE

About the role

Website: lplfinancial.in
Job details:
At LPL’s Global Capability Center, you’ll find a collaborative culture where your voice matters, integrity guides every decision, and technology fuels progress. Your skills, talents, and ideas will redefine what’s possible. LPL’s success reflects its exceptional employees, who together pursue one noble purpose: empowering financial advisors to deliver personalized advice for all who need it. We’re proud to be expanding and reaching new heights in Hyderabad.

Join us as we create something extraordinary together.

Role Overview

LPL Financial is seeking a Senior Engineer – Integrated Command Center (ICC) to serve as a principal technical authority within Production Services. This role is responsible for enterprise‑wide incident response excellence, observability maturity, and operational resilience across LPL’s most critical platforms.

As a Senior Engineer, you will lead complex incident management, advanced diagnostics, monitoring strategy, and systemic risk reduction initiatives. You will partner deeply with SRE, Observability, Infrastructure, Automation, and Application Engineering teams to proactively prevent incidents, reduce detection ambiguity, and improve mean time to detect (MTTD) and mean time to restore (MTTR) at scale.

This is a high‑impact role focused on technical leadership, influence, and long‑term reliability outcomes, rather than day‑to‑day alert handling.

Key Responsibilities

Enterprise Incident Response Leadership

Act as a principal technical authority during high‑severity and enterprise‑impacting incidents.
Lead advanced troubleshooting and multi‑domain triage across applications, infrastructure, cloud, and network layers.
Drive technical containment and restoration strategy in partnership with domain engineering teams.
Establish incident command best practices, escalation models, and responder behaviors.
Ensure incidents are resolved with a focus on durable fixes, not temporary recovery.

Observability & Monitoring Strategy

Own and evolve the ICC monitoring and observability strategy, ensuring alignment with enterprise reliability objectives.
Define standards for actionable alerting, thresholds, routing, deduplication, and noise reduction.
Architect and oversee Dynatrace dashboards, service views, and eyes‑on‑glass experiences for mission‑critical services.
Partner with Observability and SRE teams to improve logs, metrics, traces, and service telemetry coverage.
Continuously evaluate tooling capabilities to ensure operational visibility keeps pace with platform complexity.

Advanced Platform Diagnostics

Perform deep, cross‑stack troubleshooting across AWS, Windows, Linux, networking, middleware, databases, and application services.
Validate hypotheses using evidence‑based diagnostics rather than trial‑and‑error recovery.
Serve as a go‑to expert for complex, ambiguous failure scenarios and cascading incidents.
Provide technical guidance to reduce time‑to‑diagnose during critical events.

Root Cause & Resilience Engineering

Lead or significantly influence complex root cause analysis (RCA) for major or recurring incidents.
Identify systemic weaknesses, architectural risks, and operational gaps.
Translate incident learnings into preventative engineering, monitoring, or process improvements.
Track recurring failure modes and sponsor long‑term remediation initiatives.

Operational Readiness & Risk Management

Drive operational readiness reviews for new or materially changed services.
Ensure services meet readiness standards including:

Monitoring and alerting coverage
Runbooks and escalation paths
Rollback and recovery procedures
Defined SLOs / SLIs (where applicable)

Partner with engineering teams early in the delivery lifecycle to reduce production risk.

Technical Influence & Mentorship

Mentor senior and mid‑level ICC, NOC, and production support engineers.
Raise the technical bar for troubleshooting rigor, documentation quality, and incident response discipline.
Influence reliability outcomes across teams without direct authority.
Share expertise through post‑incident reviews, technical sessions, and operational drills.

Goals & Outcomes

Deliver measurable, sustained improvements in MTTD and MTTR for priority platforms.
Reduce repeat incidents and operational toil through systemic fixes.
Improve signal‑to‑noise ratio across enterprise monitoring.
Enable responders to diagnose complex failures rapidly and confidently.
Strengthen overall production resilience and improve the advisor and client experience.

Required Expertise & Experience

10 to 14+ years of progressive experience in Production Services, SRE, ICC/NOC, or large‑scale operations engineering roles.
Deep expertise in enterprise incident management and ITSM practices.
Advanced, hands‑on experience with Dynatrace (dashboards, alerts, service flows, diagnostics).
Strong troubleshooting capabilities across:

AWS (EC2, ALB/NLB, RDS, IAM, networking, failure modes)
Windows & Linux (performance, services, logs, resource analysis)

Proven experience operating in high‑severity, high‑stakes incident environments.
Strong documentation and communication skills at technical and executive levels.

Core Competencies

Systems thinking across complex, distributed architectures.
Ability to resolve ambiguity and make sound decisions under pressure.
Strong analytical mindset with a bias toward prevention and automation.
Exceptional collaboration and influence across engineering, leadership, and operations teams.
Relentless focus on reliability, clarity, and operational maturity.

Preferred Qualifications

Financial Services or highly regulated industry experience.
Experience working closely with SRE or Reliability Engineering teams.
Familiarity with automation in incident response or operational workflows.
Experience defining monitoring or operational standards at scale.

Why This Role Matters

This Senior Engineer role is foundational to LPL’s production reliability posture. The decisions, standards, and influence of this position directly shape:

Platform stability
Incident outcomes
Advisor and investor experience
Organizational confidence in production operations
