Website:
lplfinancial.in
Job details:
At LPL’s Global Capability Center, you'll find a collaborative culture where your voice matters, integrity guides every decision, and technology fuels progress. Your skills, talents, and ideas will redefine what's possible. LPL's success reflects its exceptional employees, who together pursue one noble purpose: empowering financial advisors to deliver personalized advice for all who need it. We’re proud to be expanding and reaching new heights in Hyderabad.
Join us as we create something extraordinary together.
Role Overview
LPL Financial is seeking a Senior Engineer – Integrated Command Center (ICC) to serve as a principal technical authority within Production Services. This role is responsible for enterprise‑wide incident response excellence, observability maturity, and operational resilience across LPL’s most critical platforms.
As a Senior Engineer, you will lead complex incident management, advanced diagnostics, monitoring strategy, and systemic risk reduction initiatives. You will partner deeply with SRE, Observability, Infrastructure, Automation, and Application Engineering teams to proactively prevent incidents, reduce detection ambiguity, and improve mean time to detect (MTTD) and mean time to restore (MTTR) at scale.
This is a high‑impact role focused on technical leadership, influence, and long‑term reliability outcomes, rather than day‑to‑day alert handling.
Key Responsibilities
Enterprise Incident Response Leadership
- Act as a principal technical authority during high‑severity and enterprise‑impacting incidents.
- Lead advanced troubleshooting and multi‑domain triage across applications, infrastructure, cloud, and network layers.
- Drive technical containment and restoration strategy in partnership with domain engineering teams.
- Establish incident command best practices, escalation models, and responder behaviors.
- Ensure incidents are resolved with a focus on durable fixes, not temporary recovery.
Observability & Monitoring Strategy
- Own and evolve the ICC monitoring and observability strategy, ensuring alignment with enterprise reliability objectives.
- Define standards for actionable alerting, thresholds, routing, deduplication, and noise reduction.
- Architect and oversee Dynatrace dashboards, service views, and eyes‑on‑glass experiences for mission‑critical services.
- Partner with Observability and SRE teams to improve logs, metrics, traces, and service telemetry coverage.
- Continuously evaluate tooling capabilities to ensure operational visibility keeps pace with platform complexity.
Advanced Platform Diagnostics
- Perform deep, cross‑stack troubleshooting across AWS, Windows, Linux, networking, middleware, databases, and application services.
- Validate hypotheses using evidence‑based diagnostics rather than trial‑and‑error recovery.
- Serve as a go‑to expert for complex, ambiguous failure scenarios and cascading incidents.
- Provide technical guidance to reduce time‑to‑diagnose during critical events.
Root Cause & Resilience Engineering
- Lead or significantly influence complex root cause analysis (RCA) for major or recurring incidents.
- Identify systemic weaknesses, architectural risks, and operational gaps.
- Translate incident learnings into preventative engineering, monitoring, or process improvements.
- Track recurring failure modes and sponsor long‑term remediation initiatives.
Operational Readiness & Risk Management
- Drive operational readiness reviews for new or materially changed services.
- Ensure services meet readiness standards, including:
  - Monitoring and alerting coverage
  - Runbooks and escalation paths
  - Rollback and recovery procedures
  - Defined SLOs / SLIs (where applicable)
- Partner with engineering teams early in the delivery lifecycle to reduce production risk.
Technical Influence & Mentorship
- Mentor senior and mid‑level ICC, NOC, and production support engineers.
- Raise the technical bar for troubleshooting rigor, documentation quality, and incident response discipline.
- Influence reliability outcomes across teams without direct authority.
- Share expertise through post‑incident reviews, technical sessions, and operational drills.
Goals & Outcomes
- Deliver measurable, sustained improvements in MTTD and MTTR for priority platforms.
- Reduce repeat incidents and operational toil through systemic fixes.
- Improve signal‑to‑noise ratio across enterprise monitoring.
- Enable responders to diagnose complex failures rapidly and confidently.
- Improve overall production resilience and the advisor/client experience.
Required Expertise & Experience
- 10-14+ years of progressive experience in Production Services, SRE, ICC/NOC, or large‑scale operations engineering roles.
- Deep expertise in enterprise incident management and ITSM practices.
- Advanced, hands‑on experience with Dynatrace (dashboards, alerts, service flows, diagnostics).
- Strong troubleshooting capabilities across:
  - AWS (EC2, ALB/NLB, RDS, IAM, networking, failure modes)
  - Windows & Linux (performance, services, logs, resource analysis)
- Proven experience operating in high‑severity, high‑stakes incident environments.
- Strong documentation and communication skills at technical and executive levels.
Core Competencies
- Systems thinking across complex, distributed architectures.
- Ability to resolve ambiguity and make sound decisions under pressure.
- Strong analytical mindset with a bias toward prevention and automation.
- Exceptional collaboration and influence across engineering, leadership, and operations teams.
- Relentless focus on reliability, clarity, and operational maturity.
Preferred Qualifications
- Financial Services or highly regulated industry experience.
- Experience working closely with SRE or Reliability Engineering teams.
- Familiarity with automation in incident response or operational workflows.
- Experience defining monitoring or operational standards at scale.
Why This Role Matters
This Senior Engineer role is foundational to LPL’s production reliability posture. The decisions, standards, and influence of this position directly shape:
- Platform stability
- Incident outcomes
- Advisor and investor experience
- Organizational confidence in production operations