Senior Site Reliability Engineer - Incident Management/Resiliency (Hybrid)

Salary

$85k - $125k

Min Experience

3 years

Location

Chicago, IL

Job Type

full-time

About the job

This job is sourced from a job board.

About the role

Resilience Engineering is a subset of the Site Reliability Engineering team that strives to foster a culture of continuous improvement through incident analysis, process evolution, and problem-solving. We work closely with teams across Tech, Product, and Operations through our Production Incident process to uncover system weaknesses, learn from failures, and make our technology more reliable.

In this role, you'll play a key part in enhancing the resiliency of our systems. Your work will focus on our incident response, reporting, and analysis processes, enabling the organization to better prepare for and respond to complex system failures. You'll drive efforts to optimize how we manage unexpected outages, from leading real-time incident response to facilitating post-incident reviews. You'll identify patterns across incidents, uncover contributing factors, and work across teams to recommend long-term solutions that improve our systems' resilience.

Your core priorities will be to:

Lead production incidents as part of our PI PIC (or Incident Commander) rotation after completing training, ensuring clear communication and resolution.
Capture and maintain detailed documentation of incidents, contributing factors, and learnings in formal incident reports.
Facilitate and document blameless post-incident reviews that promote learning and continuous improvement.
Collect and analyze incident data to identify systemic issues, risks, and trends.
Collaborate with engineering, product, and operations teams to address vulnerabilities and build more resilient systems.
Drive improvements to how we collect, analyze, and learn from system failures.
Design and run failure simulations (e.g., mock incidents, disaster recovery exercises) to proactively identify weak points.
Champion a culture of operational excellence and resilience across the organization.
Continuously evolve our incident management processes to ensure they scale with our technology and business needs.

About the company

Enova International is a leading financial technology company that provides online financial services through our AI and machine learning-powered Colossus™ platform. We serve non-prime consumers and businesses alike, while offering world-class technology and services to traditional banks, in order to create accessible credit for millions. Being a values-driven organization is at the core of Enova's success. We live our values by listening to our customers, challenging assumptions, thinking big, setting high expectations, and hiring and developing the best. Through our values and our commitment to making Enova an awesome place to work, we maintain an environment of inclusion and culture where our employees can thrive. You can learn more about Enova's values and culture here.

Skills

sql
postgresql
kafka