SRE Lead

FundsIndia

Location: Chennai, Tamil Nadu, India
Job type: Full-time

Required skills

Python
AWS
Bash
incident response
Kubernetes
Linux
production support
Root Cause Analysis
SRE

About the role

Website: fundsindia.com
Job details:

Role: SRE Lead

Experience: 6–7 years

Location: Chennai

This role is critical for strengthening our incident management, observability, and overall reliability practices. The candidate will be responsible for leading incident response, improving monitoring systems, and driving SRE best practices across production environments.

Job Description – SRE Lead

Role Overview

We are looking for a Lead Site Reliability Engineer with 6–7 years of experience to drive reliability, observability, and incident management practices. The ideal candidate will have strong expertise in the Grafana stack, production monitoring, and handling critical incidents in high-availability systems.

Key Responsibilities

Act as the Incident Commander during production outages, ensuring timely resolution and stakeholder communication
Lead incident response, triage, RCA (Root Cause Analysis), and postmortems
Build and enhance observability systems using the Grafana stack (Prometheus, Loki, Tempo)
Define and manage SLIs, SLOs, and SLAs for critical services
Develop and maintain monitoring, alerting, and dashboards for proactive issue detection
Collaborate with Dev, Infra, and DB teams to improve system reliability and performance
Drive automation and runbook creation to reduce manual intervention
Improve on-call processes and incident management workflows
Ensure high availability, scalability, and fault tolerance of systems

Required Skills

6–7 years of experience in Site Reliability Engineering / Production Support
Strong hands-on experience with the Grafana stack (Prometheus, Loki, Tempo)
Solid understanding of monitoring, alerting, and observability principles
Experience in incident management and handling P1/P2 incidents
Knowledge of cloud platforms (AWS)
Experience with Linux systems and troubleshooting
Familiarity with Kubernetes / containerized environments
Strong scripting skills (Python / Bash)
