Job details:
🚀 We’re Hiring: Senior Site Reliability Engineer (SRE), DevOps / NOC
📍 Location: Bengaluru, India (Onsite)
🕒 Full-time | Engineering Team
We’re looking for a Senior Site Reliability Engineer to join our Engineering team and help scale and stabilize our platform at a global level.
If you are passionate about system reliability, automation, and large-scale distributed systems, this role is for you.
🌐 About the Team
You’ll report directly to the Head of Infrastructure & Deployment and be part of our SRE team.
Our mission:
👉 Ensure high availability, scalability, and performance across all critical systems
👉 Bridge Development & Operations through automation
👉 Deliver a seamless experience for users and customers
🎯 What You’ll Do
- Design, build, and operate Kubernetes-based platform services & operators
- Own and optimize core infrastructure (multi-cluster environments)
- Define and enforce SLOs, SLIs, and error budgets
- Build monitoring, logging, and alerting systems to ensure uptime
- Automate deployments and reduce manual operations (toil reduction)
- Lead incident response, on-call, and post-mortems
- Collaborate with Engineering, Security, and QA teams early in the dev lifecycle
- Conduct performance testing, load testing, and chaos engineering
🧩 What We’re Looking For
✅ Must-have
- Strong background in software engineering & distributed systems
- Hands-on experience with Kubernetes (operator development)
- Experience with cloud platforms (AWS / GCP)
- Proficiency in at least one of Go, Python, or Bash
- Experience with observability tools (Prometheus, Grafana, Datadog, ELK, CloudWatch, Jaeger, OpenTelemetry, etc.)
- Experience with IaC (Terraform/Ansible) & CI/CD (Jenkins, ArgoCD)
- Solid understanding of networking & security fundamentals
- Familiarity with databases: Postgres, MongoDB, Redis, Elasticsearch
- Experience with streaming/data systems: Kafka / Flink
➕ Nice to have
- Experience with chaos engineering
- Experience in disaster recovery planning & testing
- Familiarity with Agile tools (Jira, etc.)
💡 Who You Are
- Proactive and driven by reliability & system excellence
- Strong problem-solver with an analytical mindset
- Excellent collaboration & communication skills
- Comfortable leading during high-pressure incidents
- Data-driven and focused on continuous improvement
- Strong ownership mindset: you build it, you run it
⏰ Working Time
- Monday to Friday (standard working hours)
- 2 Saturday mornings/month: on-site at the office
- 2 Saturday afternoons/month: remote working
🌟 Why You’ll Love It Here
- Work on large-scale, high-impact systems
- Culture of ownership, trust, and transparency
- Opportunity to shape platform reliability at scale
- Collaborate with a high-caliber global engineering team
Click Apply to learn more.