Senior Site Reliability Engineer - DevOps/NOCs

VinSmart Future

Location: Bengaluru South, Karnataka, India
Job type: Full-time

Required skills

Python
Agile
AWS
Ansible
Bash
CloudWatch
communication skills
Datadog
Elasticsearch
Flink
GCP
incident response
Jenkins
Jira
Kafka
platform services
Postgres
Redis
SRE
Terraform
uptime

About the role

🚀 We’re Hiring: Senior Site Reliability Engineer (SRE) - DevOps/NOCs

📍 Location: Bengaluru, India (Onsite)

🕒 Full-time | Engineering Team

We’re looking for a Senior Site Reliability Engineer to join our Engineering team and help scale and stabilize our platform at a global level.

If you are passionate about system reliability, automation, and large-scale distributed systems, this role is for you.

🌐 About the Team

You’ll report directly to the Head of Infrastructure & Deployment and be part of our SRE team.

Our mission:

👉 Ensure high availability, scalability, and performance across all critical systems

👉 Bridge Development & Operations through automation

👉 Deliver a seamless experience for users and customers

🎯 What You’ll Do

Design, build, and operate Kubernetes-based platform services & operators
Own and optimize core infrastructure (multi-cluster environments)
Define and enforce SLOs, SLIs, and error budgets
Build monitoring, logging, and alerting systems to ensure uptime
Automate deployments and reduce manual operations (toil reduction)
Lead incident response, on-call, and post-mortems
Collaborate with Engineering, Security, and QA teams early in the dev lifecycle
Conduct performance testing, load testing, and chaos engineering

🧩 What We’re Looking For

✅ Must-have

Strong background in software engineering & distributed systems
Hands-on experience with Kubernetes (operator development)
Experience with cloud platforms (AWS / GCP)
Proficiency in at least one of Go, Python, or Bash
Experience with observability tools (Prometheus, Grafana, Datadog, ELK, CloudWatch, Jaeger, OpenTelemetry, etc.)
Experience with IaC (Terraform/Ansible) & CI/CD (Jenkins, ArgoCD)
Solid understanding of networking & security fundamentals
Familiarity with databases: Postgres, MongoDB, Redis, Elasticsearch
Experience with streaming/data systems: Kafka / Flink

➕ Nice to have

Experience with chaos engineering
Experience in disaster recovery planning & testing
Familiarity with Agile tools (Jira, etc.)

💡 Who You Are

Proactive and driven by reliability & system excellence
Strong problem-solver with analytical mindset
Excellent collaboration & communication skills
Comfortable leading during high-pressure incidents
Data-driven and focused on continuous improvement
Strong ownership mindset: you build it, you run it

⏰ Working Time

Monday to Friday (standard working hours)
2 Saturday mornings/month: on-site at the office
2 Saturday afternoons/month: remote working

🌟 Why You’ll Love It Here

Work on large-scale, high-impact systems
Culture of ownership, trust, and transparency
Opportunity to shape platform reliability at scale
Collaborate with a high-caliber global engineering team
