Site Reliability Engineer

Bean HR Consulting

Location: Bengaluru, Karnataka, India
Job type: Full-time

Required skills

Python
AWS
Ansible
Azure
Bash
Cloud infrastructure
Containerization
Docker
Helm
Incident response
Kubernetes
Root cause analysis
SRE
Terraform
PowerShell

About the role

Website: beanhr.com
Job details:

Job Title: Site Reliability Engineer (SRE)

Job Overview

We are looking for a skilled Site Reliability Engineer (SRE) to join our team and help build, scale, and maintain highly reliable and resilient cloud-based systems. The ideal candidate will have a strong foundation in cloud infrastructure, automation, observability, and incident management, with a focus on improving system reliability and performance.

Key Responsibilities

Design, build, and maintain highly available and scalable cloud infrastructure (primarily on Azure).
Implement and manage Infrastructure as Code (IaC) using tools like Terraform, Helm, or Ansible.
Develop and optimize CI/CD pipelines with integrated security and quality checks.
Deploy, manage, and orchestrate containerized applications using Kubernetes and Docker.
Establish and enhance observability practices including monitoring, logging, tracing, and alerting.
Collaborate with development teams to define SLIs and SLOs and to implement effective alerting strategies.
Participate in on-call rotations, respond to production incidents, and perform root cause analysis (RCA).
Continuously improve system reliability, availability, and performance through automation and best practices.
Drive a culture of reliability, scalability, and operational excellence.

Required Qualifications

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
3–8 years of experience as a Site Reliability Engineer or in a similar role.
Hands-on experience with cloud platforms such as Azure (preferred) or AWS.
Strong experience with Infrastructure as Code (Terraform preferred).
Proficiency in scripting languages such as Python, Bash, or PowerShell.
Experience with CI/CD tools and automation pipelines.
Solid understanding of containerization and orchestration (Docker, Kubernetes).
Experience in incident management, on-call support, and root cause analysis.

Preferred Qualifications

Experience with observability tools such as Grafana, Prometheus, or the ELK Stack.
Familiarity with on-call and incident management tools like PagerDuty or Zenduty.
Experience defining and managing SLIs, SLOs, and SLAs.
Knowledge of security best practices in CI/CD and cloud environments.

Key Skills

System Reliability & High Availability
Observability & Monitoring
Incident Response & Troubleshooting
Automation & Infrastructure as Code
Cloud & Container Technologies
Collaboration & Communication

Work Model

Hybrid (Onsite + Remote)
