Site Reliability Engineer

TECEZE

Location: Pune District, Maharashtra, India
Job type: Full-time

Required skills

Python
AWS
Ansible
automation tools
Azure
Bash
CloudFormation
compliance
containerization
Datadog
DevOps
DNS
Docker
GCP
incident response
Kubernetes
Linux
load balancing
root cause analysis
TCP
Terraform
Unix

About the role

Website: teceze.com
Job details:

We are seeking a highly skilled Infrastructure Reliability & Operations Engineer with strong private cloud experience and a minimum of 5 years in infrastructure reliability, operations, or site reliability engineering. The ideal candidate will be responsible for designing, implementing, and maintaining fault-tolerant infrastructure while driving automation, observability, and reliability across mission-critical systems.

You will collaborate with DevOps, development, and security teams to ensure seamless deployments, optimize performance, and uphold the highest standards of security and compliance. This role requires a proactive mindset, technical expertise, and a passion for building resilient systems.

Key Responsibilities:

• Design and maintain highly available, scalable, and secure infrastructure

• Lead incident response, root cause analysis, and post-incident reviews

• Develop automation tools and apply Infrastructure as Code (Terraform, Ansible, CloudFormation)

• Build self-healing systems and streamline operational workflows

• Support CI/CD pipelines and containerized platforms (Docker, Kubernetes, OpenShift)

• Implement monitoring, logging, and alerting systems (Prometheus, Grafana, ELK, Datadog)

• Define and track SLIs, SLOs, and SLAs for system reliability

• Collaborate with security teams on vulnerability management and compliance

Required Skills & Qualifications:

• Strong experience in Linux/Unix system administration

• Proficiency in Python, Go, Bash/shell, or similar programming and scripting languages

• Hands-on experience with AWS, Azure, or GCP

• Expertise in containerization & orchestration technologies

• Solid understanding of networking concepts (DNS, TCP/IP, load balancing, firewalls)

• Experience with monitoring, logging, and alerting tools
