UST
Website: ust.com
Job details:
Role Description
Site Reliability Engineer (SRE) / DevOps Engineer
Experience: 5–7 Years
Location: Bangalore / Pune / Chennai (Flexible/Hybrid)
Key Skills
SRE, DevOps, Cloud (AWS/GCP/Azure), Terraform, Ansible, Docker, Kubernetes, CI/CD, Jenkins, Python, Linux, Monitoring, Incident Management
Job Description
We are seeking a highly motivated Site Reliability Engineer (SRE) / DevOps Engineer to manage and enhance the reliability, scalability, and performance of cloud-based systems. The ideal candidate will have strong experience in cloud operations, automation, and infrastructure management across modern distributed environments.
Key Responsibilities
- Manage and maintain highly available, scalable systems across cloud and hybrid environments
- Build and automate infrastructure using Infrastructure as Code (Terraform, Ansible, Chef)
- Develop and maintain CI/CD pipelines using Jenkins and cloud-native deployment tools
- Deploy and manage containerised applications using Docker and Kubernetes
- Monitor system health, define SLIs/SLOs, and enhance reliability through proactive improvements
- Handle incident management, production support, and root cause analysis (RCA)
- Drive system performance optimisation and reliability improvements, and reduce MTTR
- Automate operational tasks using scripting languages such as Python, Bash, or similar
- Collaborate with development, QA, and architecture teams for seamless release and deployment
- Implement best practices in DevOps, automation, and operational excellence
Required Skills & Experience
- 5–7 years of experience in SRE, DevOps, Cloud Operations, or System Administration
- Hands-on experience with cloud platforms (AWS, GCP, or Azure) and hybrid environments
- Strong expertise in Infrastructure as Code (Terraform, Ansible, Chef)
- Experience building and managing CI/CD pipelines (Jenkins, Git-based tools)
- Proficiency in at least one scripting/programming language (Python, Bash, Java, Go, or JavaScript)
- Strong knowledge of Linux/Windows systems, networking, storage, and security
- Experience with Docker, Kubernetes, and distributed systems troubleshooting
- Hands-on experience in monitoring, alerting, and incident response processes
- Strong understanding of system reliability, availability, and performance engineering
- Experience working in Agile/Scrum environments
Nice to Have
- Cloud certifications (AWS/GCP/Azure)
- Exposure to DevSecOps practices and security tooling
- Experience with monitoring tools like Prometheus, Grafana, Dynatrace, or similar
- Knowledge of automation frameworks and operational best practices
Skills
devops, cicd, gcp devops, terraform