Website:
clevanoo.com
Job details:
Job Title: DevOps Engineer / Site Reliability Engineer
Duration: 6 months+
Location: Hybrid (2 days in office), based in Pune or Hyderabad.
Experience: 5–8 years.
Role Overview
We are looking for a Senior DevOps / Site Reliability Engineer (SRE) to join our platform engineering team, which is responsible for building and operating reliable, scalable cloud-native platforms.
The role focuses on Kubernetes platform operations, automation, and observability, applying DevOps and SRE practices to improve system reliability and using AI-assisted tooling to improve incident response and operational efficiency.
You will work closely with engineering teams to ensure systems are highly available, observable, and resilient, while contributing to platform architecture, automation, and modern reliability engineering practices.
Key Responsibilities
• Operate and improve reliability of cloud-native platforms running on Kubernetes.
• Build and maintain CI/CD pipelines and GitOps-based deployment workflows (FluxCD, ArgoCD).
• Design and manage Infrastructure as Code (IaC) using tools such as Terraform to provision and manage cloud infrastructure.
• Support and optimize cloud architecture, networking, and security configurations in cloud environments.
• Implement and manage observability solutions (metrics, logs, traces, alerting) using tools such as Datadog and OpenTelemetry.
• Define and monitor SLIs, SLOs, and service reliability metrics.
• Participate in incident management, troubleshooting, and root cause analysis (RCA) for production systems.
• Design and implement high availability, fault tolerance, and resilience strategies for distributed systems.
• Develop automation and internal tooling to reduce operational toil and improve engineering productivity.
• Contribute to design and architecture of new platform capabilities, including writing solution intents, technical design documents, and CDRs (Critical Design Reviews) for architecture reviews and engineering governance.
• Apply an AI-first mindset by exploring AI-assisted DevOps / AIOps capabilities for monitoring, troubleshooting, and operational automation.
• Collaborate with engineering teams to improve platform reliability, performance, and operational maturity.
Required Skills
• 5–8 years of experience in DevOps, Platform Engineering, or Site Reliability Engineering roles
• Strong hands-on experience with Kubernetes and containerized workloads
• Solid understanding of cloud platforms (Azure preferred, AWS/GCP acceptable)
• Strong knowledge of cloud networking, security, and distributed systems concepts
• Experience with CI/CD pipelines and GitOps practices
• Experience with Infrastructure as Code tools (Terraform or similar)
• Strong understanding of observability principles (metrics, logs, tracing, alerting)
• Hands-on experience with Datadog or similar observability platforms
• Scripting or programming skills in Python / Go
• Experience or strong interest in AI-native DevOps practices and building AI-assisted operational tooling using LLMs or emerging frameworks (e.g., MCP)
Preferred Skills
• Experience with microservices architectures and distributed systems
• Exposure to service mesh technologies (Istio or similar)
• Experience implementing high availability and resilience patterns in cloud platforms
• Exposure to AI-driven observability, automation, or AIOps tools
• Experience contributing to solution design, architecture discussions, and preparing technical design documentation for architecture reviews
Thanks & Regards,
Imran Khan
+91 8247747186