Site Reliability Engineer
Clearwater Analytics
- Location: Noida, Uttar Pradesh, India
- Job type: Full-time
Required skills
- Python
- AWS
- Ansible
- Bash
- CloudWatch
- Datadog
- DevOps
- Docker
- EC2
- fintech
- GitHub
- infrastructure-as-code
- Jenkins
- Kubernetes
- Linux
- microservices
- Terraform
- VPC
About the role
Website: cwan.com
Job details:
Key Responsibilities
- Build and maintain observability stacks using Prometheus and Grafana; define SLOs, SLIs, SLAs and error budgets.
- Own incident response: on-call rotation, triage, mitigation, and blameless post-mortems.
- Automate repetitive operational tasks and eliminate toil through scripting and tooling (Python, Bash, Go).
- Design, deploy, and maintain highly available infrastructure on AWS using Terraform and Ansible for infrastructure-as-code workflows.
- Manage and optimize Kubernetes clusters (EKS) and containerized workloads with Docker to support a microservices architecture.
- Collaborate with engineering teams during design reviews to embed reliability and scalability requirements.
- Monitor capacity and performance trends; proactively identify and resolve bottlenecks.
- Maintain and improve CI/CD pipelines and deployment automation.
Required Qualifications
- 2–8 years of experience in Site Reliability Engineering, DevOps, or a closely related discipline.
- Working knowledge of monitoring and logging tools such as Prometheus, Grafana, Dynatrace, Datadog, OpenSearch, and VictoriaMetrics.
- Experience tracking and monitoring SLAs for all critical services.
- Experience with Linux systems administration.
- Hands-on experience with Kubernetes and Docker in production environments.
- Proficiency with AWS services (EC2, EKS, RDS, S3, VPC, IAM, CloudWatch).
- Experience with Infrastructure-as-Code tools such as Terraform or Ansible.
- Strong scripting skills in Python or Bash.
- Familiarity with CI/CD tools (e.g., GitHub Actions, Jenkins, GitLab CI).
- Familiarity with GitOps workflows (e.g., ArgoCD, Rancher).
Preferred
- Experience in financial services, FinTech, or other regulated industries.
- Knowledge of service mesh technologies (Istio, Linkerd).
- Familiarity with distributed tracing tools (Jaeger, OpenTelemetry).
- AWS certifications (Solutions Architect, DevOps Engineer, or equivalent).
- Experience with cost optimization strategies in cloud environments.