Weekday (YC W21)
Website: weekday.works
Job details:
This role is for one of Weekday's clients
Salary range: Rs 20,00,000 - Rs 35,00,000 (i.e., INR 20-35 LPA)
Min Experience: 3 years
Location: Remote (India)
Job Type: Full-time
We are looking for a DevOps / Site Reliability Engineer to take full ownership of infrastructure and platform operations in a fast-scaling, AI-first environment. This role is central to building a secure, scalable, and cost-efficient cloud-native ecosystem while enabling fast and reliable deployments. You will focus on automating infrastructure, improving system reliability, and embedding intelligent, AI-driven operations into DevOps workflows. As a key contributor, you will work closely with backend and product teams to ensure seamless performance, reduce operational risks, and support rapid growth. This is a high-impact role with the opportunity to shape platform architecture, drive efficiency, and contribute to next-generation engineering practices.
Requirements
Key Responsibilities
- Design, implement, and manage scalable cloud infrastructure using AWS services such as ECS/Fargate, RDS, S3, and IAM
- Build and maintain infrastructure as code using Terraform for consistent and automated deployments
- Develop and manage CI/CD pipelines using GitHub Actions to ensure fast and reliable releases
- Implement observability and monitoring systems using tools like Prometheus, Grafana, OpenTelemetry, and Sentry
- Manage containerized environments using Docker and optimize performance under high-load conditions
- Drive cost optimization initiatives across cloud infrastructure
- Integrate AI-driven solutions into DevOps workflows, such as automated log analysis and predictive scaling
- Collaborate with engineering teams to improve system performance, scalability, and reliability
- Ensure infrastructure security and compliance readiness, and enforce infrastructure best practices
- Continuously improve deployment pipelines, reduce downtime, and enhance system resilience
What Makes You a Great Fit
- 3+ years of experience in DevOps, SRE, or backend infrastructure roles
- Strong hands-on experience with AWS infrastructure and cloud-native architectures
- Expertise in Terraform and infrastructure as code practices
- Proven experience with CI/CD pipelines and containerization tools like Docker
- Strong understanding of observability, monitoring, and incident management
- Experience troubleshooting and optimizing systems under production load
- Exposure to, or strong interest in, integrating AI/LLMs into DevOps workflows
- Knowledge of security and compliance standards (SOC 2, GDPR) and infrastructure best practices
- Familiarity with multi-cloud environments (GCP, Azure) is a plus
- Strong ownership mindset, problem-solving ability, and clear communication skills
Click Apply to learn more.