Viraaj HR Solutions Private Limited
Website: viraajhrsolutions.com
Job details:
About The Opportunity
We are a fast-paced Cloud Infrastructure & Managed Services organization working in enterprise cloud operations and platform reliability. We deliver 24/7 production support, incident management, and automation for large-scale AWS environments serving customers across India. This on-site role (India) focuses on stabilizing production systems, reducing operational toil, and driving reliability improvements.
Primary title (standardized): AWS Production Support Engineer
Role & Responsibilities
- Monitor, triage, and resolve production incidents in AWS; own the incident lifecycle from detection to closure and escalate to engineering teams promptly.
- Troubleshoot infrastructure and application issues across EC2, RDS, VPC, ELB, S3, and Lambda; coordinate cross-functional remediation actions.
- Maintain and improve infrastructure-as-code templates and pipelines using CloudFormation and Terraform to ensure safe, repeatable deployments.
- Author and update runbooks, automation scripts (Bash/Python), and operational playbooks to reduce manual intervention.
- Operate and tune observability and alerting stacks (CloudWatch, Prometheus, Grafana, ELK); create dashboards and actionable alerts to minimise noise and reduce MTTR.
- Participate in on-call rotations, conduct RCA sessions, document findings, and drive corrective actions to improve system reliability and capacity planning.
Skills & Qualifications
Must-Have
- AWS EC2
- AWS S3
- AWS CloudWatch
- AWS IAM
- AWS Lambda
- AWS RDS
- Terraform
- CloudFormation
Preferred
- Docker
- Kubernetes
- Prometheus
Additional Qualifications
- Proven experience in production support or site-reliability/DevOps roles for AWS-hosted applications (on-site availability in India required).
- Strong Linux administration and scripting ability (Bash or Python) for automation and troubleshooting.
- Experience with CI/CD tooling (e.g., Jenkins/GitLab) and familiarity with logging/observability best practices.
Benefits & Culture Highlights
- Hands-on ownership of critical production systems with clear impact on customer experience.
- Learning-focused environment with opportunities to upskill in IaC, monitoring, and reliability engineering.
- Competitive compensation, on-site collaboration, and a structured on-call rota.
Skills: Terraform, Prometheus, Docker, AWS Production Analyst, Kubernetes, AWS Lambda