TIGI HR
Website: tigihr.com
Job details:
About the Role
We are looking for an AWS DevOps Engineer with AI/ML experience to design, implement, and manage cloud infrastructure and deployment pipelines for AI-driven applications. The ideal candidate will have strong AWS expertise, CI/CD experience, and hands-on knowledge of deploying machine learning models in production environments.
Responsibilities
- Design, build, and maintain scalable infrastructure on AWS.
- Implement CI/CD pipelines for application and ML model deployments.
- Deploy, monitor, and optimize AI/ML models in production.
- Automate infrastructure using Infrastructure as Code (Terraform/CloudFormation).
- Manage containerized applications using Docker and Kubernetes (EKS).
- Monitor system performance using CloudWatch and other monitoring tools.
- Ensure security best practices and cost optimization in AWS.
- Collaborate with data scientists and development teams.
- Architect and manage distributed, stateful systems in production environments.
- Lead infrastructure migrations and zero-downtime cutovers across environments.
- Track system performance, reliability, and SLA compliance using observability stacks beyond CloudWatch.
Qualifications
Experience: 7-10 years
Required Skills
- Strong experience with AWS services (EC2, S3, RDS, Lambda, IAM, VPC).
- Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, etc.).
- Hands-on experience with Docker and Kubernetes.
- Experience with Infrastructure as Code (Terraform or CloudFormation).
- Knowledge of Python or Bash scripting.
- Experience building agentic AI systems using frameworks such as LangGraph.
- Experience operating distributed systems and stateful workloads at scale.
- Kubernetes experience, including StatefulSets, autoscaling, and rolling upgrades.
- Experience implementing blue/green deployments and rollback strategies.
- Strong understanding of networking fundamentals (DNS, load balancing, TLS, VPC peering).
Preferred Skills
- Experience with MLOps practices.
- Familiarity with monitoring tools (Prometheus, Grafana).
- Understanding of security and DevSecOps practices.
- Experience tuning performance for latency-sensitive systems.
- Experience with multi-region deployments and disaster recovery planning.
- Experience managing high-volume data ingestion pipelines (optional, good to have).
- Experience working with search engines, vector databases, or large-scale data retrieval systems.
- Experience handling data reindexing, cluster scaling, and system migrations.