Job details:
Educational Qualification
B.Tech/M.Tech/M.S. in Computer Science, Data Engineering, AI or related discipline. Certification in cloud DevOps or MLOps platforms (AWS DevOps Engineer, Azure DevOps Expert, GCP Professional ML Engineer) is highly desirable.
Contributions to MLOps or DevOps open-source projects are preferred.
Experience
- 7-10 years in machine learning operations or DevOps engineering.
- Minimum of 4 years building CI/CD pipelines for AI/ML model deployment in enterprise or government ecosystems.
- Proven experience with containerized and microservice architectures.
Key Responsibilities
- Design and manage continuous integration and delivery (CI/CD) pipelines for AI/ML
models across multiple environments.
- Establish model versioning, deployment, monitoring, and rollback mechanisms to ensure
stability and traceability.
- Automate training, testing, and serving workflows using containerized solutions.
- Define infrastructure-as-code templates for scalable AI deployment in on-prem or cloud
environments.
- Collaborate with Data Science and Engineering teams to standardize model input/output
formats and performance metrics.
- Implement logging, monitoring, and alerting for deployed models to ensure high
availability and accuracy over time.
- Ensure compliance with Responsible AI guidelines for deployment, including bias auditing
and explainability tracking.
Technical Competencies
- MLOps Platforms: MLflow, Kubeflow, Azure ML, AWS SageMaker Pipelines, GCP Vertex AI Pipelines for end-to-end ML workflow orchestration
- Containerization: Docker, Kubernetes, Helm charts, container registries, and microservices architecture for ML workloads
- CI/CD: Jenkins, GitLab CI, GitHub Actions, Azure DevOps with specialized ML pipeline integration and automated testing
- Infrastructure-as-Code: Terraform, CloudFormation, Ansible for reproducible ML infrastructure provisioning and management
- Cloud Platforms: AWS (EKS, Lambda, ECR, S3), Azure (AKS, Container Registry, Blob Storage), GCP (GKE, Cloud Build, Cloud Storage)
- Model Serving: TorchServe, TensorFlow Serving, Seldon, KServe, REST APIs, and real-time inference infrastructure
- Programming Languages: Python for automation, Bash scripting, YAML for configuration management, basic understanding of Go/Java
- Database & Storage: Feature stores (Feast, Tecton), model registries, data versioning (DVC), and distributed storage systems
- Workflow Orchestration: Apache Airflow, Prefect, Argo Workflows for complex ML pipeline scheduling and dependency management
Skills: cd,aws,devops,ci,infrastructure,ml,cloud