Codvo.ai
Website: codvo.ai
Job details:
Job Summary:
As an MLOps Engineer, you'll design, deploy, and maintain production-grade ML workflows across AWS and Azure using container orchestration, IaC, and CI/CD pipelines. You'll bridge DevOps and ML teams to automate model training, deployment, monitoring, and edge API services, ensuring reliable, scalable AI solutions.
Key Responsibilities:
- Architect and implement MLOps pipelines for ML model training, versioning, deployment, and monitoring using tools like MLflow, Kubeflow, SageMaker Pipelines, or Azure ML.
- Build and manage Infrastructure as Code (IaC) with Terraform for multi-cloud environments, including AWS EKS/ECS/Lambda and Azure AKS/Azure Functions.
- Design containerized applications with Docker and orchestrate them on Kubernetes (EKS/AKS) for high-availability ML inference and edge services.
- Develop CI/CD pipelines using Azure DevOps (ADO), GitHub Actions, AWS CodePipeline, or Azure Pipelines to automate deployments of Python/FastAPI microservices and Node.js backends.
- Create and optimize edge API applications (e.g., FastAPI-based services) for low-latency inference on AWS Lambda@Edge, Azure Functions, or ECS Fargate.
- Implement observability with Prometheus, Grafana, CloudWatch, and Azure Monitor, including alerting on ML model drift, performance, and infrastructure health.
- Collaborate with data scientists and DevOps teams to productionize AI solutions, troubleshoot issues, and scale for heavy workloads.
- Write clean, production-ready code in Python, Node.js, and Bash for automation scripts, ETL processes, and API gateways.
Required Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 4+ years of experience in DevOps/MLOps roles, with proven deployments on AWS and Azure.
- Expertise in:
  - Cloud: AWS (EKS, ECS, Lambda, SageMaker, ECR) and Azure (AKS, Azure ML, Functions)
  - IaC & Orchestration: Terraform, Docker, Kubernetes (EKS/AKS)
  - Pipelines: Azure DevOps (ADO), Jenkins, GitLab CI, or AWS/Azure-native tools
  - Programming: Python (FastAPI, Pandas, Scikit-learn), Node.js
  - MLOps: Model deployment, versioning, monitoring (e.g., Seldon, KServe)
- Hands-on experience building edge services and API applications for real-time inference.
- Strong problem-solving skills in multi-cloud environments.
Preferred Skills:
- Certifications: AWS Certified Machine Learning – Specialty, Azure AI Engineer Associate, CKA/CKAD, Terraform Associate.
- Experience with vector databases and similarity-search libraries (Pinecone, FAISS), serverless ML, or GenAI fine-tuning.
- Knowledge of React.js for dashboarding ML metrics.