AI Platform Engineer

InfiReex

Location: India
Job type: Full-time

Required skills

Ansible
automation tools
configuration management
cross-functional
CUDA
GPU
Jupyter Notebook
Kubeflow
Kubernetes
machine learning
TensorFlow
Terraform
PyTorch

About the role

Website: infireex.com
Job details:

We are looking for a skilled AI Platform Engineer to build, manage, and enhance AI/ML infrastructure, workflows, and automation pipelines. The role involves creating scalable platforms for training and deploying machine learning models using modern orchestration, automation, and GPU acceleration technologies. The ideal candidate will work closely with data scientists and platform engineering teams to enable efficient resource management and scalable operations across cloud and hybrid ecosystems.

Key Responsibilities

AI/ML Infrastructure & Kubernetes

Design, deploy, and manage Kubernetes environments optimized for AI/ML applications and workloads.
Ensure scalability, reliability, and performance of containerized AI platforms.

GPU Resource Management

Implement and manage GPU orchestration solutions such as Run:ai and related operators for workload scheduling and resource optimization.
Enable efficient GPU allocation and utilization for AI model training and inference.

Automation & Pipeline Development

Build and maintain Python-based automation tools and machine learning pipelines.
Automate infrastructure deployment using Terraform and manage configurations through Ansible.

Notebook Environment & Collaboration

Develop and maintain Jupyter Notebook environments to support experimentation, research, and collaborative model development.

NVIDIA Ecosystem Integration

Configure and optimize NVIDIA AI Enterprise technologies, including CUDA, the NeMo Framework, Triton Inference Server, TensorRT, and GPU drivers, to support accelerated AI computing.

MLOps & Lifecycle Management

Implement MLOps standards and practices covering model lifecycle management, CI/CD pipelines, monitoring, and governance using tools such as MLflow and Kubeflow.

Cross-functional Collaboration

Partner with data scientists, ML engineers, and platform teams to improve scalability, operational efficiency, and resource utilization across cloud and hybrid infrastructures.

Required Skills & Experience

Strong programming expertise in Python with hands-on experience using ML frameworks such as TensorFlow and PyTorch.
Practical experience with Kubernetes and container orchestration technologies.
Familiarity with Run:ai or equivalent GPU workload scheduling platforms.
Strong experience in infrastructure automation using Terraform and configuration management using Ansible.
Experience working with Jupyter Notebooks in AI/ML development environments.
Good understanding of NVIDIA AI Enterprise technologies, including CUDA, the NeMo Framework, Triton Inference Server, and GPU drivers.
Knowledge of MLOps concepts, workflows, and tools such as MLflow and Kubeflow.
Experience deploying, managing, and scaling AI/ML workloads within cloud or hybrid infrastructure environments.

