ML / Platform Engineer

ZAVO

Location: Connaught Place, Delhi, India
Job type: Full-time

Required skills

Python
CLI
CUDA
Docker
Kubernetes
Linux
Ray
TensorFlow
PyTorch

About the role

Website: go.link

We're building the infrastructure layer for next-generation AI, and we need engineers who care deeply about how GPUs are used, not just that they run. You'll work at the intersection of infrastructure and ML, solving real-world challenges around GPU utilization, distributed training, and developer experience.

Key Responsibilities

• Build and optimize GPU-backed compute environments for ML workloads

• Develop systems for provisioning and managing GPU resources

• Create and maintain containerized ML environments (PyTorch, TensorFlow, etc.)

• Improve performance of training and inference workloads

• Work on distributed training setups across multiple GPUs

• Build internal tools for monitoring, profiling, and debugging GPU usage

• Contribute to developer tooling (CLI / SDK) for interacting with the platform

Required Skills

• Strong experience with PyTorch / TensorFlow / JAX

• Hands-on experience with CUDA, NVIDIA drivers, and cuDNN

• Experience with Docker and containerized environments

• Familiarity with Kubernetes or similar orchestration systems

• Strong Python skills

• Understanding of Linux systems and performance tuning

• Exposure to distributed systems or multi-GPU training

Nice to Have

• Experience with LLM fine-tuning or inference optimization

• Familiarity with tools like Ray, Horovod, or Dask

• Exposure to GPU monitoring / profiling tools

• Experience working with production ML systems

Location & Compensation

📍 Delhi NCR

💰 ₹15–20 LPA + ESOPs

