Website:
go.link
Job details:
About the Role
We're building the infrastructure layer for next-generation AI, and we need engineers who care deeply about how GPUs are used, not just whether they run. You'll work at the intersection of infrastructure and ML, solving real-world challenges in GPU utilization, distributed training, and developer experience.
Key Responsibilities
• Build and optimize GPU-backed compute environments for ML workloads
• Develop systems for provisioning and managing GPU resources
• Create and maintain containerized ML environments (PyTorch, TensorFlow, etc.)
• Improve performance of training and inference workloads
• Work on distributed training setups across multiple GPUs
• Build internal tools for monitoring, profiling, and debugging GPU usage
• Contribute to developer tooling (CLI / SDK) for interacting with the platform
Required Skills
• Strong experience with PyTorch / TensorFlow / JAX
• Hands-on experience with CUDA, NVIDIA drivers, and cuDNN
• Experience with Docker and containerized environments
• Familiarity with Kubernetes or similar orchestration systems
• Strong Python skills
• Understanding of Linux systems and performance tuning
• Exposure to distributed systems or multi-GPU training
Nice to Have
• Experience with LLM fine-tuning or inference optimization
• Familiarity with tools like Ray, Horovod, or Dask
• Exposure to GPU monitoring / profiling tools
• Experience working with production ML systems
Location & Compensation
📍 Delhi NCR
💰 ₹15–20 LPA + ESOPs