Project Director – HPC / AI / GPU Infrastructure Deployment

Larsen & Toubro-Vyoma

full-time

Required skills

CUDA
end-to-end
ethernet
GPU
PMP
PRINCE2
virtualization

About the role

Website: larsentoubrovyoma.com
Job details:

Role Overview

We are seeking a seasoned Project Director – HPC / AI Infrastructure Deployment to lead large-scale, high-density compute programs involving GPU clusters, HPC workloads, and AI infrastructure. The role demands end-to-end ownership of deploying 10+ MW IT load data center environments, ensuring delivery of high-performance GPU-based compute platforms with cutting-edge networking and storage architectures.

Roles & Responsibilities

Lead and deliver large-scale HPC / AI GPU cluster deployments (e.g., NVIDIA B200 / B300 GPU platforms) within defined timelines and budgets
Drive execution of AI stack deployment (e.g., NVIDIA AI Enterprise, NVAIE) across hybrid/cloud/on-prem environments
Manage multi-vendor ecosystems including OEMs, SI partners, and hyperscale technology providers
Deploy and scale high-density GPU racks with liquid/air-cooled thermal strategies
Design and oversee InfiniBand (IB) and high-speed Ethernet networks
Experience with NVIDIA/Mellanox InfiniBand fabrics
Configuration and optimization using UFM (Unified Fabric Manager)
Strong understanding of BCM (Broadcom Ethernet switching) platforms
Architect and implement leaf-spine network topologies for ultra-low-latency AI workloads
Ensure effective integration of storage systems (parallel file systems, NVMe-based storage)
Oversee deployment of Kubernetes-based GPU orchestration platforms
Experience with containerized AI workloads and distributed training clusters
Exposure to NVIDIA AI Enterprise (NVAIE), CUDA, and GPU virtualization frameworks
Manage data center design, build, and repurposing for HPC workloads
Oversee MEP (Mechanical, Electrical, Plumbing) systems implementation
Ensure optimized thermal management (liquid cooling, rear door heat exchangers, immersion cooling where applicable)
Ensure optimized power density (kW/rack) planning
Ensure optimized energy efficiency (PUE optimization)
Establish robust governance frameworks aligned to:

a. HLD/LLD design validation

b. SOP adherence

c. Quality assurance benchmarks

Implement risk mitigation strategies for large-scale deployments (supply chain, OEM dependencies, technology integration risks)
Monitor program milestones and ensure SLA-based deliveries
Drive structured cabling design (fiber-heavy HPC fabric, spine-leaf connectivity)

Qualifications & Experience

B.E/B.Tech in Electrical / Electronics / Computer Science Engineering
15–25 years of experience in data center infrastructure deployment, HPC / AI workload environments, and large-scale IT infrastructure programs

Mandatory / Preferred Certifications

PMP / PRINCE2 (mandatory for program governance)
CDCP / CDCS / CDCPM certifications

Strongly preferred:

NVIDIA AI Infrastructure / DGX / AI Factory certifications
OEM certifications (Dell, HPE, Lenovo HPC systems)
