DevOps or Cloud Infrastructure Engineer

Location

Gurugram, Haryana, India

Job type

Full-time

About the job

This job is sourced from a job board.

About the role

Novus Hi-Tech

Website: novushitech.com
Job details:

🚀 Hiring | DevOps / Cloud Infrastructure Engineer | Novus Hi-Tech Robotics


We are looking for a highly driven DevOps / Cloud Infrastructure Engineer to architect and scale mission-critical infrastructure supporting AI training, MLOps, Digital Twin environments, and large-scale monitoring systems.


🔹 About the Role

Training world-class Physical AI models requires a specialized infrastructure ecosystem: massive GPU fleets, ultra-high-throughput storage, distributed computing, and secure sovereign deployments. You will play a foundational role in building and scaling this global infrastructure platform.


🔹 Key Responsibilities

  • GPU Fleet Management: Architect and manage large-scale compute clusters, optimizing for performance, cost, and graceful failure handling.
  • Distributed Computing: Deploy and scale frameworks for distributed training and inference across heterogeneous environments.
  • Sovereign Infrastructure: Design "air-gapped" versions of our platform that can run entirely on-premises for privacy-conscious customers.
  • Observability: Build comprehensive monitoring and alerting for complex ML workloads.


🔹 Key Qualifications

  • Cloud & Orchestration: Expert-level knowledge of major cloud providers and container orchestration at scale.
  • Distributed Systems: Proficiency in managing distributed training frameworks and high-throughput storage solutions.
  • Automation: Mastery of Infrastructure-as-Code and modern CI/CD practices.
  • Security: Deep understanding of network security and private/sovereign infrastructure design.
  • Experience & Education: 3–5 years of experience and a Bachelor’s or Master’s degree in Computer Science Engineering, Artificial Intelligence, or Data Engineering.


🔹 Ideal Background

We are looking for engineers who have worked on:

  • Large-scale AI/ML infrastructure platforms
  • GPU-intensive compute environments
  • Cloud-native distributed systems
  • High-performance data infrastructure and MLOps ecosystems
  • Secure enterprise or air-gapped deployments

Click Apply to learn more.

Skills

Artificial Intelligence
Cloud Infrastructure
DevOps
GPU
Infrastructure-as-Code
Network Security