Novus Hi-Tech
Website: novushitech.com
Job details:
🚀 Hiring | DevOps / Cloud Infrastructure Engineer | Novus Hi-Tech Robotics
We are looking for a highly driven DevOps / Cloud Infrastructure Engineer to architect and scale mission-critical infrastructure supporting AI training, MLOps, Digital Twin environments, and large-scale monitoring systems.
🔹 About the Role
Training world-class Physical AI models requires a unique infrastructure ecosystem: massive GPU fleets, ultra-high-throughput storage, distributed computing, and secure sovereign deployments. You will play a foundational role in building and scaling this global infrastructure platform.
🔹 Key Responsibilities
- GPU Fleet Management: Architect and manage large-scale compute clusters, optimizing for performance, cost, and graceful failure handling.
- Distributed Computing: Deploy and scale frameworks for distributed training and inference across heterogeneous environments.
- Sovereign Infrastructure: Design "air-gapped" versions of our platform that can run entirely on-premises for privacy-conscious customers.
- Observability: Build comprehensive monitoring and alerting for complex ML workloads.
🔹 Key Qualifications
- Cloud & Orchestration: Expert-level knowledge of major cloud providers and container orchestration at scale.
- Distributed Systems: Proficiency in managing distributed training frameworks and high-throughput storage solutions.
- Automation: Mastery of Infrastructure-as-Code and modern CI/CD practices.
- Security: Deep understanding of network security and private/sovereign infrastructure design.
- Experience & Education: 3–5 years of experience and a Bachelor’s or Master’s degree in Computer Science Engineering, Artificial Intelligence, or Data Engineering.
🔹 Ideal Background
We are looking for engineers who have worked on:
- Large-scale AI/ML infrastructure platforms
- GPU-intensive compute environments
- Cloud-native distributed systems
- High-performance data infrastructure and MLOps ecosystems
- Secure enterprise or air-gapped deployments
Click Apply to learn more.