Nucleus AI
Website:
withnucleus.ai
Job details:
At Nucleus, frontier AI depends on cloud infrastructure that is reliable, scalable, and built for constant evolution. We’re hiring a Software Engineer, Cloud Infrastructure to own the platforms, provisioning systems, and infrastructure-as-code that power Nucleus’s training and production workloads. This role sits close to the foundation of the company: building the cloud environments where models are trained, services are deployed, and critical systems run at scale.
You will work across cloud architecture, automation, and platform reliability—helping ensure that teams across research, product, and infrastructure can move quickly on top of durable, well-designed systems.
What you’ll do
- Design, build, and operate the cloud platforms that support Nucleus training clusters and production systems.
- Own provisioning workflows and infrastructure-as-code used to manage compute, networking, storage, and secure service environments.
- Improve the reliability, scalability, and cost efficiency of cloud infrastructure across a range of workloads.
- Build automation for cluster lifecycle management, environment setup, and resource orchestration.
- Partner with research, platform, and product engineering teams to support evolving infrastructure needs.
- Strengthen observability, security, and operational controls across cloud systems and services.
- Help define standards for multi-environment deployments, access patterns, and cloud resource management.
- Contribute to incident response and long-term infrastructure improvements for mission-critical systems.
What we’re looking for
- Strong experience building and operating cloud infrastructure in production environments.
- Deep familiarity with infrastructure-as-code tools such as Terraform, Pulumi, or similar systems.
- Experience with major cloud providers such as AWS, GCP, or Azure.
- Comfort managing compute, networking, storage, IAM, and service deployment at scale.
- Strong software engineering skills in languages such as Python, Go, or similar.
- Experience improving infrastructure reliability, automation, and operational efficiency.
- A systems mindset and sound judgment around performance, security, and maintainability.
- Interest in supporting AI workloads and the infrastructure demands of modern training and serving systems.
Why Nucleus
Nucleus is building large-scale intelligent systems that require more than powerful models; they require cloud infrastructure that can support research velocity, production reliability, and long-term scale.
In this role, you’ll help shape the environments where our most important workloads live. Your work will influence how quickly we build, how safely we deploy, and how effectively we operate the infrastructure behind frontier AI. If you care about cloud architecture, automation, and building the foundations that ambitious teams depend on, we’d love to hear from you.
Click Apply to learn more.