Software Engineer, Model Inference

Nucleus AI

Location: India
Job type: Full-time

Required skills

Python
backend
C++
caching
capacity planning
end-to-end
Java
Rust

About the role

Website: withnucleus.ai
Job details:

The usefulness of an intelligent system depends not only on what it can do, but on how reliably, efficiently, and quickly it can do it in the real world. We’re hiring a Software Engineer, Model Inference to build and optimize the systems that serve Nucleus models in production. This role focuses on the core infrastructure behind inference at scale: reducing latency, increasing throughput, improving reliability, and driving down cost without compromising quality.

You’ll work at the intersection of systems, performance, and product—shaping the serving layer that turns frontier models into dependable user and developer experiences. The problems are deeply technical, highly leveraged, and central to how Nucleus delivers intelligence in practice.

In this role, you will

Design and build inference systems that serve Nucleus models reliably in production across a range of workloads and traffic patterns.
Optimize end-to-end serving performance across latency, throughput, tail behavior, utilization, and cost.
Improve model serving architectures, request scheduling, batching, caching, and resource allocation strategies.
Build tooling and observability for profiling, debugging, and capacity planning across inference services.
Partner with research, infrastructure, and product engineering teams to productionize new model capabilities efficiently and safely.
Drive improvements in autoscaling, rollout safety, resilience, and fault tolerance across serving systems.
Help define the abstractions and platforms that make high-performance inference easier to operate and evolve.

You may be a good fit if you

Have strong software engineering experience in backend systems, distributed systems, or performance-critical infrastructure.
Have worked on model serving, inference platforms, real-time systems, or large-scale production services.
Are comfortable reasoning about latency, throughput, queuing, scheduling, and hardware utilization.
Write strong production code in languages such as Python, Go, Rust, C++, or Java.
Enjoy performance tuning and like moving fluidly between systems design, implementation, and operational improvement.
Care about building infrastructure that is both technically rigorous and deeply useful to downstream teams and users.

What makes Nucleus different

At Nucleus, inference is not just an optimization problem—it is a core product and platform capability. The systems you build will shape how our models are experienced in production, and how efficiently we can bring new capabilities to users. We care deeply about technical excellence because performance, reliability, and cost are part of the product.

If you want to help define the serving layer for frontier AI, we’d love to hear from you.
