About the role
Our Inference team is responsible for building and maintaining the critical systems that serve our LLMs to a diverse set of customers. As the cornerstone of our service delivery, the team focuses on scaling inference systems, ensuring reliability, optimizing compute efficiency, and developing new inference capabilities. The team tackles complex distributed systems challenges across our entire inference stack, from optimal request routing to efficient prompt caching.
You may be a good fit if you:
Have significant software engineering experience
Are results-oriented, with a bias towards flexibility and impact
Pick up slack, even if it goes outside your job description
Enjoy pair programming (we love to pair!)
Want to learn more about machine learning research
Care about the societal impacts of your work
Strong candidates may also have experience with:
High performance, large-scale distributed systems
Implementing and deploying machine learning systems at scale
LLM inference optimizations, such as batching and caching strategies
Kubernetes
Python
Representative projects:
Optimizing inference request routing to maximize compute efficiency
Autoscaling our compute fleet to match compute supply with inference demand
Contributing to new inference features (e.g. structured sampling, fine-tuning)
Supporting inference for new model architectures
Ensuring smooth and regular deployment of inference services
Analyzing observability data to tune performance based on production workloads
About the company
Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.