eBay for Business
Website: ebay.com
Job details:
At eBay, we're more than a global ecommerce leader — we’re changing the way the world shops and sells. Our platform empowers millions of buyers and sellers in more than 190 markets around the world. We’re committed to pushing boundaries and leaving our mark as we reinvent the future of ecommerce for enthusiasts.
Our customers are our compass, authenticity thrives, bold ideas are welcome, and everyone can bring their unique selves to work — every day. We're in this together, sustaining the future of our customers, our company, and our planet.
Join a team of passionate thinkers, innovators, and dreamers — and help us connect people and build communities to create economic opportunity for all.
About the eBay AI Platform
At eBay, we are building a next-generation AI platform to power intelligent, AI-driven experiences across our global marketplace. Our platform supports the full lifecycle of large-scale foundation models—from distributed pretraining on high-performance GPU clusters to high-throughput production inference—enabling commerce intelligence for hundreds of millions of users worldwide.
We build state-of-the-art AI runtime infrastructure that uses vLLM and TensorRT-LLM as pluggable inference engines behind a standardized AI runtime layer, with Megatron-LM and DeepSpeed for distributed training. The platform integrates provisioned throughput management, a distributed KV cache, prefill/decode disaggregation, and a robust MLOps stack spanning experiment management, fine-tuning automation, and production observability.
About The Role
We are looking for an experienced Software Engineer specializing in AI runtimes and MLOps to design and operate the systems that bring eBay's foundation models from research to production. You will own the inference runtime stack, the distributed training infrastructure, and the MLOps tooling that ties them together—enabling ML researchers and Applied Scientists to move fast without sacrificing reliability or performance.
You will work on production LLM/VLM inference serving with vLLM and TensorRT-LLM via a standardized AI runtime layer; implement distributed inference optimizations including prefill/decode disaggregation, distributed KV cache management, and LLM-aware request routing; develop large-scale distributed training pipelines using Megatron-LM and DeepSpeed on high-performance GPU clusters; and build the MLOps stack that automates the end-to-end model lifecycle.
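For candidates less familiar with the phrase "standardized AI runtime layer," the following is a minimal, purely illustrative Python sketch of the pattern it refers to: engine-agnostic code that can route requests to vLLM or TensorRT-LLM behind one interface. All names here (AIRuntime, InferenceEngine, the model names) are hypothetical and do not reflect eBay's actual APIs.

```python
# Hypothetical sketch (not eBay's code): a standardized runtime layer with
# pluggable inference engines.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class GenerationRequest:
    prompt: str
    max_new_tokens: int = 256
    temperature: float = 0.7


class InferenceEngine(ABC):
    """Engine-agnostic contract the runtime layer programs against."""

    @abstractmethod
    def generate(self, request: GenerationRequest) -> str: ...


class VLLMEngine(InferenceEngine):
    # Placeholder: a real adapter would wrap vLLM (e.g. its OpenAI-compatible server).
    def generate(self, request: GenerationRequest) -> str:
        return f"[vLLM] completion for: {request.prompt!r}"


class TensorRTLLMEngine(InferenceEngine):
    # Placeholder: a real adapter would wrap a TensorRT-LLM executor or Triton backend.
    def generate(self, request: GenerationRequest) -> str:
        return f"[TensorRT-LLM] completion for: {request.prompt!r}"


class AIRuntime:
    """Routes each request to whichever engine a model is registered with."""

    def __init__(self) -> None:
        self._engines: dict[str, InferenceEngine] = {}

    def register(self, model_name: str, engine: InferenceEngine) -> None:
        self._engines[model_name] = engine

    def generate(self, model_name: str, request: GenerationRequest) -> str:
        return self._engines[model_name].generate(request)


if __name__ == "__main__":
    runtime = AIRuntime()
    runtime.register("listing-llm", VLLMEngine())       # hypothetical model names
    runtime.register("search-llm", TensorRTLLMEngine())
    print(runtime.generate("listing-llm", GenerationRequest(prompt="Describe this item")))
```

The point of the pattern is that callers depend only on the shared interface, so inference engines can be swapped or mixed per model without changing application code.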
Key Responsibilities
- Build and operate production AI inference runtimes using vLLM and TensorRT-LLM behind a standardized AI runtime layer.
- Implement and optimize distributed inference architectures with prefill/decode disaggregation.
- Design and optimize a distributed KV cache system across nodes.
- Develop and optimize large-scale distributed training pipelines using Megatron-LM and DeepSpeed.
- Profile and resolve distributed training bottlenecks using NVIDIA and PyTorch performance tools.
- Implement inference optimizations such as quantization, speculative decoding, continuous batching, and FlashAttention.
- Build and operate an Inference Request Router for authentication, routing, and throughput management.
- Develop and operate multi-LoRA adapter hosting with hot-swap routing and lifecycle management (a rough sketch of the hot-swap idea follows this list).
- Build and maintain the MLOps stack, including experiment tracking, model versioning, automated evaluation, and CI/CD.
- Develop and operate fine-tuning pipelines such as SFT, RLHF, DPO, and LoRA.
- Build fault-tolerant distributed training infrastructure with checkpointing, failure detection, and recovery.
- Build regression testing and benchmarking systems to improve training and inference performance.
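As context for the multi-LoRA responsibility above, here is a minimal, purely illustrative Python sketch of hot-swap adapter management. AdapterRegistry, LoRAAdapter, and the checkpoint paths are hypothetical names, not eBay's implementation; a production system would additionally load and evict weights on the serving engine itself.

```python
# Hypothetical sketch: a minimal multi-LoRA adapter registry with hot-swap semantics.
import threading
from dataclasses import dataclass


@dataclass
class LoRAAdapter:
    name: str
    version: str
    path: str          # illustrative checkpoint location, e.g. in a model registry
    loaded: bool = False


class AdapterRegistry:
    """Tracks which LoRA adapters are live and swaps them without a restart."""

    def __init__(self, max_loaded: int = 8) -> None:
        self._adapters: dict[str, LoRAAdapter] = {}
        self._lock = threading.Lock()
        self._max_loaded = max_loaded

    def register(self, adapter: LoRAAdapter) -> None:
        with self._lock:
            self._adapters[adapter.name] = adapter

    def hot_swap(self, name: str, new_version: str, new_path: str) -> None:
        """Atomically replace an adapter; in-flight requests keep the old copy."""
        with self._lock:
            old = self._adapters[name]
            self._adapters[name] = LoRAAdapter(name, new_version, new_path, loaded=True)
            old.loaded = False  # old weights become eligible for eviction

    def resolve(self, name: str) -> LoRAAdapter:
        with self._lock:
            adapter = self._adapters[name]
            if not adapter.loaded:
                # A real system would load weights onto the serving engine here
                # and evict least-recently-used adapters beyond max_loaded.
                adapter.loaded = True
            return adapter


if __name__ == "__main__":
    registry = AdapterRegistry()
    registry.register(LoRAAdapter("title-rewrite", "v1", "models/title-rewrite/v1"))
    registry.hot_swap("title-rewrite", "v2", "models/title-rewrite/v2")
    print(registry.resolve("title-rewrite"))
```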
What We’re Looking For
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 5+ years of experience building distributed systems or ML platform infrastructure.
- Strong programming skills in Python and/or Golang.
- Familiarity with CUDA or Triton kernel development is a plus.
- Hands-on experience deploying and operating LLM inference engines such as vLLM, TensorRT-LLM, NVIDIA Triton, or SGLang.
- Deep understanding of LLM inference internals, including KV cache management, PagedAttention, continuous batching, and request routing.
- Experience building or optimizing distributed training pipelines using Megatron-LM, DeepSpeed, FSDP, or equivalent frameworks.
- Strong understanding of model parallelism strategies and their trade-offs.
- Proficiency with NVIDIA tooling such as NCCL, DCGM, Nsight Systems, and PyTorch Profiler.
- Experience implementing inference optimizations including quantization, speculative decoding, FlashAttention, and multi-LoRA serving.
- Experience building MLOps workflows including experiment tracking, model registry, evaluation automation, and CI/CD.
- Experience developing fine-tuning pipelines such as SFT, RLHF, DPO, or LoRA at scale.
- Strong expertise in Kubernetes and containerized GPU environments.
- Strong debugging and performance optimization skills across CUDA runtimes, distributed training, and ML serving systems.
Additional Details
eBay is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, sex, sexual orientation, gender identity, veteran status, disability, or other legally protected status. If you have a need that requires accommodation, please contact us at talent@ebay.com. We will make every effort to respond to your request for accommodation as soon as possible. View our accessibility statement to learn more about eBay's commitment to ensuring digital accessibility for people with disabilities.
We may use AI tools for administrative tasks in the hiring process. To learn how we handle your personal data and use AI responsibly, please visit our Talent Privacy Notice, Privacy Center, and AI Hiring Guidelines.