AI SW Stack Deployment Architect

Sandisk

Location: Bengaluru, Karnataka, India
Job type: Full-time

Required skills

compiler
cross-functionally
end-to-end
Fusion
GPU
TensorFlow
PyTorch

About the role

Website: sandisk.com
Job details:
Company Description

Sandisk understands how people and businesses consume data and we relentlessly innovate to deliver solutions that enable today’s needs and tomorrow’s next big ideas. With a rich history of groundbreaking innovations in Flash and advanced memory technologies, our solutions have become the beating heart of the digital world we’re living in and that we have the power to shape.

Sandisk meets people and businesses at the intersection of their aspirations and the moment, enabling them to keep moving and pushing possibility forward. We do this through the balance of our powerhouse manufacturing capabilities and our industry-leading portfolio of products that are recognized globally for innovation, performance and quality.

Sandisk has two facilities recognized by the World Economic Forum as part of the Global Lighthouse Network for advanced 4IR innovations. These facilities were also recognized as Sustainability Lighthouses for breakthroughs in efficient operations. With our global reach, we ensure the global supply chain has access to the Flash memory it needs to keep our world moving forward.

Job Description

Role Overview

We are looking for a Software Architect (12+ years of experience) to lead the application/framework layer and deployment stack for the Next Generation Accelerator AI platform. This role owns how models run on the Next Generation Accelerator, from vLLM / PyTorch / TensorFlow / XLA through to production deployment, ensuring correctness, performance, and scalability.

Key Responsibilities

Architect the integration of vLLM, PyTorch, TensorFlow, and JAX/XLA into the Next Generation Accelerator stack
Define framework → compiler → runtime APIs and contracts
Own LLM execution behavior (batching, KV cache, streaming inference)
Design and implement end-to-end deployment workflows (packaging, versioning, reproducibility)
Drive performance optimization across model → framework → runtime
Work cross-functionally with compiler, runtime, and low-level SW teams
Support customer workloads, model onboarding, and debugging

Impact

Own customer-visible AI execution and deployment on the Next Generation Accelerator, closing the gap between models and system performance and enabling enterprise-grade AI solutions.

Qualifications

Required Qualifications

10+ years in AI/ML systems or software architecture
Strong experience with PyTorch / Transformers / LLMs
Hands-on experience with LLM deployment and scalable inference engines (e.g., vLLM, Triton, SGLang)
Experience building scalable AI platforms (cloud/edge)
Expertise in system design, APIs, and cross-layer integration

Preferred Qualifications

Experience with vLLM or similar LLM serving systems
Familiarity with XLA / MLIR / compiler frameworks
Exposure to AI accelerators (GPU/NPU) and runtime systems
Experience in distributed or multi-agent AI systems

Additional Information

Sandisk thrives on the power and potential of diversity. As a global company, we believe the most effective way to embrace the diversity of our customers and communities is to mirror it from within. We believe the fusion of various perspectives results in the best outcomes for our employees, our company, our customers, and the world around us. We are committed to an inclusive environment where every individual can thrive through a sense of belonging, respect and contribution.

Sandisk is committed to offering opportunities to applicants with disabilities and ensuring all candidates can successfully navigate our careers website and our hiring process. Please contact us at jobs.accommodations@sandisk.com to advise us of your accommodation request. In your email, please include a description of the specific accommodation you are requesting as well as the job title and requisition number of the position for which you are applying. Click on Apply to know more.
