Generative AI Engineer Intern

VaultStack AI

Location: Bengaluru, Karnataka, India
Job type: Full-time

Required skills

LangChain
Python
API
code review
CUDA
Docker
end-to-end
Git
GPU
Helm
Kubernetes
React

About the role

Website: vaultstackai.com
Job details:

Company Overview

VaultStack AI is an enterprise AI infrastructure company specializing in fully on-premise Generative AI deployments. We architect and ship production-grade AI systems, agentic workflows, and LLM inference stacks that run entirely within a customer's environment. No external API calls. No data ever leaves the customer's environment.

Role

We are looking for a technically sharp and research-driven Generative AI Engineer Intern who can hit the ground running. You will own real engineering tasks, contribute to active enterprise deployments, and bring ideas from the rapidly evolving AI landscape directly into our stack, working on problems most AI practitioners never encounter.

What You'll Work On

Build and optimize end-to-end inference pipelines: document ingestion, chunking, local embedding generation, hybrid retrieval, and response synthesis, using only on-prem components
Deploy, configure, and tune open-source SLMs/LLMs (LLaMA, Mistral, Phi, Gemma, Qwen, DeepSeek) using local inference runtimes (vLLM, HuggingFace TGI, llama.cpp)
Build multi-agent and agentic systems with tool use, memory, and orchestration
Benchmark models across accuracy, latency, throughput, and hardware efficiency for specific enterprise use cases
Track and prototype techniques from recent AI research and assess production viability
Integrate AI pipelines with enterprise data sources: databases, document stores, and internal APIs
Containerize and maintain AI services using Docker within on-prem infrastructure
Produce technical documentation: architecture decisions, model evaluations, deployment runbooks

Required Skills & Qualifications

Education

Currently enrolled in a Bachelor's or Master's program in Computer Science, Electrical Engineering, or a related field, or equivalent hands-on experience

On-Prem LLM Infrastructure

Experience deploying and serving LLMs locally using vLLM, HuggingFace TGI, or llama.cpp
Familiarity with the open-source model landscape and capability tradeoffs across model families
Understanding of model quantization (GGUF, GPTQ, AWQ) and of selecting formats based on hardware constraints
Understanding of GPU memory management, KV cache, and batch inference optimization

Context-Augmented Inference & Retrieval Systems

Solid grasp of Context-Augmented Inference patterns: hybrid search, re-ranking, contextual chunking, and query decomposition
Working knowledge of vector databases, including indexing and performance tuning
Ability to design and evaluate context retrieval strategies tailored to enterprise document corpora and domain-specific knowledge bases

Agentic AI & Orchestration

Experience building agentic pipelines using LangChain, LlamaIndex, AutoGen, or CrewAI with local model backends
Understanding of tool use, function calling, and structured output with open-source models
Familiarity with MCP (Model Context Protocol) for connecting LLMs to tools and data sources

AI-Powered Development Tools

Proficiency with Cursor for AI-assisted, high-velocity development
Hands-on experience with Claude Code for agentic, multi-file coding tasks in the terminal

Research & Development

Active habit of following frontier AI research: arXiv, HuggingFace, and publications from leading labs
Ability to read, implement, and prototype ideas from papers and rapidly evaluate their production viability
Familiarity with recent advancements: speculative decoding, MoE architectures, long-context modeling, reasoning models, multi-modal LLMs
Experience with systematic prompt engineering (CoT, ReAct, Self-Consistency, Tree-of-Thought), evaluated on local models
Strong experimental discipline: hypothesis formulation, controlled evaluation, metric selection, and result interpretation
Ability to produce clear technical write-ups: model comparisons, R&D findings, architecture decisions

Core Engineering

Strong Python proficiency: clean, modular, production-quality code
Proficiency with Docker for containerized AI service deployment
Strong Git practices: branching, code review, meaningful commits

Nice to Have

Experience with GPU cluster management and CUDA optimization
Exposure to Kubernetes or Helm for orchestrating multi-service AI deployments
Experience fine-tuning open-source models using LoRA or QLoRA with Axolotl or Unsloth
Familiarity with OpenWebUI or AnythingLLM as enterprise-facing AI interfaces
Prior exposure to air-gapped or compliance-sensitive environments (SOC 2, HIPAA, ISO 27001)
Active contributions to open-source AI projects or research communities

What We're Looking For

Ships fast, learns faster: the AI landscape changes weekly, and you keep up
Research-to-production mindset: you don't just read papers, you ask "can we deploy this?"
Deeply curious: you care about why a model behaves a certain way, not just that it does
Ownership mentality: you treat your work like it's going into a Fortune 500 customer's infrastructure, because it is
Thrives in ambiguity: comfortable defining the problem before solving it

What You'll Gain

Production-level experience building fully air-gapped, on-prem enterprise AI systems
Deep fluency in the open-source LLM ecosystem: models, runtimes, optimization, and evaluation
Sharp R&D capability: taking AI research from paper to prototype to production, fast
Mentorship from senior engineers on real, high-stakes enterprise deployments
Production mastery of Cursor and Claude Code as core engineering tools
A differentiated portfolio demonstrating enterprise-grade AI engineering
Potential for a Pre-Placement Offer (PPO) based on performance and impact

Internship Details

Duration: 3-6 months

Type: Full-time (preferred)

Stipend: Competitive

