Website:
vaultstackai.com
Job details:
Company Overview
VaultStack AI is an enterprise AI infrastructure company specializing in fully on-premise Generative AI deployments. We architect and ship production-grade AI systems, agentic workflows, and LLM inference stacks that run entirely within a customer's environment. No external API calls. No data ever leaves the customer's environment.
Role
We are looking for a technically sharp and research-driven Generative AI Engineer Intern who can hit the ground running. You will own real engineering tasks, contribute to active enterprise deployments, and bring ideas from the rapidly evolving AI landscape directly into our stack, working on problems most AI practitioners never encounter.
What You'll Work On
- Build and optimize end-to-end inference pipelines (document ingestion, chunking, local embedding generation, hybrid retrieval, and response synthesis) using only on-prem components
- Deploy, configure, and tune open-source SLMs/LLMs (LLaMA, Mistral, Phi, Gemma, Qwen, DeepSeek) using local inference runtimes (vLLM, HuggingFace TGI, llama.cpp)
- Build multi-agent and agentic systems with tool use, memory, and orchestration
- Benchmark models across accuracy, latency, throughput, and hardware efficiency for specific enterprise use cases
- Track and prototype techniques from recent AI research and assess production viability
- Integrate AI pipelines with enterprise data sources: databases, document stores, and internal APIs
- Containerize and maintain AI services using Docker within on-prem infrastructure
- Produce technical documentation: architecture decisions, model evaluations, and deployment runbooks
Required Skills & Qualifications
Education
Currently enrolled in a Bachelor's or Master's program in Computer Science, Electrical Engineering, or a related field, or equivalent hands-on experience
On-Prem LLM Infrastructure
- Experience deploying and serving LLMs locally using vLLM, HuggingFace TGI, or llama.cpp
- Familiarity with the open-source model landscape and capability tradeoffs across model families
- Understanding of model quantization formats (GGUF, GPTQ, AWQ) and how to select a format based on hardware constraints
- Understanding of GPU memory management, KV cache, and batch inference optimization
Context-Augmented Inference & Retrieval Systems
- Solid grasp of Context-Augmented Inference patterns: hybrid search, re-ranking, contextual chunking, and query decomposition
- Working knowledge of vector databases including indexing and performance tuning
- Ability to design and evaluate context retrieval strategies tailored to enterprise document corpora and domain-specific knowledge bases
Agentic AI & Orchestration
- Experience building agentic pipelines using LangChain, LlamaIndex, AutoGen, or CrewAI with local model backends
- Understanding of tool use, function calling, and structured output with open-source models
- Familiarity with MCP (Model Context Protocol) for connecting LLMs to tools and data sources
AI-Powered Development Tools
- Proficiency with Cursor for AI-assisted, high-velocity development
- Hands-on experience with Claude Code for agentic, multi-file coding tasks in the terminal
Research & Development
- Active habit of following frontier AI research (arXiv, HuggingFace, and publications from leading labs)
- Ability to read, implement, and prototype ideas from papers and evaluate production viability rapidly
- Familiarity with recent advancements: speculative decoding, MoE architectures, long-context modeling, reasoning models, and multi-modal LLMs
- Experience with systematic prompt engineering (CoT, ReAct, Self-Consistency, Tree-of-Thought) evaluated on local models
- Strong experimental discipline: hypothesis formulation, controlled evaluation, metric selection, and result interpretation
- Ability to produce clear technical write-ups: model comparisons, R&D findings, and architecture decisions
Core Engineering
- Strong Python proficiency: clean, modular, production-quality code
- Proficiency with Docker for containerized AI service deployment
- Strong Git practices: branching, code review, and meaningful commits
Nice to Have
- Experience with GPU cluster management and CUDA optimization
- Exposure to Kubernetes or Helm for orchestrating multi-service AI deployments
- Experience fine-tuning open-source models using LoRA or QLoRA with Axolotl or Unsloth
- Familiarity with OpenWebUI or AnythingLLM as enterprise-facing AI interfaces
- Prior exposure to air-gapped or compliance-sensitive environments (SOC 2, HIPAA, ISO 27001)
- Active contributions to open-source AI projects or research communities
What We're Looking For
- Ships fast, learns faster: the AI landscape changes weekly and you keep up
- Research-to-production mindset: you don't just read papers, you ask "can we deploy this?"
- Deeply curious: you care about why a model behaves a certain way, not just that it does
- Ownership mentality: you treat your work like it's going into a Fortune 500 customer's infrastructure, because it is
- Thrives in ambiguity: comfortable defining the problem before solving it
What You'll Gain
- Production-level experience building fully air-gapped, on-prem enterprise AI systems
- Deep fluency in the open-source LLM ecosystem: models, runtimes, optimization, and evaluation
- Sharp R&D capability: taking AI research from paper to prototype to production, fast
- Mentorship from senior engineers on real, high-stakes enterprise deployments
- Production mastery of Cursor and Claude Code as core engineering tools
- A differentiated portfolio demonstrating enterprise-grade AI engineering
- Potential for a Pre-Placement Offer (PPO) based on performance and impact
Internship Details
Duration: 3-6 months
Type: Full-time (preferred)
Stipend: Competitive