Agentic AI Engineer

PurpleMerit

Location: India
Job type: Full-time

Required skills

LangChain
Python
AWS
API
Azure
caching
containerization
CUDA
FastAPI
GCP
GPU
Jupyter Notebook
Kubernetes
microservices
prototypes
React
Redis
specs
PyTorch

About the role

Website: purplemerit.com
Job details:

Are you obsessed with turning AI prototypes into production-grade agent systems that actually scale? We're a fast-moving startup building the next generation of autonomous AI agents, and we need someone who lives and breathes LLM orchestration, GPU optimization, and shipping real products, not demos.

Company Snapshot

We're a venture-backed AI startup redefining how businesses deploy autonomous agents. No red tape. No legacy tech. Just sharp minds building the future. If you want ownership, impact, and the freedom to experiment (and break things), you'll fit right in.

The Role

Most "AI Engineers" out there can wrap an API and call it a day. We don't need that.

We are building autonomous agents that do real work. This means you won't be tuning prompts in a Jupyter notebook. You will be wrestling with GPU memory, optimizing vLLM inference, containerizing sandboxes, and debugging why an agent decided to hallucinate a loop at 2 AM.

If you want to build toys, this is not the right place for you. If you want to build the infrastructure that powers the future of AI, it is.

You’ll architect, deploy, and optimize agentic systems that handle complex decision-making workflows at scale. This is a hands-on role where you’ll own the full lifecycle, from spinning up open-source models on GPU clusters to containerizing agents and monitoring them in production, for systems that run 24/7 and make real decisions.

Key Responsibilities

Design and orchestrate multi-agent workflows, decision pipelines, and tool-use architectures
Deploy and optimize open-source LLMs on GPU infrastructure with efficient caching and memory management
Build robust agent frameworks using LangChain, LlamaIndex, or custom orchestration layers
Containerize agents and services with Docker; manage sandboxed execution environments for secure AI operations
Implement Redis caching, vector DB retrieval (Pinecone/Weaviate), and memory-efficient loading for cost-effective inference
Implement production-grade CI/CD, monitoring, and logging for agentic systems
Optimize latency, throughput, and cost for high-volume inference
Rapidly prototype new agent patterns and iterate based on real-world performance data

Who You Are (Must-Haves)

At least 2-3 years of experience building and deploying AI/ML systems in production (not just notebooks)
Proven experience running open-source LLMs on GPU (CUDA, PyTorch, vLLM, TensorRT)
Strong grasp of caching strategies (Redis, vector DBs), batching, and memory optimization for large models
Hands-on with Docker/containerization and sandboxing techniques for secure AI execution
Built or contributed to agent architectures (ReAct, multi-agent, tool-use patterns)
Python expert with solid software engineering practices (FastAPI, async, microservices)
Thrive in ambiguity: you identify problems and ship solutions without waiting for specs
Fast learner who tracks the latest in LLM/agent ecosystems daily
"Young blood" mindset: bold thinking, relentless execution, zero ego, curiosity
Experience with Kubernetes and cloud-native MLOps (AWS/GCP/Azure)
Knowledge of RAG pipelines, vector databases (Pinecone, Weaviate), and embedding optimization
Familiarity with model fine-tuning (LoRA, QLoRA) and evaluation frameworks
Built multi-modal agents or integrated APIs/tools at scale

If your profile matches, you'll hear from us within 24–48 hours. Check your inbox (and spam folder).

Click Apply to learn more.
