Website:
purplemerit.com
Job details:
Are you obsessed with turning AI prototypes into production-grade agent systems that actually scale? We're a fast-moving startup building the next generation of autonomous AI agents, and we need someone who lives and breathes LLM orchestration, GPU optimization, and shipping real products, not demos.
Company Snapshot
We're a venture-backed AI startup redefining how businesses deploy autonomous agents. No red tape. No legacy tech. Just sharp minds building the future. If you want ownership, impact, and the freedom to experiment (and break things), you'll fit right in.
The Role
Most "AI Engineers" out there can wrap an API and call it a day. We don't need that.
We are building autonomous agents that do real work. This means you won't be tuning prompts in a Jupyter notebook. You will be wrestling with GPU memory, optimizing vLLM inference, containerizing sandboxes, and debugging why an agent decided to hallucinate a loop at 2 AM.
If you want to build toys, this is not the right place for you. If you want to build the infrastructure that powers the future of AI, read on.
You’ll architect, deploy, and optimize agentic systems that handle complex decision-making workflows at scale. This is a hands-on role where you’ll own the full lifecycle, from spinning up open-source models on GPU clusters to containerizing agents and monitoring them in production, for systems that run 24/7 and make real decisions.
Key Responsibilities
- Design and orchestrate multi-agent workflows, decision pipelines, and tool-use architectures
- Deploy and optimize open-source LLMs on GPU infrastructure with efficient caching and memory management
- Build robust agent frameworks using LangChain, LlamaIndex, or custom orchestration layers
- Containerize agents and services with Docker; manage sandboxed execution environments for secure AI operations
- Implement Redis caching, vector DB retrieval (Pinecone/Weaviate), and memory-efficient loading for cost-effective inference
- Implement production-grade CI/CD, monitoring, and logging for agentic systems
- Optimize latency, throughput, and cost for high-volume inference
- Rapidly prototype new agent patterns and iterate based on real-world performance data
Who You Are (Must-Haves)
- At least 2-3 years of experience building and deploying AI/ML systems in production (not just notebooks)
- Proven experience running open-source LLMs on GPU (CUDA, PyTorch, vLLM, TensorRT)
- Strong grasp of caching strategies (Redis, vector DBs), batching, and memory optimization for large models
- Hands-on with Docker/containerization and sandboxing techniques for secure AI execution
- Built or contributed to agent architectures (ReAct, multi-agent, tool-use patterns)
- Python expert with solid software engineering practices (FastAPI, async, microservices)
- Thrive in ambiguity: you identify problems and ship solutions without waiting for specs
- Fast learner who tracks the latest in LLM/agent ecosystems daily
- "Young blood" mindset: bold thinking, relentless execution, curiosity, zero ego
Nice-to-Haves
- Experience with Kubernetes and cloud-native MLOps (AWS/GCP/Azure)
- Knowledge of RAG pipelines, vector databases (Pinecone, Weaviate), and embedding optimization
- Familiarity with model fine-tuning (LoRA, QLoRA) and evaluation frameworks
- Built multi-modal agents or integrated APIs/tools at scale
If your profile matches, you'll hear from us within 24–48 hours. Check your inbox (and spam folder).
Click Apply to learn more.