Website:
blessingsofttech.com
Job details:
REQUIREMENTS:
• Highly motivated, well-read, and a prodigy.
• Deep experience with LLM application development: prompt engineering, RAG pipelines, tool/function calling, agent architectures
• Hands-on experience with at least 2 of: OpenAI, Anthropic, Google Gemini, Groq, Mistral APIs
• Strong understanding of embedding models, vector databases, and retrieval evaluation (precision, recall, MRR, NDCG)
• Experience building evaluation frameworks for AI systems, covering not just accuracy metrics but also conversation-level quality assessment
• Python proficiency with async programming (asyncio, aiohttp)
• Familiarity with real-time audio/voice systems is a strong plus
• Experience with LangChain/LangGraph agent patterns is a strong plus
ROLE/RESPONSIBILITIES:
• Build and maintain the evaluation framework for voice and chat agent quality: hallucination rate, tool selection accuracy, conversation success metrics, retrieval precision/recall, and end-to-end task completion rates
• Upgrade the RAG pipeline from basic FAISS flat index + bge-small-en-v1.5 to a production-grade retrieval system with hybrid search (semantic + BM25), cross-encoder re-ranking, multi-document support, chunk quality scoring, and dynamic index updates
• Design and implement LLM routing intelligence — choosing between 5 configured providers (OpenAI, Groq, Anthropic, Google Gemini, Mistral) based on query complexity, latency requirements, cost constraints, and tool-calling capability
• Harden the guardrails system beyond current regex + Llama Guard 3: add topic boundary enforcement, PII detection/redaction, hallucination detection on RAG responses, and output quality scoring
• Optimize voice pipeline latency end-to-end: STT TTFB, LLM TTFB, TTS TTFB, total round-trip. Profile each provider combination and tune VAD parameters (start/stop thresholds, confidence, min volume) per language
• Build prompt engineering infrastructure — version-controlled prompt registry, A/B testing framework for system prompts, and systematic optimization based on eval results
• Develop conversation analytics: real-time sentiment tracking, intent classification, conversation outcome scoring, topic drift detection, and customer satisfaction prediction
• Implement human handoff intelligence: frustration detection, repeated failure patterns, scope-boundary detection, and handoff summary generation
Tech stack you will work with:
• Pipecat AI (real-time voice pipeline with frame processors, VAD, barge-in)
• LangChain + LangGraph (chat agent executor, tool calling, multi-agent orchestration)
• FAISS + FastEmbed (vector search, local embeddings with BAAI/bge-small-en-v1.5)
• Deepgram Nova-3, Google Cloud STT, AssemblyAI (speech-to-text)
• ElevenLabs, Cartesia Sonic-3, Google TTS, Deepgram TTS (text-to-speech)
• OpenAI GPT-4o, Groq Llama 3.3-70B, Anthropic Claude, Google Gemini 2.0 Flash, Mistral Small (LLMs)
• Llama Guard 3 via Groq (content safety), confusables library (homoglyph detection)
• MCP (Model Context Protocol) via Pipedream for external tool integration
Compensation:
CTC 10L ++