AI & ML Engineer

Blessing Softtech

Location: Pune City, Maharashtra, India
Job type: Full-time

Required skills

LangChain
Python
end-to-end
Google Cloud

About the role

Website: blessingsofttech.com
Job details:

REQUIREMENTS:

Highly motivated, well-read, and a prodigy.

• Deep experience with LLM application development: prompt engineering, RAG pipelines, tool/function calling, agent architectures

• Hands-on experience with at least 2 of: OpenAI, Anthropic, Google Gemini, Groq, Mistral APIs

• Strong understanding of embedding models, vector databases, and retrieval evaluation (precision, recall, MRR, NDCG)

• Experience building evaluation frameworks for AI systems: not just accuracy metrics but conversation-level quality assessment

• Python proficiency with async programming (asyncio, aiohttp)

• Familiarity with real-time audio/voice systems is a strong plus

• Experience with LangChain/LangGraph agent patterns is a strong plus

ROLE/RESPONSIBILITIES:

• Build and maintain the evaluation framework for voice and chat agent quality: hallucination rate, tool selection accuracy, conversation success metrics, retrieval precision/recall, and end-to-end task completion rates

• Upgrade the RAG pipeline from basic FAISS flat index + bge-small-en-v1.5 to a production-grade retrieval system with hybrid search (semantic + BM25), cross-encoder re-ranking, multi-document support, chunk quality scoring, and dynamic index updates

• Design and implement LLM routing intelligence: choosing among the five configured providers (OpenAI, Groq, Anthropic, Google Gemini, Mistral) based on query complexity, latency requirements, cost constraints, and tool-calling capability

• Harden the guardrails system beyond current regex + Llama Guard 3: add topic boundary enforcement, PII detection/redaction, hallucination detection on RAG responses, and output quality scoring

• Optimize voice pipeline latency end-to-end: time to first byte (TTFB) for STT, LLM, and TTS, plus total round-trip time. Profile each provider combination and tune VAD parameters (start/stop thresholds, confidence, min volume) per language

• Build prompt engineering infrastructure — version-controlled prompt registry, A/B testing framework for system prompts, and systematic optimization based on eval results

• Develop conversation analytics: real-time sentiment tracking, intent classification, conversation outcome scoring, topic drift detection, and customer satisfaction prediction

• Implement human handoff intelligence: frustration detection, repeated failure patterns, scope-boundary detection, handoff summary generation

Tech stack you will work with

• Pipecat AI (real-time voice pipeline with frame processors, VAD, barge-in)

• LangChain + LangGraph (chat agent executor, tool calling, multi-agent orchestration)

• FAISS + FastEmbed (vector search, local embeddings with BAAI/bge-small-en-v1.5)

• Deepgram Nova-3, Google Cloud STT, AssemblyAI (speech-to-text)

• ElevenLabs, Cartesia Sonic-3, Google TTS, Deepgram TTS (text-to-speech)

• OpenAI GPT-4o, Groq Llama 3.3-70B, Anthropic Claude, Google Gemini 2.0 Flash, Mistral Small (LLMs)

• Llama Guard 3 via Groq (content safety), confusables library (homoglyph detection)

• MCP (Model Context Protocol) via Pipedream for external tool integration

Compensation:

CTC: INR 10 lakh per annum and above

