CarDekho
Website: cardekho.com
Job details:
CarDekho Group is hiring a Lead GenAI Engineer to own the technical direction of two adjacent product surfaces:
- OEM AI Solutions - agentic chatbot and assistant products we ship to automotive OEMs (Maruti, Hyundai, Tata, Mahindra, Kia, etc.) for lead qualification, customer support, dealer enablement, and post-sales engagement.
- Consumer Product Experience - in-product AI features across CarDekho.com and the app: buying assistants, comparison agents, personalized recommendations, conversational search, and content generation embedded in the model/variant/dealer journey.
You will set the architecture, codify the eval and safety bar, and lead a small pod of GenAI engineers shipping production agents that are measurably better month-over-month, not just demos.
What you'll own
Agentic systems and orchestration
- Design and ship multi-agent systems for OEM and consumer flows: planner / executor / critic patterns, tool-using agents, and human-in-the-loop checkpoints for high-stakes turns (price quotes, lead capture, booking).
- Build on Pydantic-AI as the orchestration layer — own the patterns for typed agents, dependency injection, structured outputs, and graph-style multi-step flows. Evolve the stack based on observability, latency, and cost trade-offs.
- Build and maintain MCP servers that expose CarDekho's catalogue, pricing, dealer inventory, BigQuery analytics, and CRM as first-class tools for our agents and partner agents.
Models and prompting
- Make build-vs-buy calls across the current frontier: Claude 4.x (Opus / Sonnet / Haiku), GPT-5, Gemini 2.5, Llama 4, Mistral, and reasoning-tier models. Match model to job — don't pay Opus prices for classification.
- Apply prompt caching, batch APIs, structured outputs, and extended-thinking budgets to keep unit economics viable at OEM scale (millions of conversations/month).
- Define and enforce prompt engineering standards: versioning, A/B testing, regression suites, and prompt-as-code reviews in PRs.
Evals, observability, and safety
- Build the eval harness that gates every prompt, model, and agent change: golden sets, LLM-as-judge with calibrated rubrics, automated red-teaming, and online quality metrics tied to business KPIs (lead conversion, T2L, deflection, CSAT).
- Stand up production observability (Langfuse / LangSmith / Arize / Helicone): traces, token spend per intent, latency p95s, tool-call success rates, hallucination flags.
- Own the safety and guardrails layer: PII handling, jailbreak resistance, brand-safety filters per OEM, factuality checks on price/spec/availability, and escalation paths to human agents.
Multimodal
- Use vision models for damage assessment, document understanding (RC, insurance), and image-grounded comparison features.
Leadership
- Lead a pod of 3–6 GenAI / ML engineers. Set the technical bar, run design reviews, mentor on prompt craft and eval rigor, and unblock the team.
- Partner deeply with Product, Design, Data Science, and OEM account teams — translate ambiguous business problems into agent specs with measurable success criteria.
- Represent CarDekho in technical conversations with OEM CTOs and partner platforms; influence the AI roadmap shipped to millions of car buyers.
What we expect you to bring
Must-have
- 4+ years in ML / AI engineering, with 2+ years shipping LLM-powered products to production (not just notebooks or POCs).
- Demonstrated ownership of at least one agentic system in production: multi-step tool use, state management, recovery from tool failures, and a real eval story.
- Strong Python; comfortable with async patterns, FastAPI / equivalent, and the modern GenAI stack (Pydantic-AI, LiteLLM, Pydantic, observability SDKs).
- Hands-on with at least two of: Claude, GPT, Gemini, open-weights (Llama / Mistral / Qwen) — and informed opinions on when to use which.
- Eval-driven mindset — you can describe how you'd prove a prompt change is better, not just feel that it is.
- Solid statistics and ML fundamentals — you can read a calibration plot, design an A/B test, and reason about distribution shift.
Good to have
- Experience with MCP (servers and clients), function-calling at scale, or agent-to-agent protocols.
- Practical RAG / retrieval understanding — chunking strategy, embedding model choice, hybrid search (BM25 + dense), reranking, and how to debug "the agent can't find X" without guessing. Hands-on with at least one vector store (pgvector, Qdrant, Pinecone, Weaviate).
- Voice / real-time stack experience (OpenAI Realtime, Deepgram, ElevenLabs, telephony integration, VAD, barge-in, sub-second TTS).
- Experience using BigQuery / large analytical stores as agent tools, or building text-to-SQL systems with guardrails.
- Prior experience in automotive, marketplace, or high-intent consumer commerce — understanding lead funnels, dealer ecosystems, or large product catalogues.
- Open-source contributions, papers, or talks in the agentic space.
How you work
- You ship. You instrument. You iterate against numbers.
- You push back with reasons when a roadmap item is the wrong bet, and you propose the alternative.
- You write down decisions: eval results, model choices, failure modes.
- You treat OEM trust and consumer safety as engineering constraints, not afterthoughts.
Why this role is interesting
- Real scale: 30M+ monthly users on CarDekho consumer surfaces and direct OEM deployments that change how India buys cars.
- A greenfield-but-not-toy mandate — existing data infra (BigQuery, GA4, the full inventory + leads stack) to build agents on top of, with budget and exec sponsorship to actually ship.
- Genuine technical latitude on the agentic stack, with a sharp eval culture to keep us honest.