Lead AI Engineer

CarDekho

full-time

Required skills

Python
BigQuery
caching
CRM
data science
FastAPI
lead qualification
regression
spec
state management
statistics

About the role

Website: cardekho.com
Job details:

CarDekho Group is hiring a Lead GenAI Engineer to own the technical direction of two adjacent product surfaces:

OEM AI Solutions: agentic chatbot and assistant products we ship to automotive OEMs (Maruti, Hyundai, Tata, Mahindra, Kia, etc.) for lead qualification, customer support, dealer enablement, and post-sales engagement.
Consumer Product Experience: in-product AI features across CarDekho.com and the app, including buying assistants, comparison agents, personalized recommendations, conversational search, and content generation embedded in the model/variant/dealer journey.

You will set the architecture, codify the eval and safety bar, and lead a small pod of GenAI engineers shipping production agents that get measurably better month over month, not just demos.

What you'll own

Agentic systems and orchestration

Design and ship multi-agent systems for OEM and consumer flows: planner / executor / critic patterns, tool-using agents, and human-in-the-loop checkpoints for high-stakes turns (price quotes, lead capture, booking).
Build on Pydantic-AI as the orchestration layer — own the patterns for typed agents, dependency injection, structured outputs, and graph-style multi-step flows. Evolve the stack based on observability, latency, and cost trade-offs.
Build and maintain MCP servers that expose CarDekho's catalogue, pricing, dealer inventory, BigQuery analytics, and CRM as first-class tools for our agents and partner agents.

Models and prompting

Make build-vs-buy calls across the current frontier: Claude 4.x (Opus / Sonnet / Haiku), GPT-5, Gemini 2.5, Llama 4, Mistral, and reasoning-tier models. Match model to job — don't pay Opus prices for classification.
Apply prompt caching, batch APIs, structured outputs, and extended-thinking budgets to keep unit economics viable at OEM scale (millions of conversations/month).
Define and enforce prompt engineering standards: versioning, A/B testing, regression suites, and prompt-as-code reviews in PRs.

Evals, observability, and safety

Build the eval harness that gates every prompt, model, and agent change: golden sets, LLM-as-judge with calibrated rubrics, automated red-teaming, and online quality metrics tied to business KPIs (lead conversion, T2L, deflection, CSAT).
Stand up production observability (Langfuse / LangSmith / Arize / Helicone): traces, token spend per intent, p95 latency, tool-call success rates, and hallucination flags.
Own the safety and guardrails layer: PII handling, jailbreak resistance, brand-safety filters per OEM, factuality checks on price/spec/availability, and escalation paths to human agents.

Multimodal

Use vision models for damage assessment, document understanding (RC, insurance), and image-grounded comparison features.

Leadership

Lead a pod of 3–6 GenAI / ML engineers. Set the technical bar, run design reviews, mentor on prompt craft and eval rigor, and unblock the team.
Partner deeply with Product, Design, Data Science, and OEM account teams — translate ambiguous business problems into agent specs with measurable success criteria.
Represent CarDekho in technical conversations with OEM CTOs and partner platforms; influence the AI roadmap shipped to millions of car buyers.

What we expect you to bring

Must-have

4+ years in ML / AI engineering, with 2+ years shipping LLM-powered products to production (not just notebooks or POCs).
Demonstrated ownership of at least one agentic system in production: multi-step tool use, state management, recovery from tool failures, and a real eval story.
Strong Python; comfortable with async patterns, FastAPI / equivalent, and the modern GenAI stack (Pydantic-AI, LiteLLM, Pydantic, observability SDKs).
Hands-on with at least two of: Claude, GPT, Gemini, open-weights (Llama / Mistral / Qwen) — and informed opinions on when to use which.
Eval-driven mindset — you can describe how you'd prove a prompt change is better, not just feel that it is.
Solid statistics and ML fundamentals — you can read a calibration plot, design an A/B test, and reason about distribution shift.

Good to have

Experience with MCP (servers and clients), function-calling at scale, or agent-to-agent protocols.
Practical RAG / retrieval understanding — chunking strategy, embedding model choice, hybrid search (BM25 + dense), reranking, and how to debug "the agent can't find X" without guessing. Hands-on with at least one vector store (pgvector, Qdrant, Pinecone, Weaviate).
Voice / real-time stack experience (OpenAI Realtime, Deepgram, ElevenLabs, telephony integration, VAD, barge-in, sub-second TTS).
Experience using BigQuery / large analytical stores as agent tools, or building text-to-SQL systems with guardrails.
Prior experience in automotive, marketplace, or high-intent consumer commerce — understanding lead funnels, dealer ecosystems, or large product catalogues.
Open-source contributions, papers, or talks in the agentic space.

How you work

You ship. You instrument. You iterate against numbers.
You push back with reasons when a roadmap item is the wrong bet, and you propose the alternative.
You write down decisions: eval results, model choices, failure modes.
You treat OEM trust and consumer safety as engineering constraints, not afterthoughts.

Why this role is interesting

Real scale: 30M+ monthly users on CarDekho consumer surfaces and direct OEM deployments that change how India buys cars.
A greenfield-but-not-toy mandate — existing data infra (BigQuery, GA4, the full inventory + leads stack) to build agents on top of, with budget and exec sponsorship to actually ship.
Genuine technical latitude on the agentic stack, with a sharp eval culture to keep us honest.

Click on Apply to know more.
