Website:
mdsindia.in
Job details:
MDS's AI Companion is an AI-powered personal companion and productivity platform focused on intelligent human interaction, personalised guidance, reminders, scheduling, voice interaction, mood-aware engagement, and long-term contextual memory.
The AI layer is being designed as a scalable, production-grade system with:
- Conversational AI powered by Google Gemini (Flash + Pro)
- Custom RAG pipeline with pgvector for long-term memory
- Tiered model routing for cost optimisation at scale
- Real-time streaming AI responses over WebSocket
- Prompt-engineered persona — the "Wise Old Man" companion character
- Voice AI pipeline (Google Cloud STT/TTS + on-device wake word)
- Semantic search with curated source filtering
- AI-driven scheduling, intent detection, and mood-aware response tuning
- NestJS backend AI module development (TypeScript)
- Observability, token tracking, and cost monitoring
This is not a basic API wrapper or chatbot. We are building a long-term intelligent companion platform with complex AI orchestration, memory pipelines, and production-level engineering requirements.
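For candidates unfamiliar with the term, tiered model routing can be pictured roughly as below. This is a minimal sketch under stated assumptions, not MDS's actual routing logic: the `routeModel` function, the `RoutingInput` fields, and both thresholds are hypothetical and chosen only for illustration. The idea is that routine turns go to a cheaper Flash-class model, while long or multi-step requests are escalated to a Pro-class model.

```typescript
// Hypothetical tier labels; real model identifiers would come from configuration.
type ModelTier = "flash" | "pro";

// Hypothetical request features that an upstream step might compute.
interface RoutingInput {
  message: string;         // the user's latest message
  retrievedChunks: number; // number of memory chunks to be injected as context
  needsPlanning: boolean;  // e.g. a multi-step scheduling intent was detected
}

// Assumed thresholds, for illustration only.
const LONG_MESSAGE_CHARS = 600;
const MANY_CHUNKS = 8;

// Escalate to the larger tier only when a request looks expensive to answer well.
function routeModel(input: RoutingInput): ModelTier {
  if (input.needsPlanning) return "pro";
  if (input.message.length > LONG_MESSAGE_CHARS) return "pro";
  if (input.retrievedChunks > MANY_CHUNKS) return "pro";
  return "flash";
}
```

In a setup like this, a short greeting with no retrieved context would be routed to the cheaper tier, so the larger model's cost is only paid when one of the escalation rules fires.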
What You May Work On:
Depending on the role and experience level, selected developers may work on:
- Gemini Flash / Pro integration with tiered model routing
- Prompt engineering — persona design, system prompts, context window construction
- RAG pipeline — embedding, storage (pgvector), retrieval ranking, and context injection
- Conversation memory consolidation and semantic retrieval
- AI intent detection pipelines (NLU for notes, voice, tasks)
- Streaming response handling via WebSocket
- Voice AI pipeline integration (Google Cloud STT/TTS)
- NestJS AIModule, MemoryModule, and ChatModule development
- Token usage tracking, cost monitoring, and Gemini quota management
- AI observability and evaluation (response quality, hallucination detection)
- Search augmentation (Google Custom Search + YouTube Data API v3)
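As a rough illustration of the retrieval ranking and context injection steps in the RAG item above, here is a minimal in-memory sketch. It is an assumption-laden example rather than the platform's implementation: `MemoryChunk`, `topKMemories`, and `buildContext` are hypothetical names, and the ranking is done in application code, whereas a pgvector-backed system would more likely order rows in SQL using pgvector's cosine distance operator (`<=>`).

```typescript
// Hypothetical shape of a stored memory; embeddings would come from an embedding model.
interface MemoryChunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0; // fallback never triggers because lengths were checked
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank memories by similarity to the query embedding and keep the best k.
function topKMemories(query: number[], chunks: MemoryChunk[], k: number): MemoryChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Format the selected memories as a block of text to prepend to the prompt.
function buildContext(chunks: MemoryChunk[]): string {
  return chunks.map((c, i) => `[memory ${i + 1}] ${c.text}`).join("\n");
}
```

Keeping `k` small is one simple way to bound the context window and the per-request token cost.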
What We Are Looking For:
We are looking for developers who:
- Have strong fundamentals in LLM APIs and prompt engineering
- Understand scalable AI pipeline architecture (not just prototyping)
- Can work independently with end-to-end ownership
- Are comfortable in startup-style, fast-execution environments
- Can debug production-level AI issues (context drift, latency, cost spikes)
- Have experience with RAG, vector databases, and embedding models
- Can adapt quickly to changing product requirements
- Prefer building intelligent long-term products rather than one-off integrations
Preferred Technical Experience:
- Google Gemini API / OpenAI API / Anthropic Claude API
- Prompt engineering and chain-of-thought / few-shot techniques
- RAG pipelines and vector databases (pgvector, Pinecone, Weaviate, etc.)
- Embedding models (Gemini Embeddings, text-embedding-3, etc.)
- NestJS / Node.js / TypeScript (backend AI module development)
- PostgreSQL with pgvector extension
- Redis / BullMQ for async AI job queuing
- Google Cloud STT / TTS or equivalent voice AI services
- LLM token management, cost optimisation, and rate limit handling
- AI observability — Sentry, Cloud Monitoring, LLM eval frameworks
Interview Process:
The interview will include:
- Resume and AI project discussion
- Prompt engineering scenario walkthrough (live or take-home)
- RAG and memory architecture design discussion
- LLM cost optimisation problem-solving
- Scalability and production AI systems questions
- Live system design or prompt writing exercise (if applicable)
Please ensure:
- Stable internet connection
- Working microphone
- Updated resume ready before the interview
- Ability to screen share if required