Senior AI/ML Engineer

Boring Host AI

Location: India
Job type: Full-time

Required skills

Python
Autopilot
caching
compliance
Node
regression
SaaS
state management
TypeScript

About the role

Website: boringhost.ai

About Boring Host

Boring Host AI is the intelligent operating layer above a property manager’s PMS (property management system). We unify guest conversations across channels (OTA inboxes, SMS, WhatsApp, email, and voice), ground answers in PMS truth, and automate the repetitive 80% with guardrails so teams scale without losing the “personal hospitality at scale” feel.

We are building AI that behaves like a great ops teammate: fast, consistent, correct, and humble when uncertain.

The role

We’re hiring a Senior AI/ML Engineer to own core intelligence systems end to end: retrieval and knowledge, agent behavior, confidence gating, evaluations, and production reliability. You will work on real-world problems where “mostly works” is not good enough.

You will ship models and systems that directly touch guests, revenue, and reviews.

What you’ll build

A high-accuracy, PMS-grounded response engine that can answer guest questions with citations and context from listings, reservations, policies, and house rules
A confidence and risk gating system (autopilot only when safe) with fallback behaviors: draft, ask clarifying questions, or escalate to humans
A self-improving knowledge layer (RAG plus structured data) that learns from unknown inquiries, reduces repeated escalations, and avoids hallucinations
Production-grade LLM agent tooling: function calling, tool orchestration, conversation memory, and safe state management across channels
Evaluation harnesses that measure what matters: correctness, policy compliance, tone, latency, cost, and escalation rate
The “ops brain” behind revenue automation: upsell eligibility checks (availability, rules), offer generation, payment workflows, and task triggering
Voice intelligence foundations (if relevant): call summarization, intent extraction, action dispatching, and handoff logic
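
To make the confidence and risk gating idea concrete, here is a minimal sketch of how a reply might be routed to autopilot, draft, clarify, or escalate. Everything in it is an assumption for illustration: the `Candidate` fields, the `gate` function, and the threshold values are hypothetical and do not describe Boring Host's actual system.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    AUTOPILOT = "autopilot"  # send the reply without human review
    DRAFT = "draft"          # prepare a reply for a human to approve
    CLARIFY = "clarify"      # ask the guest a clarifying question
    ESCALATE = "escalate"    # hand the conversation to a human


@dataclass
class Candidate:
    confidence: float  # hypothetical model confidence in [0, 1]
    grounded: bool     # True if the answer is backed by PMS data
    high_risk: bool    # e.g. refunds or policy disputes
    ambiguous: bool    # the guest's intent is unclear


def gate(c: Candidate,
         auto_threshold: float = 0.9,
         draft_threshold: float = 0.6) -> Action:
    """Pick the safest action for a candidate reply (illustrative thresholds)."""
    if c.high_risk:
        return Action.ESCALATE   # risky topics always go to a human
    if c.ambiguous:
        return Action.CLARIFY    # ask before answering an unclear request
    if c.grounded and c.confidence >= auto_threshold:
        return Action.AUTOPILOT  # autopilot only when grounded and confident
    if c.confidence >= draft_threshold:
        return Action.DRAFT      # plausible, but a human should approve
    return Action.ESCALATE       # too uncertain to draft


print(gate(Candidate(confidence=0.95, grounded=True, high_risk=False, ambiguous=False)))
```

The ordering is the design choice worth noting: risk and ambiguity checks run before any confidence comparison, so a high score alone can never unlock autopilot.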

Responsibilities

Own AI features from prototype to production: design, implementation, testing, monitoring, iteration
Build and maintain retrieval pipelines and knowledge stores (structured + unstructured), including chunking, indexing, reranking, and freshness handling
Create robust evals and red-team suites using real operational edge cases (late check-in, lockouts, refunds, maintenance triage, policy disputes)
Optimize for reliability and cost: caching, prompt and context efficiency, latency targets, and graceful degradation
Partner with product and ops to translate messy hospitality reality into crisp AI behaviors and guardrails
Establish best practices for observability: traces, error taxonomies, confidence metrics, and feedback loops from human overrides
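
As a rough illustration of the eval and regression work described above, the sketch below aggregates per-case results into a few metrics and flags regressions against a baseline run. The `EvalResult` fields, the metric names, and the tolerance are hypothetical choices made for this example, not a description of an existing harness.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    case_id: str       # e.g. "late-check-in-01" (hypothetical naming)
    correct: bool      # the answer matched the expected outcome
    escalated: bool    # the system handed off to a human
    latency_ms: float  # end-to-end response time


def summarize(results: list[EvalResult]) -> dict[str, float]:
    """Aggregate per-case results into suite-level metrics."""
    if not results:
        raise ValueError("no eval results to summarize")
    n = len(results)
    return {
        "accuracy": sum(r.correct for r in results) / n,
        "escalation_rate": sum(r.escalated for r in results) / n,
        "max_latency_ms": max(r.latency_ms for r in results),
    }


def regressions(baseline: dict[str, float],
                candidate: dict[str, float],
                tolerance: float = 0.01) -> list[str]:
    """Return the names of metrics that got worse by more than the tolerance."""
    failed = []
    if candidate["accuracy"] < baseline["accuracy"] - tolerance:
        failed.append("accuracy")
    if candidate["escalation_rate"] > baseline["escalation_rate"] + tolerance:
        failed.append("escalation_rate")
    return failed


baseline = summarize([
    EvalResult("late-check-in", True, False, 800.0),
    EvalResult("lockout", True, True, 1200.0),
    EvalResult("refund", True, False, 950.0),
    EvalResult("maintenance", False, False, 700.0),
])
candidate = dict(baseline, accuracy=0.5)  # simulate a worse release
print(regressions(baseline, candidate))
```

A release gate could block a deploy whenever `regressions` returns a non-empty list.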

Must-have qualifications

3+ years building production ML/AI systems, including at least 2 years working with modern LLM stacks in real products
Strong Python skills and experience shipping services (APIs, workers, event-driven systems)
Deep understanding of RAG systems and information retrieval tradeoffs (precision vs recall, freshness, grounding, reranking)
Practical experience with evaluation and testing for LLMs (offline evals, online metrics, regression suites, A/B testing)
Proven track record of building reliable systems under ambiguity, with strong debugging instincts and an ownership mindset

Nice-to-have qualifications

Experience with multi-channel messaging platforms, customer support automation, or hospitality / marketplace ops
Experience with voice agents or telephony workflows (intent, summarization, escalation)
Familiarity with vector databases, hybrid search, and knowledge graph concepts
Experience with TypeScript/Node in addition to Python (helpful for integrations)
Experience with privacy, compliance, and secure handling of customer data in SaaS systems

What success looks like (first 90 days)

You ship at least one major improvement to autopilot safety (confidence gating + eval coverage)
Hallucination-driven escalations drop measurably, and “wrong but confident” responses become rare
We gain a repeatable eval and release process that prevents regressions as we ship faster
The AI system becomes more grounded in PMS truth, with clearer failure modes and better handoffs

Why this is interesting

Hospitality ops is a perfect stress test for applied AI: multi-channel, time-sensitive, policy-heavy, emotionally charged, and full of edge cases. If you like building AI that actually works in the wild, this is your playground.

How to apply

A short note on what you’ve shipped in production (links if possible)
A resume or LinkedIn
One example of a hard reliability problem you solved (briefly: what broke, how you diagnosed it, how you fixed it)

Click Apply to learn more.
