Website:
boringhost.ai
Job details:
Senior AI/ML Engineer (Boring Host AI)
About Boring Host
Boring Host AI is the intelligent operating layer that sits above a property manager’s property management system (PMS). We unify guest conversations across channels (OTA inboxes, SMS, WhatsApp, email, and voice), ground answers in PMS truth, and automate the repetitive 80% with guardrails, so teams can scale without losing the “personal hospitality at scale” feel.
We are building AI that behaves like a great ops teammate: fast, consistent, correct, and humble when uncertain.
The role
We’re hiring a Senior AI/ML Engineer to own core intelligence systems end to end: retrieval and knowledge, agent behavior, confidence gating, evaluations, and production reliability. You will work on real-world problems where “mostly works” is not good enough.
You will ship models and systems that directly touch guests, revenue, and reviews.
What you’ll build
- A high-accuracy, PMS-grounded response engine that can answer guest questions with citations and context from listings, reservations, policies, and house rules
- A confidence and risk gating system (autopilot only when safe) with fallback behaviors: draft, ask clarifying questions, or escalate to humans
- A self-improving knowledge layer (retrieval-augmented generation, or RAG, plus structured data) that learns from unknown inquiries, reduces repeated escalations, and avoids hallucinations
- Production-grade LLM agent tooling: function calling, tool orchestration, conversation memory, and safe state management across channels
- Evaluation harnesses that measure what matters: correctness, policy compliance, tone, latency, cost, and escalation rate
- The “ops brain” behind revenue automation: upsell eligibility checks (availability, rules), offer generation, payment workflows, and task triggering
- Voice intelligence foundations (if relevant): call summarization, intent extraction, action dispatching, and handoff logic
Responsibilities
- Own AI features from prototype to production: design, implementation, testing, monitoring, iteration
- Build and maintain retrieval pipelines and knowledge stores (structured + unstructured), including chunking, indexing, reranking, and freshness handling
- Create robust evals and red-team suites using real operational edge cases (late check-in, lockouts, refunds, maintenance triage, policy disputes)
- Optimize for reliability and cost: caching, prompt and context efficiency, latency targets, and graceful degradation
- Partner with product and ops to translate messy hospitality reality into crisp AI behaviors and guardrails
- Establish best practices for observability: traces, error taxonomies, confidence metrics, and feedback loops from human overrides
Must-have qualifications
- 3+ years building production ML/AI systems, including at least 2 years working with modern LLM stacks in real products
- Strong Python skills and experience shipping services (APIs, workers, event-driven systems)
- Deep understanding of RAG systems and information retrieval tradeoffs (precision vs recall, freshness, grounding, reranking)
- Practical experience with evaluation and testing for LLMs (offline evals, online metrics, regression suites, A/B testing)
- Proven track record of building reliable systems under ambiguity, with strong debugging instincts and an ownership mindset
Nice-to-have qualifications
- Experience with multi-channel messaging platforms, customer support automation, or hospitality or marketplace ops
- Experience with voice agents or telephony workflows (intent, summarization, escalation)
- Familiarity with vector databases, hybrid search, and knowledge graph concepts
- Experience with TypeScript/Node in addition to Python (helpful for integrations)
- Experience with privacy, compliance, and secure handling of customer data in SaaS systems
What success looks like (first 90 days)
- You ship at least one major improvement to autopilot safety (confidence gating + eval coverage)
- Hallucination-driven escalations drop measurably, and “wrong but confident” responses become rare
- We gain a repeatable eval and release process that prevents regressions as we ship faster
- The AI system becomes more grounded in PMS truth, with clearer failure modes and better handoffs
Why this is interesting
Hospitality ops is a perfect stress test for applied AI: multi-channel, time-sensitive, policy-heavy, emotionally charged, and full of edge cases. If you like building AI that actually works in the wild, this is your playground.
How to apply
- A short note on what you’ve shipped in production (links if possible)
- A resume or LinkedIn
- One example of a hard reliability problem you solved (briefly: what broke, how you diagnosed it, how you fixed it)