Website: voxyhealth.ai
Job details:
Job Description: Voice AI Engineer
Company: VoxyHealth (VoxyAI)
Location: Chennai, India (On-site)
Level: Mid-to-Senior Individual Contributor
About VoxyHealth
VoxyHealth builds AI-powered voice and workflow automation for healthcare. Our platform connects EHR systems, automates front-desk workflows, and enables ambient clinical intelligence — helping healthcare providers spend less time on paperwork and more time on patients. We work with clients ranging from specialty clinics to national health systems.
The Role
We're hiring a Voice AI Engineer to build and scale the voice intelligence stack at the heart of our product: the real-time pipelines that listen, understand, and respond on behalf of healthcare providers. You'll work on the production voice agent, covering STT, LLM orchestration, TTS, telephony integration, latency tuning, and the hard edge cases that come with handling clinical conversations.
This is a hands-on engineering role for someone who has already shipped voice AI in production and wants to go deeper, not someone learning it for the first time.
What You'll Own
- Voice agent pipeline — end-to-end design and tuning of STT → LLM → TTS flows, including barge-in, turn-taking, and latency budgets
- Telephony integration — SIP/WebRTC, call routing, audio streaming, jitter/packet-loss handling
- Model integration — work with hosted and self-hosted models (Whisper, Deepgram, ElevenLabs, OpenAI Realtime, Claude, etc.); evaluate, swap, and tune
- Real-time performance — drive down end-to-end latency; profile and fix the slow path
- Conversation quality — prompt engineering, function/tool calling, dialog state, fallback handling
- Evaluation and observability — build the test harness and metrics that tell us when the agent regresses
- Healthcare-grade reliability — handle edge cases (silence, noise, accents, code-switching, interruptions) without the agent falling apart
What We're Looking For
Must-Have
- 5–7 years of software engineering experience overall, with 1–2 years specifically in voice AI (production, not side projects)
- Hands-on with at least one full voice pipeline: STT, TTS, and an LLM in the loop — you've debugged latency, barge-in, and turn-taking issues in production
- Telephony experience — SIP, WebRTC, or comparable real-time audio stacks; understands codecs, jitter buffers, and streaming audio
- Python backend proficiency; comfortable with Node.js as well
- Solid grasp of LLM orchestration — function/tool calling, streaming, prompt design, context management
- Comfortable in production: Docker, cloud (AWS or Azure), logs, traces, and metrics
- Understands the difference between a demo that works and a system that holds up under real call volume
Strong Plus
- Experience with Whisper, Deepgram, AssemblyAI, ElevenLabs, Cartesia, OpenAI Realtime API, LiveKit, Pipecat, or similar voice models, APIs, and frameworks
- Healthcare domain knowledge — HIPAA, clinical workflows, EHR integrations
- Experience building multilingual or accent-robust voice systems
- ML/DL background — fine-tuning, evaluation harnesses, or custom model deployment
- Has shipped a voice agent that handled more than 1,000 concurrent calls or comparably demanding production traffic
- Familiarity with Claude Code or similar AI-assisted development tools; we mandate their use across the engineering team
Mindset
- Obsessed with latency and conversation quality; "it works on my laptop" is not a finish line
- Ships and measures: every change is backed by an evaluation, not vibes
- Comfortable with messy real-world audio and the long tail of edge cases that come with it
- Bias toward production: you'd rather ship something behind a flag than perfect it offline
- Direct communicator; surfaces tradeoffs early
What You'll Work On (Examples)
- Cutting end-to-end agent response latency from 1.8 s to under 800 ms
- Building a barge-in handler that doesn't cut the user off mid-sentence
- Designing the eval harness that catches regressions before they hit a clinic
- Integrating a new TTS provider behind a feature flag and A/B testing voice quality
- Debugging why the agent loses context on long calls with accented speakers
Compensation
- Competitive base salary (commensurate with experience)
- Equity
- Health benefits
- On-site, Chennai
How We Hire
1. Intro call with the founder
2. Technical conversation — past voice AI work + system design (60 min)
3. Hands-on round — voice pipeline debugging or design exercise
4. Team / culture interview
5. Offer