Voice AI Engineer

VoxyHealth

Location: Chennai, Tamil Nadu, India
Job type: Full-time

Required skills

Python
AWS
API
Azure
backend
Docker
JS
Node
WebRTC

About the role

Website: voxyhealth.ai
Job details:

Job Description: Voice AI Engineer

Company: VoxyHealth (VoxyAI)

Location: Chennai, India (On-site)

Level: Mid-to-Senior Individual Contributor

About VoxyHealth

VoxyHealth builds AI-powered voice and workflow automation for healthcare. Our platform connects EHR systems, automates front-desk workflows, and enables ambient clinical intelligence — helping healthcare providers spend less time on paperwork and more time on patients. We work with clients ranging from specialty clinics to national health systems.

The Role

We're hiring a voice AI engineer to build and scale the voice intelligence stack at the heart of our product—the real-time pipelines that listen, understand, and respond on behalf of healthcare providers. You'll work on the production voice agent: STT, LLM orchestration, TTS, telephony integration, latency tuning, and the hard edge cases that come with handling clinical conversations.

This is a hands-on engineering role for someone who's already shipped voice AI in production and wants to go deeper—not learn it for the first time.

What You'll Own

- Voice agent pipeline — end-to-end design and tuning of STT → LLM → TTS flows, including barge-in, turn-taking, and latency budgets

- Telephony integration — SIP/WebRTC, call routing, audio streaming, jitter/packet-loss handling

- Model integration — work with hosted and self-hosted models (Whisper, Deepgram, ElevenLabs, OpenAI Realtime, Claude, etc.); evaluate, swap, and tune

- Real-time performance — drive down end-to-end latency; profile and fix the slow path

- Conversation quality — prompt engineering, function/tool calling, dialog state, fallback handling

- Evaluation and observability — build the test harness and metrics that tell us when the agent regresses

- Healthcare-grade reliability — handle edge cases (silence, noise, accents, code-switching, interruptions) without the agent falling apart

What We're Looking For

Must-Have

- 5–7 years of software engineering experience overall, with 1–2 years specifically in voice AI (production, not side projects)

- Hands-on with at least one full voice pipeline: STT, TTS, and an LLM in the loop — you've debugged latency, barge-in, and turn-taking issues in production

- Telephony experience — SIP, WebRTC, or comparable real-time audio stacks; understands codecs, jitter buffers, and streaming audio

- Python backend proficiency; comfortable with Node.js as well

- Solid grasp of LLM orchestration — function/tool calling, streaming, prompt design, context management

- Comfortable in production: Docker, cloud (AWS or Azure), logs, traces, and metrics

- Understands the difference between a demo that works and a system that holds up under real call volume

Strong Plus

- Experience with Whisper, Deepgram, AssemblyAI, ElevenLabs, Cartesia, OpenAI Realtime API, LiveKit, Pipecat, or similar voice frameworks

- Healthcare domain knowledge — HIPAA, clinical workflows, EHR integrations

- Worked on multilingual or accent-robust voice systems

- ML/DL background — fine-tuning, evaluation harnesses, or custom model deployment

- Has shipped a voice agent that handled >1k concurrent calls or non-trivial production traffic

- Familiarity with Claude Code or similar AI-assisted development tools—we mandate their use across the engineering team

Mindset

- Obsessed with latency and conversation quality—"It works on my laptop" is not a finish line

- Ships and measures—every change is backed by an evaluation, not vibes

- Comfortable with messy real-world audio and the long tail of edge cases that come with it

- Bias toward production—you'd rather ship something behind a flag than perfect it offline

- Direct communicator; surfaces tradeoffs early

What You'll Work On (Examples)

- Cutting end-to-end agent response latency from 1.8s to under 800ms

- Building a barge-in handler that doesn't cut the user off mid-sentence

- Designing the eval harness that catches regressions before they hit a clinic

- Integrating a new TTS provider behind a feature flag and A/B testing voice quality

- Debugging why the agent loses context on long calls with accented speakers

Compensation

- Competitive base salary (commensurate with experience)

- Equity

- Health benefits

- On-site, Chennai

How We Hire

1. Intro call with the founder

2. Technical conversation — past voice AI work + system design (60 min)

3. Hands-on round — voice pipeline debugging or design exercise

4. Team / culture interview

5. Offer

