Voice AI Engineer

TechKareer

full-time

Required skills

Python
AWS
Azure
backend
end-to-end
fintech
GCP
JS
Node
state management
TypeScript
WebSockets

About the role

Website: techkareer.com
Job details:

We're helping a Delhi-based AI startup, founded by IIT and IIM alumni, hire a Voice AI Engineer.

We are looking for a Voice AI Engineer to build and improve real-time voice agents that can handle natural, reliable, human-like conversations over phone calls. You will work across speech, LLMs, telephony, backend systems, and agent orchestration to build production-grade AI calling experiences.

This role is ideal for someone who has hands-on experience building voice agents, understands the latency and reliability challenges of real-time audio, and can own systems end-to-end from prototype to production.

Responsibilities

Build and maintain real-time Voice AI agents for inbound and outbound calls.
Work with speech-to-text, text-to-speech, LLMs, and voice orchestration pipelines.
Integrate with telephony platforms such as Twilio, Plivo, Exotel, Vonage, or similar.
Design low-latency conversation flows with interruption handling, turn-taking, retries, fallbacks, and human handoff.
Improve agent quality across latency, accuracy, naturalness, conversation completion rate, and user satisfaction.
Build backend services for call handling, logging, analytics, and agent state management.
Integrate voice agents with CRMs, EMRs, scheduling tools, internal dashboards, databases, and APIs.
Create evaluation frameworks for voice agent performance, including call success rate, hallucination rate, escalation rate, and transcription quality.
Debug production issues related to audio streams, WebSockets, telephony failures, latency spikes, and LLM behavior.
Collaborate with product, design, and operations teams to ship reliable voice experiences for real users.

Requirements

1+ years of hands-on engineering experience.
Experience building Voice AI agents, conversational AI systems, or real-time audio applications.
Strong backend engineering skills in Python, Node.js, TypeScript, or Go.
Experience working with LLM APIs such as OpenAI, Anthropic, Gemini, or open-source models.
Familiarity with speech-to-text and text-to-speech systems such as Deepgram, AssemblyAI, Whisper, ElevenLabs, Cartesia, PlayHT, Azure Speech, Google Speech, or similar.
Experience with WebSockets, streaming APIs, async processing, and event-driven systems.
Understanding of real-time voice challenges such as latency optimization, barge-in / interruption handling, silence detection, voice activity detection, turn detection, audio quality issues, and call drops and retries.
Ability to write clean, reliable, production-ready code.
Strong debugging skills and comfort working with ambiguous problems.

Good to Have

Experience with Twilio Media Streams, SIP, IVR systems, call routing, or PSTN infrastructure.
Experience building voice agents using frameworks such as LiveKit Agents, Vapi, Retell, Bland, Pipecat, Daily, or similar.
Experience with RAG, tool calling, agentic workflows, and structured outputs.
Experience with healthcare, fintech, logistics, customer support, sales, or operations voice workflows.
Experience building dashboards for call review, QA, analytics, and human-in-the-loop correction.
Knowledge of prompt engineering, conversation design, and LLM evaluation.
Experience deploying systems on AWS, GCP, Azure, Railway, Render, or similar.
Familiarity with compliance requirements such as HIPAA, SOC 2, or GDPR.

What You’ll Work On

AI agents that can make and receive phone calls.
Real-time conversation systems with low latency.
Voice workflows for scheduling, reminders, support, qualification, follow-ups, and data collection.
Integrations with internal tools and third-party platforms.
Evaluation systems to continuously improve agent reliability.
Infrastructure that can scale from demos to production usage.

Ideal Candidate

You are someone who has actually built voice agents or real-time conversational systems, not just chatbots. You understand that voice AI is different from text AI because users expect fast responses, smooth turn-taking, natural speech, and high reliability.

You are comfortable working across the full stack: audio streaming, LLMs, backend APIs, telephony, databases, and deployment. You enjoy debugging hard production issues and improving systems through fast iteration.

Click Apply to learn more.