Conversational AI Engineer — Voice AI & Real-Time Communication

Aivar Innovations

Location: Bengaluru, Karnataka, India
Job type: Full-time

Required skills

Python
AWS
cross-functional
Docker
EC2
end-to-end
fintech
incident response
JS
Lambda
Node.js
TensorFlow
TypeScript
WebRTC
PyTorch

About the role

Website: aivar.tech
Job details:
About Aivar Innovations

Aivar Innovations is an AI-native services and software company and an AWS Preferred Partner, backed by Bessemer Venture Partners and Sorin Investments. We ship production-grade AI into enterprise environments across fintech, healthcare, and technology.

Our work is anchored by four accelerator platforms: Convogent (voice and agent AI automation), Velogent (governed agentic process automation for regulated industries), Kubogent (Kubernetes-native AIOps), and Datagent (data and analytics). We measure ourselves by what runs in production — not by slideware.

Role Overview

We are hiring a Conversational AI Engineer to architect and ship voice AI agents that handle thousands of live customer conversations daily. You will own real-time audio pipelines, telephony integrations, and the orchestration layer that ties STT, TTS, and LLMs into reliable sub-second voice experiences. This is a hands-on engineering role at the core of our Convogent accelerator.

What You'll Do

Build voice agents. Design and deploy scalable voice AI agents using Pipecat, LiveKit, or equivalent frameworks.
Engineer real-time audio pipelines. Optimize STT and TTS flows for low latency, high concurrency, and call quality.
Own telephony integration. Implement WebRTC, SIP, RTP, and WebSocket connectivity, and integrate with CPaaS platforms like Twilio and Telnyx for inbound and outbound calling.
Handle multi-speaker audio. Build speaker diarization and segmentation for accurate, context-aware conversations.
Build the orchestration layer. Connect STT, TTS, and LLM agents into resilient, observable workflows with guardrails and evaluation hooks.
Tune for production. Implement Voice Activity Detection, echo cancellation, codec optimization, and call quality monitoring.
Operate at scale. Monitor, log, and support production deployments on AWS — including incident response and latency tuning.
Partner with product and customer teams. Translate customer requirements into deployable voice solutions and iterate against measurable outcomes.

What You'll Bring

Hands-on experience building and deploying voice agents with Pipecat, LiveKit, or similar frameworks.
Deep expertise in STT (Deepgram, Whisper, AssemblyAI, Google STT, Speech) and TTS (ElevenLabs, Cartesia, Amazon Polly, TTS).
Strong working knowledge of telephony, SIP, WebRTC, and media servers (FreeSWITCH, Asterisk, Kamailio).
Experience with multi-speaker diarization and conversation segmentation.
Production fluency in Python plus at least one of Node.js, TypeScript, or Go.
Practical use of AI/ML libraries (TensorFlow, PyTorch, Scikit-learn) and conversational AI design — prompt engineering and integration with LLMs (GPT-4o, Claude, Bedrock models).
Audio codec depth (Opus, G.711, G.729), real-time streaming, and media handling.
Cloud deployment experience on AWS (Lambda, EC2, S3, EKS, Polly, Transcribe, Bedrock) with Docker, Kubernetes, and CI/CD.
Demonstrated track record of shipping voice AI or telephony systems to production with measurable customer impact — latency targets met, call volume handled, quality metrics moved.

Preferred Experience

Experience with media server architectures (SFU/MCU) and custom VoIP infrastructure (Asterisk, FreeSWITCH, Kamailio, OpenSIPS).
Familiarity with low-code voice platforms alongside custom infrastructure.
Voice analytics, transcription tooling, and Indian-language STT/TTS models.
AWS or speech technology certifications.
Experience leading cross-functional delivery and working with enterprise stakeholders.

Why This Role, Why Aivar

Work that ships. Your voice agents will run live on enterprise calls — measured by customer outcomes, not demos.
Backed and proven. AI-native, AWS Preferred Partner, BVP- and Sorin-backed, with 100+ enterprise customers in the first year.
Own the stack. End-to-end ownership from real-time audio to LLM orchestration to production operations. No layers between you and the system.
Build on accelerators that work. Convogent has real production traction — you extend a working platform, not start from zero.
Shape the roadmap. Direct collaboration with founders, product, and the AI engineering bench. Your field signal drives what gets built next.
Compensation. Competitive base, performance bonus, ESOPs, AWS/AI learning budget, and conference sponsorship.
