Conversational Video Interface Engineer

Salary

$160k - $250k

Min Experience

3 years

Location

San Francisco, CA, US, Remote

Job Type

Full-time

About the job

This job is sourced from a job board.

Overview

About the role

As a Conversational Video Interface Engineer at Tavus, you'll be at the forefront of developing our cutting-edge Conversational Video Interface (CVI). This role involves building and optimizing real-time, multimodal AI systems that enable lifelike digital twins to engage in natural conversations. You'll collaborate with cross-functional teams to enhance the CVI's capabilities, ensuring seamless integration of vision, speech, and emotional intelligence components.

Your Mission 🚀

* Develop and Optimize CVI Components: Design and implement core components of the CVI, including WebRTC/video conferencing, vision processing, speech recognition (ASR), text-to-speech (TTS), and replica video output.
* Integrate Multimodal AI Models: Work with in-house models such as Phoenix-3 for lifelike avatar rendering, Sparrow-0 for conversational pacing, and Raven-0 for visual perception to create cohesive, responsive digital twins.
* Ensure Low-Latency Performance: Optimize the CVI pipeline to achieve sub-600 ms utterance-to-utterance latency, delivering real-time, natural conversations.
* Collaborate with Cross-Functional Teams: Work closely with AI researchers, product managers, and UX designers to align technical development with user needs and product goals.
* Maintain and Enhance API Infrastructure: Develop and maintain APIs that let developers easily integrate Tavus's CVI into their applications, ensuring scalability and reliability.

About the company

At Tavus, we're building the human layer of AI. Our mission is to make human-AI interaction as natural as face-to-face conversation, bringing the human touch to places where it has previously been unscalable. We achieve this through pioneering research in multimodal AI models for human perception and understanding, combined with state-of-the-art human avatar rendering and communication models. Our models power everything from text-to-video AI avatars to real-time conversational video experiences across industries such as healthcare, recruiting, sales, and education. By enabling AI to see, hear, and communicate with human-like authenticity, we're creating the foundation for the next generation of AI employees, assistants, and companions. We're a Series A company backed by top investors, including Sequoia, Y Combinator, and Scale VC. Join us in driving the future of human-AI interaction.

Skills

Python

C++

JavaScript

TensorFlow

PyTorch

WebRTC

ASR

TTS