Senior Speech AI Engineer – On-Device ASR & Real-Time Pronunciation Intelligence

Location

Gurugram, Haryana, India

Job Type

Full-time

About the job

About the role

Capital Numbers

Website: capitalnumbers.com
Job details:

We are looking for a Senior Speech AI Engineer to build production-grade, on-device Automatic Speech Recognition (ASR) and real-time speech intelligence systems.


In this role, you’ll work across the full speech AI lifecycle, from audio data pipelines and model development to low-latency streaming inference and edge deployment. You’ll help deliver accurate transcription, phoneme-level alignment, and real-time pronunciation feedback, all optimized for mobile and edge devices.


Key Skills & Experience

✔ 5–8+ years in Speech AI / Audio ML

✔ Strong Python & PyTorch expertise

✔ Experience with ASR models such as Whisper, Conformer, RNN-T, wav2vec 2.0, HuBERT

✔ Knowledge of speech processing & phoneme alignment

✔ Experience optimizing models for edge / mobile deployment (TensorFlow Lite, ONNX, PyTorch Mobile, Core ML)

✔ Familiarity with libraries like NVIDIA NeMo, ESPnet, SpeechBrain, torchaudio


Nice to Have

• Experience with multilingual or low-resource ASR

• Prior work on pronunciation assessment or speech learning tools

• Experience with datasets such as Common Voice or LibriSpeech


If you’re passionate about building fast, accurate, and privacy-first speech AI systems, we’d love to connect.

Click Apply to learn more.

Skills

Python
TensorFlow
PyTorch
ONNX