Senior Speech AI Engineer – On-Device ASR & Real-Time Pronunciation Intelligence

Capital Numbers

Location: Gurugram, Haryana, India
Job type: Full-time

Required skills

Python
TensorFlow
PyTorch
ONNX

About the role

Website: capitalnumbers.com
Job details:

We are looking for a Senior Speech AI Engineer to build production-grade, on-device Automatic Speech Recognition (ASR) and real-time speech intelligence systems.

In this role, you’ll work across the full speech AI lifecycle — from audio data pipelines and model development to low-latency streaming inference and edge deployment. You’ll help deliver accurate transcription, phoneme-level alignment, and real-time pronunciation feedback optimized for mobile and edge devices.

Key Skills & Experience

✔ 5–8+ years of experience in Speech AI / Audio ML

✔ Strong Python & PyTorch expertise

✔ Experience with ASR models such as Whisper, Conformer, RNN-T, wav2vec 2.0, HuBERT

✔ Knowledge of speech processing & phoneme alignment

✔ Experience optimizing models for edge / mobile deployment (TensorFlow Lite, ONNX, PyTorch Mobile, Core ML)

✔ Familiarity with libraries such as NVIDIA NeMo, ESPnet, SpeechBrain, and torchaudio

Nice to Have

• Experience with multilingual or low-resource ASR

• Experience with pronunciation assessment or speech-learning tools

• Experience with datasets such as Common Voice or LibriSpeech

If you’re passionate about building fast, accurate, and privacy-first speech AI systems, we’d love to connect.

Click Apply to learn more.
