AI Research Engineer

YO IT Consulting

full-time

Required skills

Python
AWS
backend
computer vision
deep learning
Fusion
GCP
machine learning
NumPy
Pandas
statistics
TensorFlow
PyTorch
ONNX

About the role

Website: yoitconsulting.com
Job details:
AI Research Engineer - Audio & Vision AI

Location: South Asia (Remote)

Job Type: Full-Time | Permanent

Experience: 3-7 Years

Salary: up to 35 LPA

Must-Haves

Bachelor's or Master's degree (candidates from IITs preferred)
3+ years of hands-on experience in Machine Learning / Deep Learning (PyTorch, TensorFlow)
Strong mathematical foundation in signal processing, time-series analysis, and statistics
Proven experience with audio or visual data — music, speech, motion, or similar perceptual domains
Familiarity with MIR (Music Information Retrieval) or Computer Vision tasks such as:

Pitch detection
Beat tracking
Timbre classification
Speech analysis
Pose estimation
Gesture recognition
Motion tracking

Experience with model optimization and deployment (TorchScript, ONNX, TensorRT)
Strong Python skills and familiarity with:

NumPy
pandas
Librosa
Essentia
OpenCV
MediaPipe

Stable career history, with at least 3 years at a single organization

About The Role

As an AI Research Engineer, you’ll design and develop machine learning systems capable of understanding and evaluating human performance — starting with sound (music, speech) and later expanding into vision (movement, gestures).

You’ll collaborate with Audio/Vision Engineers, Backend Developers, and Subject Matter Experts (SMEs) such as musicians, dancers, and coaches to transform expert intuition into measurable, scalable AI-driven feedback systems.

This role sits at the intersection of AI, creativity, education, and real-time human interaction.

Key Responsibilities

Research, design, and train AI models for real-time performance evaluation across domains
Implement and optimize deep learning architectures including CNNs, RNNs, and Transformers
Build scalable pipelines for clean, real-time feature extraction from audio and vision data
Collaborate with SMEs to define “performance quality” metrics and label datasets
Develop evaluation frameworks to compare AI predictions with expert feedback
Experiment with cross-modal fusion (audio + vision) for synchronized analysis
Optimize models for low-latency inference on web/mobile devices using ONNX, TensorRT, and TF Lite
Document research findings and prototype outcomes, and contribute to internal knowledge sharing

Nice to Have

Research or published work in Audio AI, Multimodal AI, or Performance Evaluation
Experience building real-time ML inference systems
Background in music, performing arts, or educational AI
Familiarity with AWS, GCP, MLflow, or DVC
Strong curiosity and creativity in experimenting with human-centered AI

What You’ll Achieve

Shape the foundation of an AI Learning Intelligence Platform
Transform expert artistic insights into measurable AI systems
Build models that can listen, see, and guide learners globally
Contribute to AI systems spanning music, dance, public speaking, and chess

Why Join Us

Work at the intersection of AI, Art, and Education
Collaborate with passionate technologists and subject matter experts
Creative freedom to explore cutting-edge AI models
Build products with meaningful global impact
Competitive compensation, equity opportunities, and strong career visibility

Click on Apply to know more.
