AI Researcher & Engineer - Multimodal (Audio)

Location

Palo Alto, CA

Job Type

Full-time

About the job

This job is sourced from a job board.

Overview

About the role

The multimodal team at xAI creates AI experiences beyond text, enabling understanding and generation of content across image, video, and audio. This role drives the model’s multimodal audio capability through data, modeling, serving, and product collaboration.

Responsibilities include advancing multimodal audio capabilities (audio understanding and generation), improving data quality, developing data filtering and generation techniques, conducting data studies, creating evaluation frameworks and benchmarks, and designing algorithms for state-of-the-art audio model performance.

Ideal candidates have a track record in neural network research, experience with data-driven experiment design and large-scale distributed machine learning systems, and a strong focus on delivering an excellent end-to-end user experience.

The role is based in the Bay Area (San Francisco and Palo Alto). Candidates are expected to live nearby or be open to relocating.

The tech stack includes Python, JAX, and Rust.

The interview process consists of an initial phone interview, followed by four technical interviews and a project deep-dive presentation. All interviews are conducted via Google Meet.

Skills

Python

end-to-end

machine learning

Rust