Audio Media Annotation Specialist

Min Experience

0 years

Location

remote

Job Type

full-time

About the job

This job is sourced from a job board.

About the role

Experience what it is like to annotate audio-based media at Mercor.

Description Details

Description details add to the "richness" of a description.
Be specific:
Instead of "music plays," say "upbeat pop song with female vocals"
Instead of "noise," say "distant barking dog"
Characterize the sound that you hear. Include qualities such as how near or far away the audio originates, sound quality, sound texture, and any other relevant information.
Use complete sentences and natural language when writing captions.

Speech

Identify who is speaking if possible (e.g., "A woman...")
Transcribe exact words if clear. If not, describe them (e.g., "muffled shouting")
Unintelligible speech is not speech - it is considered noise
Add voice/emotion descriptors if obvious (tone, pitch, speed, emotion)
Use if the character(s) in the scene could hear it; if it's a noise overlay (e.g., an overlay of an audience clapping).
Use for non-English speech (do not translate or transcribe)
✅ [00:06-00:08] The woman (S1), on-screen, names more South Asian dishes in a neutral tone, at a moderate pace.

For example:
✅ [00:01-00:04] The man says "Come here!" in a loud, excited voice.

Speech Examples:

Golden Example 1
✅ N/A

Golden Example 2
✅ [00:01-00:06] A woman with a medium-pitched, fast-paced North American accent, slightly soothing in tone and distorted over a loudspeaker, announces: "Flight 307 from Midland … [brief pause] … is now unloading. Passengers may claim their baggage at the counter."
Note: The indistinguishable words in the background of the clip are not captioned in this section.

Noise

Capture what makes the sound, where it comes from, and how it sounds.
Prioritize noticeable sounds; faint hums may be skipped unless important.
Use if the character(s) in the scene could hear it; if it's a noise overlay (e.g., an overlay of an audience clapping).

For example:
✅ [00:05-00:07] A car horn blares loudly off-screen.

Noise Examples:

Golden Example 1
✅ [00:00-00:01] A gas stove clicks three times as it lights, creating a sharp, metallic ignition sound.
[00:03-00:04] Water pours from one container to another at a high volume, followed immediately by a hand wiping a surface, producing a crisp, brushing sound.
[00:08-00:09] Food cooks in a hot skillet, creating a loud sizzling sound with a metallic edge.
[00:09-00:10] A utensil scrapes the bottom of a pot, producing a low-pitched scrape, followed immediately by food sizzling softly at medium volume.

Golden Example 2
✅ [00:00-00:01] Brief fluttery electronic thump noise pops in the background.
Alternative: [00:00-00:01] A brief, static-like burst resembling an electronic power-up sound effect.
[00:08-00:09] Muffled, indistinguishable words are spoken in the background.
[00:09-00:10] Loud thump as the clip ends.
Note: The indistinguishable dialogue is captioned here, and not in Speech.

Music

Use if the character(s) in the scene could hear it; if it's a noise overlay (e.g., an overlay of an audience clapping).
For example:
(happening in the scene, e.g., radio playing)
(background soundtrack)
Include details: genre, instruments, mood, and lyrics (if clear)
Use if lyrics are not in English.

For example:
✅ [00:00-00:14] A moderate-paced romantic Chinese pop ballad plays at a moderate volume.

Music Examples:

Golden Example 1
✅ [00:00-00:14] A fast-tempo easy-listening, piano melody plays with a percussive beat of claps. The sound is soft but upbeat, adding lively energy to the food preparation scene. The music ends with a dramatic note.

Golden Example 2
✅ N/A
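The caption examples above all follow a "[MM:SS-MM:SS] description" layout. As a rough illustration only, a caption line in that layout could be checked with a short Python sketch like the one below. The names CAPTION_RE, Caption, and parse_caption are hypothetical and are not part of any Mercor tooling; the sketch assumes single-line captions with two-digit minutes and seconds, as seen in the examples.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for the "[MM:SS-MM:SS] caption" layout used in the examples.
CAPTION_RE = re.compile(r"^\[(\d{2}):(\d{2})-(\d{2}):(\d{2})\]\s+(.+)$")


@dataclass
class Caption:
    start: int  # start time in seconds
    end: int    # end time in seconds
    text: str   # the natural-language description


def parse_caption(line: str) -> Caption:
    """Parse one caption line and reject malformed or reversed time ranges."""
    match = CAPTION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a timestamped caption: {line!r}")
    m1, s1, m2, s2, text = match.groups()
    start = int(m1) * 60 + int(s1)
    end = int(m2) * 60 + int(s2)
    if end < start:
        raise ValueError(f"end time precedes start time: {line!r}")
    return Caption(start=start, end=end, text=text)


caption = parse_caption("[00:05-00:07] A car horn blares loudly off-screen.")
print(caption.start, caption.end, caption.text)
```

A check like this only covers the timestamp layout; the descriptive quality of a caption still has to be judged by the annotator.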

About the company

Mercor is an AI-powered media annotation platform that helps companies unlock the value of their unstructured data. Our mission is to make the world's multimedia content more accessible, searchable, and usable.

Skills

audio annotation
audio description
multimedia annotation