Speech AI Data Annotation Specialist

About oto

oto is building high-quality conversational speech datasets for next-generation speech AI and speech-to-speech models. Our work focuses on natural two-person conversations, including turn-taking, backchannels, interruptions, fillers, pauses, and other real conversational behaviors that are important for human-like voice AI.

We are looking for detail-oriented annotators to help us prepare conversational audio data for speech AI research and model training.

About the role

This is a remote, paid contract role focused on high-quality speech data annotation. You will work with short conversational audio assets in our custom annotation platform. Each asset is approximately 15 minutes long.

Your main task is to review pre-labeled audio and correct the transcript, speech segments, and conversational labels according to detailed guidelines.

This is not a general writing job. It requires careful listening, attention to detail, and the ability to follow annotation instructions precisely.

Responsibilities

You may work on tasks such as:

Correcting AI-generated transcripts
Adjusting speech segment start and end times
Checking whether each speech segment matches the actual audio
Labeling or reviewing conversational events such as backchannels, interruptions, fillers, restarts, pauses, and turn-taking
Fixing missing words, incorrect words, repetitions, and false starts
Following detailed annotation guidelines
Reporting unclear cases or platform issues
Maintaining high accuracy while working efficiently

Expected pace

After onboarding, we expect strong annotators to complete one 15-minute audio asset in about 1 hour while maintaining high annotation quality.
Rates may go up to $50 per hour for annotators who consistently meet the required speed and quality level.

Compensation

Pay range: $25 to $50 per hour, depending on experience, speed, and annotation quality.

We will start with a short paid annotation trial. If the quality of your trial work is strong, we would like to continue with ongoing work.

Requirements

Native or near-native English listening comprehension
Strong attention to detail
Ability to carefully follow written guidelines
Comfortable working with natural conversational audio
Comfortable reviewing AI-generated transcripts and speech segments
Reliable communication
Ability to work independently in a remote environment
Availability over the next 1 to 2 weeks is strongly preferred

Nice to have

Experience with transcription, audio annotation, or speech data work
Experience with AI training data or machine learning datasets
Experience with linguistics, speech, NLP, or conversational AI
Experience with quality assurance or review workflows
Familiarity with natural conversation features such as backchannels, interruptions, fillers, pauses, and turn-taking

Why this role may be interesting

You will contribute to a research-oriented dataset for next-generation speech AI systems. The work is directly related to how future voice AI models understand and respond to natural human conversation.

To apply, please include:

Your native language and English accent background
Your experience with transcription, annotation, speech data, linguistics, NLP, or AI data work
Your weekly availability over the next 1 to 2 weeks
Whether you are comfortable completing a short paid annotation trial
Whether you believe you can complete a 15-minute audio asset in about 1 hour after onboarding