Speech AI Data Annotation Specialist
About oto
oto is building high-quality conversational speech datasets for next-generation speech AI and speech-to-speech models. Our work focuses on natural two-person conversations, including turn-taking, backchannels, interruptions, fillers, pauses, and other real conversational behaviors that are important for human-like voice AI.
We are looking for detail-oriented annotators to help us prepare conversational audio data for speech AI research and model training.
About the role
This is a remote, paid contract role focused on high-quality speech data annotation. You will work with short conversational audio assets in our custom annotation platform. Each asset is approximately 15 minutes long.
Your main task is to review pre-labeled audio and correct the transcript, speech segments, and conversational labels based on detailed guidelines.
This is not a general writing job. It requires careful listening, attention to detail, and the ability to follow annotation instructions precisely.
Responsibilities
You may work on tasks such as:
- Correcting AI-generated transcripts
- Adjusting speech segment start and end times
- Checking whether each speech segment matches the actual audio
- Labeling or reviewing conversational events such as backchannels, interruptions, fillers, restarts, pauses, and turn-taking
- Fixing missing words, incorrect words, repetitions, and false starts
- Following detailed annotation guidelines
- Reporting unclear cases or platform issues
- Maintaining high accuracy while working efficiently
Expected pace
- After onboarding, we expect strong annotators to complete one 15-minute audio asset in about 1 hour while maintaining high annotation quality.
- Rates may go up to $50 per hour for annotators who consistently meet the required speed and quality level.
Compensation
Pay range: $25 to $50 per hour, depending on experience, speed, and annotation quality.
We will start with a short paid annotation trial. If the quality is strong, we would like to continue with ongoing work.
Requirements
- Native or near-native English listening comprehension
- Strong attention to detail
- Ability to carefully follow written guidelines
- Comfortable working with natural conversational audio
- Comfortable reviewing AI-generated transcripts and speech segments
- Reliable communication
- Ability to work independently in a remote environment
- Availability over the next 1 to 2 weeks is strongly preferred
Nice to have
- Experience with transcription, audio annotation, or speech data work
- Experience with AI training data or machine learning datasets
- Experience with linguistics, speech, NLP, or conversational AI
- Experience with quality assurance or review workflows
- Familiarity with natural conversation features such as backchannels, interruptions, fillers, pauses, and turn-taking
Why this role may be interesting
You will contribute to a research-oriented dataset for next-generation speech AI systems. The work is directly related to how future voice AI models understand and respond to natural human conversation.
To apply, please include:
- Your native language and English accent background
- Your experience with transcription, annotation, speech data, linguistics, NLP, or AI data work
- Your weekly availability over the next 1 to 2 weeks
- Whether you are comfortable completing a short paid annotation trial
- Whether you believe you can complete a 15-minute audio asset in about 1 hour after onboarding