Freelance AI Evaluator

Freelance Expert AI Data Evaluator

Uber AI Solutions is Uber’s marketplace connecting elite freelancers with Generative AI researchers. We are assembling a specialized, highly analytical team to collaborate on frontier GenAI projects. This is a freelance, paid, remote opportunity designed for independent contractors who excel at complex critical thinking, high-level editorial review, and logical structuring.

In this role, you will define the standard for "perfection" in AI outputs across complex domains. You will be responsible for refining inputs and authoring the definitive benchmarks used to measure and elevate model performance.

What you’ll work on

Draft, review, and refine complex, multi-constraint prompts to ensure they are logically sound, precisely structured, and optimized for high-level AI reasoning.
Compose "golden responses" that serve as the authoritative target for AI performance, requiring exceptional reasoning, factual precision, and structural clarity.
Analyze model-generated responses against established quality guidelines to measure alignment with target expectations.
Identify logical fallacies, subtle nuances, and edge cases to ensure every response meets a supreme professional standard.

Project Details

Location: Remote; candidates must be based in the United States.
Engagement: Freelance / Independent Contractor.
Schedule: Flexible, project-based hours.
Compensation: $110 / hr.

Who we’re looking for

Masterful analytical thinkers with a rare combination of intellectual curiosity and editorial precision.
Uncompromising logicians who can articulate the "why" behind a failed response with surgical accuracy.
Visionaries who don't just follow standards but have the depth of experience to set them.
Experts who can navigate the nuances of high-stakes language without relying on AI assistance to bridge the gap.
Ability to deconstruct complex arguments and identify subtle inconsistencies or logical gaps across diverse subjects.
Adept at distilling highly technical or abstract concepts into clear, perfectly organized, and authoritative formats.
A high standard of written communication and the ability to define the specific criteria that separate an "average" response from a "perfect" one.
At least two years of experience in high-level AI/LLM annotation, evaluation, or specialized content review.

Why this matters

Your oversight will directly shape the trajectory of frontier AI models. By defining the ultimate standard of quality and bridging the gap between raw computation and flawless logic, you ensure that tomorrow's AI is safe, reliable, and fundamentally trustworthy for real-world application.