Protege
AI
Research Scientist, Benchmarks & Evaluations
Neural analysis suggests this role is optimal for Senior candidates.
“Research Scientist, Benchmarks & Evaluations at Protege. Skills: Design tasks and benchmarks, Validate evaluations rigorously, Develop the science of evals.”
What You'll Achieve.
Shape the future of data and AI; Push the frontier forward; Publish on the questions that matter; Shape the eval datasets Protege delivers; Establish Protege as the standard-setter; Contribute to broader AI community understanding
Industry & Context.
Solve hard problems
What They're Looking For.
Must Have
Advanced degree in a quantitative field; Hands-on experience evaluating LLMs, agents, or other ML systems; Experience with annotator quality and inter-rater reliability; Excellent scientific writing and communication; Bias toward velocity
Nice to Have
PhD; Experience with RL evaluation techniques; Ability to navigate new customer architectures, data systems, and requirements quickly; Experience with latent-variable models of annotator skill; Track record of published benchmarks or evaluation papers
What You'll Do.
Design tasks and benchmarks
Validate evaluations rigorously
Develop the science of evals
Run evaluations on current frontier models
Translate findings into product
Partner with outsourced annotation vendors
How You'll Work.
Team & Collaboration
Work closely with data and engineering teams; Collaboration
Communication Scope
Scientific writing; Communication; Synthesize technical findings into narratives
Full Job Description
Company Overview: We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data. Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI, and in tech. We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.
DataLab is Protege’s research arm, a team of research scientists committed to tackling the fundamental challenges and open questions regarding data for AI. We bridge the gap between research theory and data deployment to push the frontier forward, publishing on the questions that matter: what agentic AI should actually be trained to do, how to quality-control large-scale corpora, and how to build evaluation datasets that reflect the real world rather than the leaderboard. We’re a lean, fast-moving, high-trust team of builders who deeply care about scientific rigor and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.
The Role.
Benchmarks decide what AI gets built. Today, most evals don’t measure what we actually care about: they’re contaminated, gameable, synthetic or measure capabilities that don’t transfer to the real tasks frontier models are deployed against. We’re hiring a Research Scientist to lead the design of benchmarks and evaluations that frontier labs, enterprises, and policymakers can actually trust. You’ll own the science of evaluation across DataLab, designing tasks that meaningfully separate models, validating t
How to Apply on Ashby
- Ashby is a fast modern ATS — most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically — answer completely.
- Location field affects geo-based screening; use your actual metro area.