Protege

Machine Learning Researcher - RL and Agentic Systems

Remote · FULL TIME · Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Machine Learning Researcher - RL and Agentic Systems at Protege. Skills: Reinforcement Learning, Agentic Systems, Data Evaluation, Benchmarking. Design and build datasets, tasks, environments, and evaluation assets. Translate real-world workflows into structured tasks”

What You'll Achieve.

Establish an evaluation baseline; Create clear benchmark frameworks, evaluation assets, and dataset-quality scorecards; Use rigorous evaluation methods to identify meaningful dataset improvements; Improve benchmark fidelity; Sharpen the company’s understanding of what high-impact agentic data actually looks like

Industry & Context.

Problems you'll solve

Independently identify and solve high-impact problems

What They're Looking For.

Must Have

PhD or equivalent Master’s Degree + 4+ years industry experience in machine learning, computer science, statistics, engineering, mathematics, economics, or related quantitative fields; Understanding of AI model training pipelines, evaluation methodology, and the role of data in shaping model performance; Experience working with large, unstructured, or semi-structured datasets used to train or evaluate ML systems; Experience with reinforcement learning, sequential decision-making, agentic systems, tool-using models, or multi-step model evaluation; Experience designing tasks, benchmarks, environments, simulations, or evaluation frameworks for real-world model behavior; Experimental design, evaluation, benchmarking, and data-validation skills

Nice to Have

Experience developing evaluation frameworks or performance metrics for datasets, agentic systems, or training data; Experience translating real-world workflows into structured tasks or environments for model evaluation; Experience with RLHF, RLAIF, imitation learning, reward modeling, online or offline RL, or related methods; Experience with Harbor or other agent evaluation frameworks; Publications or open-source contributions in reinforcement learning, agents, evaluation, or data-centric AI; Experience collaborating cross-functionally with product, infrastructure, or partnership teams; Experience with synthetic data generation, trajectory generation, or simulation-based environments

What You'll Do.

Design and build datasets, tasks, environments, and evaluation assets

Translate real-world workflows into structured tasks

Develop frameworks that assess diversity

Build quality scorecards and evaluation methods

recovery from failure

Connect model failures back to concrete dataset

Contribute to tools and systems that automate dataset validation

Improve internal infrastructure for reproducible experimentation

How You'll Work.

Team & Collaboration

Collaborate closely with research and engineering teams; Represent DataLab’s perspective in cross-functional discussions; Experience collaborating cross-functionally with product, infrastructure, or partnership teams

Full Job Description

Company Overview

We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI, and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

About DataLab

DataLab exists because truly useful data is rare, and the frontier of AI development only moves forward when high-quality data makes it possible. We believe data is one of the most underdeveloped layers of the AI stack. Our work focuses on building and evaluating high-value datasets grounded in real-world workflows and economically meaningful tasks.

We work across multiple domains to create safe, high-fidelity datasets that preserve the structure and context needed to train advanced AI systems. Our research spans data quality, evaluation design, privacy-preserving transformation, workflow reconstruction, and task-grounded AI training data.

At DataLab, applied research is tightly connected to real-world deployment. Researchers work directly with large-scale datasets, production systems, and frontier AI training problems.

Role Overview

Data is the foundation of AI performance, and we believe model quality starts with data quality. As AI systems become more agentic, a critical challenge is understanding which real-world datasets, tasks, and environments actually lead to better model behavior. We’re seeking a Machine Learning Researcher focused on RL and agentic systems to help define, desig

SKILL SIGNAL 80 detected · ranked by frequency

Reinforcement Learning ×6

Agentic Systems ×6

Benchmarking ×6

Evaluation ×4

Data Quality ×4

Dataset Design ×3

Environment Design ×3

Statistical Methods ×3

ML-driven methods ×3

Experimental Design ×3

Data Validation ×3

Data Evaluation ×2

evaluation methodology ×2

dataset quality ×2

Machine Learning

AI model training pipelines

dataset evaluation

task design

environment fidelity

model performance

real-world workflows

evaluation assets

quality scorecards

dataset strengths

dataset weaknesses

failure modes

planning

tool use

robustness

recovery from failure

Role Details

Experience 4–10 yrs

Level Senior

Type FULL TIME

Education PhD or equivalent Master’s Degree

Category datalab

AI-Extracted Insights

Domain Areas

ai-training-data, agentic-systems, real-world-workflows, economically-meaningful-tasks, data-quality, evaluation-design, privacy-preserving-transformation, workflow-reconstruction

How to Apply on Ashby

Ashby is a fast modern ATS — most applications take under 3 minutes.
The resume parser is strong; verify parsed experience dates and job titles.
Custom screening questions are often scored algorithmically — answer completely.
Location field affects geo-based screening; use your actual metro area.
