AfterQuery

Tech / AI / Software

Software Engineer - RL Environments

$180–220k · San Francisco, California, United States · Full Time
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Entry candidates.

The Brief

“Software Engineer - RL Environments at AfterQuery. Skills: dataset design, evaluation rubrics, data collection strategies, model failure modes, metrics development. Design the datasets and evaluation rubrics that directly influence how frontier models learn. Work hands-on with research teams at top AI labs.”

Industry & Context.

Tech / AI / Software

Problems you'll solve

Diagnosing model failure modes; extracting actionable insights from messy results

What They're Looking For.

Must Have

1-4 years of experience

Nice to Have

Previously worked or interned at an RL environment company

Worked or interned at an AI safety or benchmarking organization such as METR or Artificial Analysis

Former founder or early engineer at an early-stage startup

What You'll Do.

Design the datasets and evaluation rubrics that directly influence how frontier models learn

Work hands-on with research teams at top AI labs

Experiment with data collection strategies

Diagnose model failure modes

Develop the metrics that determine whether a model is actually improving

Go from hypothesis to live experiment quickly

Produce output that feeds directly into model training runs at scale

Design data slices that expose meaningful failure modes across domains like finance, code, and enterprise workflows

Build and refine reward signals for RLHF and RLVR pipelines

Develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on alignment and capability

Partner with lab research teams to translate their training objectives into concrete data and evaluation specifications

Design data slices and explore data shapes that expose meaningful model failure modes across domains like finance, code, and enterprise workflows

Build and refine evaluation rubrics and reward signals for RLHF and RLVR training pipelines

Model annotator behavior and run experiments to improve different model capabilities

Develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on model alignment and capability

Create and manage both real-world and synthetic data pipelines

How You'll Work.

Team & Collaboration

Work hands-on with research teams at top AI labs; Partner with lab research teams to translate their training objectives into concrete data and evaluation specifications

Full Job Description

About AfterQuery

AfterQuery builds the training data and evaluation infrastructure that frontier AI labs use to make their models better. We work with the world's leading labs to design high-signal datasets and run rigorous evaluations that go beyond static benchmarks. We are a small, early team (post-Series A) where individual contributors have a direct impact on how the next generation of models learns and improves.

The Role

As a SWE (Environments), you will design the datasets and evaluation rubrics that directly influence how frontier models learn. You'll work hands-on with research teams at top AI labs, experimenting with data collection strategies, diagnosing model failure modes, and developing the metrics that determine whether a model is actually improving. You'll go from hypothesis to live experiment quickly, and your output will feed directly into model training runs at scale.

Day to day, you will design data slices that expose meaningful failure modes across domains like finance, code, and enterprise workflows. You will build and refine reward signals for RLHF and RLVR pipelines. You will develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on alignment and capability. You will partner with lab research teams to translate their training objectives into concrete data and evaluation specifications.

What You'll Do

- Design data slices and explore data shapes that expose meaningful model failure modes across domains like finance, code, and enterprise workflows
- Build and refine evaluation rubrics and reward signals for RLHF and RLVR training pipelines
- Model annotator behavior and run experiments to improve different model capabilities
- Develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on model alignment and capability
- Create and manage both real-world and synthetic data pipelines
- Partner with lab research teams to translate their training objectives into concrete data and evaluation specifications

How to Apply on Ashby

  • Ashby is a fast, modern ATS; most applications take under 3 minutes.
  • The resume parser is strong; verify parsed experience dates and job titles.
  • Custom screening questions are often scored algorithmically — answer completely.
  • Location field affects geo-based screening; use your actual metro area.