AfterQuery
Tech / AI / Software
Research Scientist - Frontier Data
“Research Scientist - Frontier Data at AfterQuery. Skills: designing datasets and evaluation frameworks, experimenting with data collection strategies, diagnosing model failure modes, developing metrics.”
What You'll Achieve.
Help frontier AI labs make their models better; design high-signal datasets; run rigorous evaluations that go beyond static benchmarks; shape how frontier models are trained and measured; determine whether a model is actually getting better; improve different model capabilities; measure dataset quality, diversity, and downstream impact on model alignment and capability
Industry & Context.
Quantitative instincts; ability to extract actionable insights from messy results
What They're Looking For.
Must Have
Undergraduate or master's research experience
Nice to Have
Prior work or internship at an RL environment company; prior work or internship at an AI safety or benchmarking org such as METR or Artificial Analysis; familiarity with LLM training pipelines; familiarity with RLHF/RLVR; familiarity with evaluation methodology
What You'll Do.
Design the datasets and evaluation frameworks that shape how frontier models are trained and measured
Experiment with data collection strategies
Diagnose model failure modes
Develop the metrics that determine whether a model is actually getting better
Design data slices and explore data shapes that expose meaningful model failure modes across domains like finance, code, and enterprise workflows
Build and refine evaluation rubrics and reward signals for RLHF and RLVR training pipelines
Model annotator behavior and run experiments to improve different model capabilities
Develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on model alignment and capability
Partner with lab research teams to translate their training objectives into concrete data and evaluation specifications
How You'll Work.
Team & Collaboration
Working directly with research teams at top AI labs
Full Job Description
About AfterQuery
AfterQuery builds the training data and evaluation infrastructure that frontier AI labs use to make their models better. We work with the world's leading labs to design high-signal datasets and run rigorous evaluations that go beyond static benchmarks. We are a small, early team (post-Series A) where individual contributors have a direct impact on how the next generation of models learn and improve.

The Role
You'll design the datasets and evaluation frameworks that shape how frontier models are trained and measured. Working directly with research teams at top AI labs, you'll experiment with data collection strategies, diagnose model failure modes, and develop the metrics that determine whether a model is actually getting better. This is hands-on, high-leverage work: you'll go from hypothesis to live experiment quickly, and your output will directly influence model training runs at scale.

What You'll Do
- Design data slices and explore data shapes that expose meaningful model failure modes across domains like finance, code, and enterprise workflows
- Build and refine evaluation rubrics and reward signals for RLHF and RLVR training pipelines
- Model annotator behavior and run experiments to improve different model capabilities
- Develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on model alignment and capability
- Partner with lab research teams to translate their training objectives into concrete data and evaluation specifications

What We're Looking For
- Great candidates have undergraduate or master's research experience (but haven't done a PhD)
- Major plus if they've worked or interned at any RL environment companies in the past, or at any AI safety or benchmarking orgs like METR, Artificial Analysis, etc.
- Genuine obsession with how data structure, selection, and quality drive model behavior
- Ability to design lightweight experiments, move fast, and extract actionable insights from messy results
- Comfort wo