Fundamental
AI
ML Researcher - Evaluations
Neural analysis suggests this role is optimal for Mid+ candidates.
“ML Researcher - Evaluations at Fundamental. Skills: model evaluation, Python programming, building automated testing pipelines, translating real-world problems into quantifiable metrics. Design and implement rigorous evaluation frameworks. Translate real-world requirements gathered internally into measurable metrics that accurately reflect downstream use cases”
Industry & Context.
Translate ambiguous, real-world data challenges into concrete, defensible metrics; decode the final results; empirically measure exactly why a model fails and where it excels
What They're Looking For.
Must Have
Proven experience in Machine Learning, Data Science, or AI Engineering, with a focus on model evaluation, testing, or benchmarking
Programming skills in Python and relevant libraries such as pandas
A solid understanding of traditional ML metrics alongside emerging ways to evaluate foundation model outputs
Experience building and maintaining automated testing pipelines or evaluation harnesses
Excellent internal communication skills
Experience with translating real-world problems into quantifiable metrics
Nice to Have
Experience with tabular data or time series forecasting
What You'll Do.
Design and implement rigorous evaluation frameworks
Translate real-world requirements gathered internally into measurable metrics that accurately reflect downstream use cases
Build, scale, and maintain the internal Python pipelines and datasets used to stress-test our models on a day-to-day basis
Scout the industry for new, relevant external benchmarks for tabular data
Evaluate our models against these public benchmarks and maintain those pipelines
Monitor external foundation models and classical ML baselines
Integrate and update external foundation models and classical ML baselines within our system
Create and maintain a comprehensive leaderboard and characterization of our models
Report back to the research team exactly where our models are excelling and where they are falling short
How You'll Work.
Team & Collaboration
Working alongside our core researchers; taking signals from our internal deployment teams; reporting back to the research team
Communication Scope
Excellent internal communication skills; comfortable telling the research team hard truths about model regressions; adept at translating field requirements into technical metrics
Full Job Description
ABOUT FUNDAMENTAL
Fundamental is an AI company pioneering the future of enterprise decision-making. Founded by DeepMind alumni, Fundamental has developed NEXUS – the world's most powerful Large Tabular Model (LTM) – purpose-built for the structured records that actually drive enterprise decisions. Backed by world-class investors and trusted by Fortune 100 companies, Fundamental unlocks trillions of dollars of value by giving businesses the Power to Predict.
At Fundamental, you'll work on unprecedented technical challenges in foundation model development and build technology that transforms how the world's largest companies make decisions. This is your opportunity to be part of a category-defining company from the ground up. Join the team defining the future of enterprise AI.
We are looking for a Machine Learning Researcher - Evaluations to establish the ground truth for what our models can actually do. In this role, you will take ambiguous, real-world data challenges and translate them into concrete, defensible metrics that our researchers and leadership can trust. Evaluation is not an afterthought here; it is the engine that drives our research roadmap.
Working alongside our core researchers, you will be embedded in the entire lifecycle of model development. This means taking signals from our internal deployment teams to define what matters, tracking performance across live training runs, and decoding the final results. If you are obsessed with empirically measuring exactly why a model fails and where it excels, this role is for you.
KEY RESPONSIBILITIES
- Develop Signal-Driven Evals: Design and implement rigorous evaluation frameworks. You will translate real-world requirements gathered internally into measurable metrics that accurately reflect downstream use cases.
- Own the Evaluation Infrastructure: Build, scale, and maintain the internal Python pipelines and datasets used to stress-test our models on a day-to-day basis.
- Explore External Benchmarks: Scout the
How to Apply on Ashby
- Ashby is a fast, modern ATS; most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically — answer completely.
- Location field affects geo-based screening; use your actual metro area.