Datadog

Technology

Staff Applied Scientist - Dashboards

$350–550k (AI estimate) · New York, New York, United States · Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Staff Applied Scientist - Dashboards at Datadog. Skills: AI system evaluation, ML system measurement, GenAI initiatives. Own evaluation strategy. Define metrics”

What You'll Achieve.

Guarantee the quality of the AI system at scale; Catch quality changes before they hit customers; Make assets reusable by teams

Industry & Context.

Technology

Problems you'll solve

Root cause analysis

What They're Looking For.

Must Have

BS/MS/PhD in a scientific field, 10+ years of engineering/applied science experience, Technical lead experience, Proven track record leading ML/GenAI initiatives, Significant experience evaluating ML systems at scale, Experience experimenting with ML systems at scale, Experience measuring ML systems at scale, Product mindset, Comfortable driving initiatives across cross-functional teams, Thrive in ambiguity, Make sound technical calls

Nice to Have

Research through production experience

What You'll Do.

Own evaluation strategy

Build regression harnesses

Drive improvements to retrieval relevance

Drive improvements to tool-selection accuracy

Drive improvements to context efficiency

Provide technical leadership

Conduct design reviews

Participate in working groups

How You'll Work.

Team & Collaboration

Sister teams; Broader organization; Cross-functional teams; Dashboards team; Engineers on the team

Full Job Description

The Dashboards product is Datadog's unified single-pane-of-glass for metrics, logs, and traces: a comprehensive treasure trove of observability data. We are transforming Dashboards into an AI-native control surface and the central hub where every team moves seamlessly from question to insight to action, providing a guided experience that feels like having an expert SRE at your side and ensuring the entry point is never an empty canvas.

We're hiring a Staff Applied Scientist to define and guarantee the quality of this AI system at scale. "Good" isn't one number: it spans answer quality, tool-selection accuracy (critical given the growing catalog of data sources and visualizations), retrieval relevance, latency, token cost, and end-to-end agent success.

The space is full of open questions. How do you evaluate an agent end-to-end when the trajectory is non-deterministic? How do you score tool selection when a user’s query can result in the agent making decisions against dozens of visualizations and data sources, both of which are growing month over month? How do you build a measurement system that catches regressions across all widget types and data sources (e.g., enforcing correct grouping, sorting, and time overrides), and is easy to use and extend by dozens of teams? If those are the problems you want to spend your time on, come build this with us.

At Datadog, we place value in our office culture - the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work-life harmony that best fits them.

What You’ll Do:

Own the evaluation strategy for Dashboards, as well as sister teams within our organization. Define the metrics (offline and online, quality and cost, single-turn and multi-turn) that the team and the broader organization optimize against. Build the eval datasets, golden traces, and regression harnesses that catch quality changes before they hit customers, an

SKILL SIGNAL 30 detected · ranked by frequency

Answer quality ×3

Tool selection accuracy ×3

Latency ×3

Token cost ×3

Agent success ×3

Offline metrics ×3

Online metrics ×3

Quality metrics ×3

Cost metrics ×3

Single-turn metrics ×3

Multi-turn metrics ×3

Eval datasets ×3

Golden traces ×3

Regression harnesses ×3

AI system evaluation ×2

ML system measurement ×2

GenAI initiatives ×2

GenAI

Metrics

Data sources

Visualizations

Retrieval relevance

Context efficiency

Evaluation strategy

Measurement system

Technical leadership

Design reviews

Working groups

BEHAVIOURAL

Leadership, Mentorship

Role Details

Experience 5–10 yrs

Level Senior

Work Mode Hybrid

Education Bachelor's

Category dev-eng

Salary Band 200k+

AI-Extracted Insights

Domain Areas

observability-data, metrics, logs, traces, ai-native-control-surface, data-sources, visualizations, widget-types
