Datadog
Technology
Staff Applied Scientist - Dashboards
Neural analysis suggests this role is optimal for Senior candidates.
“Staff Applied Scientist - Dashboards at Datadog. Skills: AI system evaluation, ML system measurement, GenAI initiatives. Own evaluation strategy. Define metrics”
What You'll Achieve.
Guarantee the quality of the AI system at scale; Catch quality changes before they reach customers; Make assets reusable by teams
Industry & Context.
Root cause analysis
What They're Looking For.
Must Have
BS/MS/PhD in a scientific field, 10+ years of engineering/applied science experience, Technical lead experience, Proven track record leading ML/GenAI initiatives, Significant experience evaluating ML systems at scale, Experience experimenting with ML systems at scale, Experience measuring ML systems at scale, Product mindset, Comfortable driving initiatives across cross-functional teams, Thrive in ambiguity, Make sound technical calls
Nice to Have
Research through production experience
What You'll Do.
Own evaluation strategy
Build regression harnesses
Drive improvements to retrieval relevance
Drive improvements to tool-selection accuracy
Drive improvements to context efficiency
Provide technical leadership
Conduct design reviews
Participate in working groups
How You'll Work.
Team & Collaboration
Sister teams; Broader organization; Cross-functional teams; Dashboards team; Engineers on the team
Full Job Description
The Dashboards product is Datadog's unified single-pane-of-glass for metrics, logs, and traces: a comprehensive treasure trove of observability data. We are transforming Dashboards into an AI-native control surface and the central hub where every team moves seamlessly from question to insight to action, providing a guided experience that feels like having an expert SRE at your side and ensuring the entry point is never an empty canvas.
We're hiring a Staff Applied Scientist to define and guarantee the quality of this AI system at scale. "Good" isn't one number: it spans answer quality, tool-selection accuracy (critical given the growing catalog of data sources and visualizations), retrieval relevance, latency, token cost, and end-to-end agent success.
The space is full of open questions. How do you evaluate an agent end-to-end when the trajectory is non-deterministic? How do you score tool selection when a user’s query can result in the agent making decisions against dozens of visualizations and data sources, both of which are growing month over month? How do you build a measurement system that catches regressions across all widget types and data sources (e.g., enforcing correct grouping, sorting, and time overrides), and is easy to use and extend by dozens of teams? If those are the problems you want to spend your time on, come build this with us.
At Datadog, we place value in our office culture - the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work-life harmony that best fits them.
What You’ll Do:
Own the evaluation strategy for Dashboards, as well as sister teams within our organization. Define the metrics (offline and online, quality and cost, single-turn and multi-turn) that the team and the broader organization optimize against. Build the eval datasets, golden traces, and regression harnesses that catch quality changes before they hit customers, an