Anthropic

Technology

Software Engineer, Safeguards Evals

$200–300k (AI estimate) · New York, New York, United States · Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Software Engineer, Safeguards Evals at Anthropic. Skills: Applied ML research, Engineering, Evaluation infrastructure. Build evaluation harness. Define metrics”

What You'll Achieve.

Catch misuse; Surface bad actors; Surface policy violations; Surface emerging threats; Inform enforcement actions; Inform model launch decisions; Measure agent performance; Drive hill-climbing; Identify measurement gaps; Evolve evals; Remain unsaturated; High-signal evals

Industry & Context.

Technology

Problems you'll solve

Translate ambiguous problems

What They're Looking For.

Must Have

Proficiency in Python, Experience building data pipelines, Experience working with LLMs, Ability to move between research prototyping and production code, Ability to translate ambiguous problems into experiments, Bachelor's degree or equivalent experience

Nice to Have

6+ years of industry software engineering experience, Expertise in agent evaluation frameworks, Extensive experience in trust and safety, Experience in red teaming, Experience with synthetic data generation, Experience with distributed systems, Experience with prompt engineering

What You'll Do.

Build evaluation harness

Define grading approaches

Construct eval datasets

Measure agent performance

Identify measurement gaps

Productionize research

Enable policy experts

Iterate on evaluations

Construct RL environments

How You'll Work.

Team & Collaboration

Single cohesive team; Research discussions

Full Job Description

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role

How do we know our safety systems actually catch misuse? Anthropic increasingly uses AI to investigate potential misuse of Claude, analyzing real-world traffic to surface bad actors, policy violations, and emerging threats. Its findings inform enforcement actions and model launch decisions, which means we need rigorous, trustworthy answers to questions like: Does the monitoring agent catch what it should? Where does it fail? Does it stay reliable as adversaries adapt, as models improve, and as the agent itself changes?

This role builds the evaluation infrastructure that answers those questions. You'll sit at the intersection of applied ML research and engineering: designing experiments to measure how well an investigative agent performs across harm areas, building datasets that represent real abuse rather than synthetic benchmarks, and shipping those methods into pipelines that gate every change to the system. Your work directly determines how much trust Anthropic can place in its automated abuse detection, and where we invest to make it better.

Key responsibilities

Build and own the evaluation harness for an agentic investigation system, defining metrics, test cases, and grading approaches for a complex long-horizon agent

Construct high-quality eval datasets representing real-world misuse across harm areas (e.g., cyber attacks, bio weapons, influence operations), drawing from real traffic patterns and synthetic generation

Measure agent performance end-to-end (detection precision/recall, investigation quality, robustness) and drive hill-climbing on the hardest harm areas

Analyze coverage to identify measurement gaps,

How to Apply on Greenhouse

Create a Greenhouse profile before applying — it saves time across multiple applications.
Upload your resume as a PDF; the parser handles it better than Word.
Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
Enable email notifications to track application status in real time.