Anthropic

Technology

Software Engineer, RL

$175–250k ~AI est.
San Francisco, California, United States
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Software Engineer, RL at Anthropic. Skills: Reinforcement learning, Data pipelines, AI systems. Own stack parts end-to-end. Build data collection pipelines”

What You'll Achieve.

Make Claude genuinely great; Point capabilities at important things; Advance beneficial AI capabilities; Make training data trustworthy at scale; Catch reward hacking; Ensure environment quality; Make collecting human data fast and painless; Hold up at training scale; Ship fixes for users; Roll systems out to new users; Make QA checks hold up; Review RL task under five minutes; Cut task idea to QA-passed time; Ship fixes that help users; Harden sandboxed environment; Onboard new data vendor; Fix rough edges vendors hit

Industry & Context.

Technology
Problems you'll solve

Own problems end-to-end

What They're Looking For.

Must Have

Software engineering skills; Proficiency in at least one modern programming language; Experience designing, building, and running backend systems or infrastructure; Effective use of AI tools; Willingness to own problems end-to-end; Proactive, open communication; Comfort iterating quickly in ambiguous, fast-changing situations; Care about societal impacts of work; Bachelor's degree or equivalent experience

Nice to Have

Experience building LLM-powered systems; Experience with reinforcement learning on LLMs; Time as a forward deployed engineer, founder, or early startup engineer; Experience shipping user-facing products or internal platforms; Experience building data pipelines or integrations; Experience building connectors or integrations with third-party tools and APIs; Experience with containers, Kubernetes, or simulation infrastructure; Experience handling sensitive data or working under tight security controls; Experience working with external data vendors; Basic familiarity with AI safety or security research

What You'll Do.

Own stack parts end-to-end

Build data collection pipelines

Develop QA frameworks

Build interfaces for human data collection

Harden execution environments

Embed with teams and domain experts

Design pipelines and evals with users

Support users directly

Work with partners to roll out systems

Manage technical relationships with vendors

How You'll Work.

Team & Collaboration

Cross-functional teams; Research teams; Operations partners; Security partners; Compliance partners; External data vendors

Communication Scope

Open communication

Process & Methodology

Run workstream

Full Job Description

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role

Anthropic's RL Data team builds the systems that produce high-quality reinforcement learning data for Claude: data collection pipelines, human feedback tooling, the execution environments RL tasks run in, and the quality assurance that keeps training data trustworthy at scale. Our goal is to make Claude genuinely great at complex, real-world work, and to point those capabilities at the things that matter most, including AI safety research and beneficial deployments of AI. (To be upfront: this is dual-use work; it advances general capabilities too, though we aim to differentially advance the beneficial ones.)

This is a foundational role on a new team: you'll help shape our technical direction and what we build first. The work is hands-on and varied. Some weeks you'll be deep in pipeline or infrastructure engineering; others you'll be tuning prompts until the output is good, or sitting with a research team that depends on your systems and shipping the fixes they need. We're looking for strong engineers who will also do whatever else it takes to make their systems succeed: reading transcripts, supporting users, and wrangling vendors.

Key responsibilities

  • Own significant parts of our stack end-to-end, from technical architecture through the unglamorous operational work that makes it succeed
  • Build data collection pipelines, read the transcripts they produce, and iterate on prompts, evals, and graders until the output is good
  • Develop and improve QA frameworks to catch reward hacking and ensure environment quality
  • Build interfaces that make collecting human data fast and painless for the people providing it
  • Harden execution

How to Apply on Greenhouse

  • Create a Greenhouse profile before applying — it saves time across multiple applications.
  • Upload your resume as a PDF; the parser handles it better than Word.
  • Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
  • Enable email notifications to track application status in real time.
