Anthropic
Technology
Software Engineer, RL
Neural analysis suggests this role is optimal for Mid+ candidates.
“Software Engineer, RL at Anthropic. Skills: Reinforcement learning, Data pipelines, AI systems. Own stack parts end-to-end. Build data collection pipelines”
What You'll Achieve.
Make Claude genuinely great; Point capabilities at important things; Advance beneficial AI capabilities; Make training data trustworthy at scale; Catch reward hacking; Ensure environment quality; Make collecting human data fast and painless; Hold up at training scale; Ship fixes that help users; Roll systems out to new users; Make QA checks hold up; Review an RL task in under five minutes; Cut the time from task idea to QA-passed; Harden sandboxed environment; Onboard new data vendor; Fix rough edges vendors hit
Industry & Context.
Own problems end-to-end
What They're Looking For.
Must Have
Software engineering skills; Proficiency in at least one modern programming language; Experience designing, building, and running backend systems or infrastructure; Effective use of AI tools; Willingness to own problems end-to-end; Proactive, open communication; Comfort iterating quickly in ambiguous, fast-changing situations; Care about societal impacts of work; Bachelor's degree or equivalent experience
Nice to Have
Experience building LLM-powered systems; Experience with reinforcement learning on LLMs; Time as a forward deployed engineer, founder, or early startup engineer; Experience shipping user-facing products or internal platforms; Experience building data pipelines or integrations; Experience building connectors or integrations with third-party tools and APIs; Experience with containers, Kubernetes, or simulation infrastructure; Experience handling sensitive data or working under tight security controls; Experience working with external data vendors; Basic familiarity with AI safety or security research
What You'll Do.
Own significant parts of the stack end-to-end
Build data collection pipelines
Develop QA frameworks
Build interfaces for human data collection
Harden execution environments
Embed with teams and domain experts
Design pipelines and evals with users
Support users directly
Work with partners to roll out systems
Manage technical relationships with vendors
How You'll Work.
Team & Collaboration
Cross-functional teams; Research teams; Operations partners; Security partners; Compliance partners; External data vendors
Communication Scope
Open communication
Process & Methodology
Run workstream
Full Job Description
About Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role
Anthropic's RL Data team builds the systems that produce high-quality reinforcement learning data for Claude: data collection pipelines, human feedback tooling, the execution environments RL tasks run in, and the quality assurance that keeps training data trustworthy at scale. Our goal is to make Claude genuinely great at complex, real-world work, and to point those capabilities at the things that matter most, including AI safety research and beneficial deployments of AI. (To be upfront: this is dual-use work; it advances general capabilities too, though we aim to differentially advance the beneficial ones.)

This is a foundational role on a new team: you'll help shape our technical direction and what we build first. The work is hands-on and varied. Some weeks you'll be deep in pipeline or infrastructure engineering; others you'll be tuning prompts until the output is good, or sitting with a research team that depends on your systems and shipping the fixes they need. We're looking for strong engineers who will also do whatever else it takes to make their systems succeed: reading transcripts, supporting users, and wrangling vendors.

Key responsibilities
- Own significant parts of our stack end-to-end, from technical architecture through the unglamorous operational work that makes it succeed
- Build data collection pipelines, read the transcripts they produce, and iterate on prompts, evals, and graders until the output is good
- Develop and improve QA frameworks to catch reward hacking and ensure environment quality
- Build interfaces that make collecting human data fast and painless for the people providing it
- Harden execution
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.