OpenAI

AI Research and Deployment

Software Engineer, RL Training Infra

$295–445k San Francisco, California, United States FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for mid-level candidates.

The Brief

“Software Engineer, RL Training Infra at OpenAI. Skills: RL Training, ML Infrastructure, Distributed Systems, Debugging. Keep large-scale RL training runs moving. Jump into the most urgent engineering and infrastructure problems.”

What You'll Achieve.

  • Keep our frontier RL training runs fast, reliable, and unblocked
  • Ensure we make agents genuinely useful for developers, enterprises, researchers, and everyday users
  • Ship our frontier models

Industry & Context.

AI Research and Deployment
Problems you'll solve

Fix the highest-impact problem in front of the team; solve hard technical problems at the boundary between research and engineering

What They're Looking For.

Must Have

  • Generalist engineer with experience in some layer of ML infrastructure
  • Learns extremely quickly and is comfortable operating across unfamiliar layers
  • Debugger with high ownership, low ego, and excellent communication
  • Can land in a messy area with tight timelines, become useful quickly, and gradually raise the quality of the whole system
  • Energized by fast-moving environments where reliability, speed, and judgment matter
  • Likes building load-bearing systems and processes when that is what the team needs

Nice to Have

  • Experience supporting large-scale model training
  • Experience supporting async RL systems
  • Experience supporting high-throughput ML infrastructure
  • Experience debugging distributed systems across GPUs, networking, orchestration, or inference stacks
  • Background in performance optimization
  • Background in scaling
  • Background in production-critical infrastructure
  • Experience working directly with researchers
  • Experience working with fast-moving model teams

What You'll Do.

Keep large-scale RL training runs moving

Jump into the most urgent engineering and infrastructure problems

Debug issues across training systems, inference, orchestration, scaling, and distributed infrastructure

Solve hard technical problems at the boundary between research and engineering

Improve training reliability

Debug distributed systems

Reduce latency and cost

Make new capabilities robust under real workloads

Improve reliability and efficiency for RL training runs

Help researchers who are developing infra-heavy integrations

Turn recurring operational issues into better tools

Debug failures that cut across model behavior and evaluation infrastructure

Turn failures into hypotheses and durable improvements

How You'll Work.

Team & Collaboration

Work closely with research, infrastructure, and partner teams during tight model run timelines; Help researchers who are developing infra-heavy integrations

Communication Scope

Excellent communication

Full Job Description

About the Team

The Post-Training Frontiers team creates the frontier agents OpenAI ships to the world. We do the reinforcement learning training for the agentic models we ship in Codex, ChatGPT, and the API (from o1 to 5.5). Our role consists of (1) shepherding all integrations that should go into the final RL run and deciding what can make it in, (2) babysitting and scaling the final run, and (3) building the research and infra for horizontal integrations, such as improving function calling, factuality, multi-agent capabilities, memory, calibrated thinking, etc.

About the Role

This role focuses on keeping our frontier RL training runs fast, reliable, and unblocked. You will work across engineering and infrastructure problems as they emerge, from scaling and orchestration issues to inference bottlenecks, numerical problems, and hardware failures, as well as supporting large horizontal integrations in the big run, like multi-agent capabilities or memory.

This is a role for a strong generalist who quickly learns anything needed for the task, has high attention to detail, debugs deeply, and is motivated by fixing the highest-impact problem in front of the team.

In this role, you will:

- Keep large-scale RL training runs moving by jumping into the most urgent engineering and infrastructure problems.
- Debug issues across training systems, inference, orchestration, scaling, and distributed infrastructure.
- Solve hard technical problems at the boundary between research and engineering: scaling experiments, improving training reliability, debugging distributed systems, reducing latency and cost, and making new capabilities robust under real workloads.
- Improve reliability and efficiency for RL training runs.
- Help researchers who are developing infra-heavy integrations, such as multi-agent capabilities or memory.
- Turn recurring operational issues into better tools, systems, processes, or abstractions.
- Work closely with research, infrastructure, and partner teams dur


How to Apply on Ashby

  • Ashby is a fast modern ATS — most applications take under 3 minutes.
  • The resume parser is strong; verify parsed experience dates and job titles.
  • Custom screening questions are often scored algorithmically — answer completely.
  • Location field affects geo-based screening; use your actual metro area.
