OpenAI

Scaling

Technical Lead Manager - Training Runtime, Data(set) Movement

$295–445k San Francisco, California, United States FULL TIME Remote Friendly

The Brief

“Technical Lead Manager - Training Runtime, Data(set) Movement at OpenAI. Skills: Training runtime, Dataset movement, Distributed systems, Data loading. Design unified dataset read platform. Define dataset APIs”

What You'll Achieve.

Make data access consistent; Make data access reproducible; Make data access maintainable; Ensure reliable experience; Ensure efficient experience

Industry & Context.

Scaling
Problems you'll solve

Debugging; Troubleshooting

What They're Looking For.

Must Have

5+ years experience, Deeply hands-on Technical Lead Manager, Primary technical owner for dataset reads, Work directly in code, Align researchers, training framework owners, storage teams, and infrastructure partners, Lead through code and technical judgment, Manage engineers without losing hands-on edge, Obsess over developer experience

Nice to Have

Rust or C++ experience useful

What You'll Do.

Design unified dataset read platform

Define storage-format expectations

Define registration/versioning

Define migration paths

Build reliability into read path

Build stateful iteration

Build clear operational contracts

Build terminal visualizers

Build web-based visualizers

Inspect multimodal data

Inspect reinforcement learning data

Write production code

Review production code

Partner with training framework teams

Partner with reinforcement learning teams

Partner with multimodal model teams

Partner with storage teams

Partner with runtime teams

Partner with cluster infrastructure teams

Own fast data movement

Own correct data movement

Own scalable data movement

Own reliable data movement

Own checkpoint loads/saves

Own snapshot transfers

How You'll Work.

Team & Collaboration

Researchers; Training framework owners; Storage teams; Infrastructure partners; Multimodal models; Adjacent infrastructure teams

Full Job Description

ABOUT THE TEAM

Training Runtime builds the distributed systems that power OpenAI's largest model training runs - most recently GPT-5.5! The Data Movement area owns the infrastructure that keeps training jobs supplied with the right data at the right time, and keeps model state moving safely and efficiently across large clusters. Our work spans machine learning systems, distributed storage, high-throughput data loading, reliability engineering, and developer experience. Success means researchers can move quickly while training runs remain fast, reproducible, debuggable, and resilient at scale.

ABOUT THE ROLE

We are looking for a deeply hands-on Technical Lead Manager to own datasets throughout our training infrastructure. This person will set the direction for how training jobs read data: the APIs, storage contracts, versioning model, benchmarks, debugging tools, and reliability guarantees that make data access consistent across current and future training frameworks.

You will begin as the primary technical owner for dataset reads, working directly in the code while aligning researchers, training framework owners, storage teams, and infrastructure partners around a durable platform. The problem is deceptively hard at frontier scale: make enormous, heterogeneous datasets easy to consume, correct across distributed workers, observable when something goes wrong, and flexible enough to support pretraining, reinforcement learning, and multimodal training.

IN THIS ROLE, YOU WILL

- Design and build a unified dataset read platform for multiple current and future training frameworks.
- Define dataset APIs, storage-format expectations, registration/versioning, and migration paths that make data access reproducible and maintainable.
- Build reliability into the read path, including stateful iteration, caching, fast restart, recovery, and clear operational contracts.
- Build terminal and web-based visualizers that let teams inspect text, multimodal, and reinforcement learning da


How to Apply on Ashby

  • Ashby is a fast modern ATS — most applications take under 3 minutes.
  • The resume parser is strong; verify parsed experience dates and job titles.
  • Custom screening questions are often scored algorithmically — answer completely.
  • Location field affects geo-based screening; use your actual metro area.
