OpenAI
Scaling
Technical Lead Manager - Training Runtime, Data(set) Movement
“Technical Lead Manager - Training Runtime, Data(set) Movement at OpenAI. Skills: Training runtime, Dataset movement, Distributed systems, Data loading. Design unified dataset read platform. Define dataset APIs”
What You'll Achieve.
Make data access consistent, reproducible, and maintainable; Ensure a reliable, efficient experience
Industry & Context.
Debugging; Troubleshooting
What They're Looking For.
Must Have
5+ years of experience; Deeply hands-on Technical Lead Manager; Primary technical owner for dataset reads; Work directly in code; Align researchers, training framework owners, storage teams, and infrastructure partners; Lead through code and technical judgment; Manage engineers without losing a hands-on edge; Obsess over developer experience
Nice to Have
Rust or C++ experience useful
What You'll Do.
Design a unified dataset read platform
Define storage-format expectations, registration/versioning, and migration paths
Build reliability into the read path, including stateful iteration and clear operational contracts
Build terminal and web-based visualizers
Inspect multimodal and reinforcement learning data
Write and review production code
Partner with training framework, reinforcement learning, multimodal model, storage, runtime, and cluster infrastructure teams
Own fast, correct, scalable, and reliable data movement
Own checkpoint loads/saves and snapshot transfers
How You'll Work.
Team & Collaboration
Researchers; Training framework owners; Storage teams; Infrastructure partners; Multimodal model teams; Adjacent infrastructure teams
Full Job Description
ABOUT THE TEAM
Training Runtime builds the distributed systems that power OpenAI's largest model training runs, most recently GPT-5.5! The Data Movement area owns the infrastructure that keeps training jobs supplied with the right data at the right time, and keeps model state moving safely and efficiently across large clusters. Our work spans machine learning systems, distributed storage, high-throughput data loading, reliability engineering, and developer experience. Success means researchers can move quickly while training runs remain fast, reproducible, debuggable, and resilient at scale.

ABOUT THE ROLE
We are looking for a deeply hands-on Technical Lead Manager to own datasets throughout our training infrastructure. This person will set the direction for how training jobs read data: the APIs, storage contracts, versioning model, benchmarks, debugging tools, and reliability guarantees that make data access consistent across current and future training frameworks.

You will begin as the primary technical owner for dataset reads, working directly in the code while aligning researchers, training framework owners, storage teams, and infrastructure partners around a durable platform. The problem is deceptively hard at frontier scale: make enormous, heterogeneous datasets easy to consume, correct across distributed workers, observable when something goes wrong, and flexible enough to support pretraining, reinforcement learning, and multimodal training.

IN THIS ROLE, YOU WILL
- Design and build a unified dataset read platform for multiple current and future training frameworks.
- Define dataset APIs, storage-format expectations, registration/versioning, and migration paths that make data access reproducible and maintainable.
- Build reliability into the read path, including stateful iteration, caching, fast restart, recovery, and clear operational contracts.
- Build terminal and web-based visualizers that let teams inspect text, multimodal, and reinforcement learning da