OpenAI

Scaling

Technical Lead Manager - Training Runtime, Data(set) Movement

$295K–$445K · San Francisco, California, United States · Full time · Remote friendly

The Brief

“Technical Lead Manager - Training Runtime, Data(set) Movement at OpenAI. Skills: training runtime, data movement, dataset reads. Designs a unified dataset read platform and defines dataset APIs.”

What You'll Achieve.

Keep training runs fast, reproducible, debuggable, and resilient at scale; make enormous, heterogeneous datasets easy to consume; make data access consistent, correct, observable, and flexible.

Industry & Context.

Scaling
Problems you'll solve

Troubleshooting; Root cause analysis

What They're Looking For.

Must Have

5+ years of experience; Python, Rust, or C++ experience is useful

Nice to Have

Experience with multimodal, video, RL, or pretraining data pipelines; stateful iterators; checkpoint/restart semantics; caching; remote services; high-throughput storage reads; and large-scale dataset, data loading, storage, or distributed training infrastructure

What You'll Do.

Design a unified dataset read platform

Define storage-format expectations, registration/versioning, and migration paths

Build reliability into the read path

Build terminal and web-based visualizers

Inspect multimodal and reinforcement learning data

Write and review production code

Align training framework owners and infrastructure partners

Own broader data movement systems, checkpoint loads/saves, and snapshot transfers

How You'll Work.

Team & Collaboration

Researchers; Training framework owners; Storage teams; Infrastructure partners; Multimodal models teams; Reinforcement learning teams; Adjacent infrastructure teams

Full Job Description

ABOUT THE TEAM

Training Runtime builds the distributed systems that power OpenAI's largest model training runs, most recently GPT-5.5. The Data Movement area owns the infrastructure that keeps training jobs supplied with the right data at the right time and keeps model state moving safely and efficiently across large clusters. Our work spans machine learning systems, distributed storage, high-throughput data loading, reliability engineering, and developer experience. Success means researchers can move quickly while training runs remain fast, reproducible, debuggable, and resilient at scale.

ABOUT THE ROLE

We are looking for a deeply hands-on Technical Lead Manager to own datasets throughout our training infrastructure. This person will set the direction for how training jobs read data: the APIs, storage contracts, versioning model, benchmarks, debugging tools, and reliability guarantees that make data access consistent across current and future training frameworks.

You will begin as the primary technical owner for dataset reads, working directly in the code while aligning researchers, training framework owners, storage teams, and infrastructure partners around a durable platform. The problem is deceptively hard at frontier scale: make enormous, heterogeneous datasets easy to consume, correct across distributed workers, observable when something goes wrong, and flexible enough to support pretraining, reinforcement learning, and multimodal training.

IN THIS ROLE, YOU WILL

- Design and build a unified dataset read platform for multiple current and future training frameworks.
- Define dataset APIs, storage-format expectations, registration/versioning, and migration paths that make data access reproducible and maintainable.
- Build reliability into the read path, including stateful iteration, caching, fast restart, recovery, and clear operational contracts.
- Build terminal and web-based visualizers that let teams inspect text, multimodal, and reinforcement learning da
