DigitalOcean

Staff AI Orchestration Engineer

Seattle, Washington, United States

The Brief

“Staff AI Orchestration Engineer at DigitalOcean. Skills: AI Orchestration, Kubernetes, Large-scale scheduling, GPU utilization optimization. Lead the design, optimization, and scaling of Kubernetes-based AI infrastructure. Tackle the unique challenges of massive-scale AI workloads.”

What You'll Achieve.

Build the simplest scalable cloud; Support next-generation distributed training and disaggregated inference; Maximize GPU utilization; Eliminate GPU waste

Industry & Context.

Problems you'll solve

Tackle the unique challenges of massive-scale AI workloads; Optimize performance for throughput, GPU utilization, and fault tolerance

What They're Looking For.

Must Have

Experience with massive-scale AI workloads; Focus on throughput, GPU utilization, and fault tolerance; Support next-generation distributed training and disaggregated inference; Design and optimize hierarchical, high-throughput scheduling architectures for massive Kubernetes clusters (1,000+ nodes, 10,000+ pods); Utilize techniques like optimistic concurrency, multi-scheduler architectures, and batch dispatching; Eliminate GPU waste in multi-tenant environments by implementing fractional GPU allocation; Leverage mechanisms like KAI-Scheduler's Reservation Pods or hard-isolation tools like HAMi; Configure time-based fairshare scheduling to balance over-quota pool access; Optimize placement of AI workloads; Experience with Kubernetes; Experience with AI/ML infrastructure; Experience with distributed systems; Experience with cloud platforms; Experience with GPU computing; Experience with performance optimization; Experience with large-scale systems

Nice to Have

Experience with KAI-Scheduler; Experience with HAMi; Experience with AI orchestration frameworks; Experience with MLOps

What You'll Do.

Lead the design, optimization, and scaling of Kubernetes-based AI infrastructure

Tackle unique challenges of massive-scale AI workloads

Support next-generation distributed training and disaggregated inference

Architect large-scale scheduling

Design and optimize hierarchical, high-throughput scheduling architectures for massive Kubernetes clusters (1,000+ nodes, 10,000+ pods)

Utilize techniques like optimistic concurrency, multi-scheduler architectures, and batch dispatching

Maximize GPU utilization

Eliminate GPU waste in multi-tenant environments

Implement fractional GPU allocation

Leverage mechanisms like KAI-Scheduler's Reservation Pods or hard-isolation tools like HAMi

Configure time-based fairshare scheduling to balance over-quota pool access

Optimize placement of AI workloads

Drive technical strategy for AI infrastructure

Mentor other engineers

Full Job Description

Dive in and do the best work of your career at DigitalOcean. Journey alongside a strong community of top talent who are relentless in their drive to build the simplest scalable cloud. If you have a growth mindset, naturally like to think big and bold, and are energized by the fast-paced environment of a true industry disruptor, you’ll find your place here. We value winning together while learning, having fun, and making a profound difference for the dreamers and builders in the world.

We are seeking a Staff AI Orchestration Engineer to lead the design, optimization, and scaling of our Kubernetes-based AI infrastructure. In this role, you will tackle the unique challenges of massive-scale AI workloads, focusing on throughput, GPU utilization, and fault tolerance to support next-generation distributed training and disaggregated inference.

What You'll Do:

Architect Large-Scale Scheduling: Design and optimize hierarchical, high-throughput scheduling architectures for massive Kubernetes clusters (1,000+ nodes, 10,000+ pods), utilizing techniques like optimistic concurrency, multi-scheduler architectures, and batch dispatching.

Maximize GPU Utilization: Eliminate GPU waste in multi-tenant environments by implementing fractional GPU allocation, leveraging mechanisms like KAI-Scheduler's Reservation Pods or hard-isolation tools like HAMi, and configuring time-based fairshare scheduling to balance over-quota pool access.

Optimize Placement …

… bonus amounts are determined based on company and individual performance. We also provide equity compensation to eligible employees, including equity grants upon hire and the option to participate in our Employee Stock Purchase Program.

DigitalOcean is an equal-opportunity employer. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.
