DigitalOcean
Staff AI Orchestration Engineer
“Staff AI Orchestration Engineer at DigitalOcean. Skills: AI Orchestration, Kubernetes, Large-scale scheduling, GPU utilization optimization. Lead the design, optimization, and scaling of Kubernetes-based AI infrastructure. Tackle unique challenges of massive-scale AI workloads”
What You'll Achieve.
Simplest scalable cloud; Support next-generation distributed training and disaggregated inference; Maximize GPU utilization; Eliminate GPU waste
Industry & Context.
Tackle unique challenges; Performance optimization
What They're Looking For.
Must Have
Experience with massive-scale AI workloads
Focus on throughput, GPU utilization, and fault tolerance
Support next-generation distributed training and disaggregated inference
Design and optimize hierarchical, high-throughput scheduling architectures for massive Kubernetes clusters (1,000+ nodes, 10,000+ pods)
Utilize techniques like optimistic concurrency, multi-scheduler architectures, and batch dispatching
Eliminate GPU waste in multi-tenant environments by implementing fractional GPU allocation
Leverage mechanisms like KAI-Scheduler's Reservation Pods or hard-isolation tools like HAMi
Configure time-based fairshare scheduling to balance over-quota pool access
Optimize placement of AI workloads
Experience with Kubernetes
Experience with AI/ML infrastructure
Experience with distributed systems
Experience with cloud platforms
Experience with GPU computing
Experience with performance optimization
Experience with large-scale systems
Nice to Have
Experience with KAI-Scheduler
Experience with HAMi
Experience with AI orchestration frameworks
Experience with MLOps
What You'll Do.
Lead the design, optimization, and scaling of Kubernetes-based AI infrastructure
Tackle unique challenges of massive-scale AI workloads
Support next-generation distributed training and disaggregated inference
Architect large-scale scheduling
Design and optimize hierarchical, high-throughput scheduling architectures for massive Kubernetes clusters
Utilize techniques like optimistic concurrency, multi-scheduler architectures, and batch dispatching
Maximize GPU utilization
Eliminate GPU waste in multi-tenant environments
Implement fractional GPU allocation
Leverage mechanisms like KAI-Scheduler's Reservation Pods or hard-isolation tools like HAMi
Configure time-based fairshare scheduling
Optimize placement of AI workloads
Drive technical strategy for AI infrastructure
Mentor other engineers
Full Job Description
Dive in and do the best work of your career at DigitalOcean. Journey alongside a strong community of top talent who are relentless in their drive to build the simplest scalable cloud. If you have a growth mindset, naturally like to think big and bold, and are energized by the fast-paced environment of a true industry disruptor, you’ll find your place here. We value winning together while learning, having fun, and making a profound difference for the dreamers and builders in the world.

We are seeking a Staff AI Orchestration Engineer to lead the design, optimization, and scaling of our Kubernetes-based AI infrastructure. In this role, you will tackle the unique challenges of massive-scale AI workloads, focusing on throughput, GPU utilization, and fault tolerance to support next-generation distributed training and disaggregated inference.

What You'll Do:

Architect Large-Scale Scheduling: Design and optimize hierarchical, high-throughput scheduling architectures for massive Kubernetes clusters (1,000+ nodes, 10,000+ pods), utilizing techniques like optimistic concurrency, multi-scheduler architectures, and batch dispatching.

Maximize GPU Utilization: Eliminate GPU waste in multi-tenant environments by implementing fractional GPU allocation, leveraging mechanisms like KAI-Scheduler's Reservation Pods or hard-isolation tools like HAMi, and configuring time-based fairshare scheduling to balance over-quota pool access.

Optimize Placement

bonus amounts are determined based on company and individual performance. We also provide equity compensation to eligible employees, including equity grants upon hire and the option to participate in our Employee Stock Purchase Program.

DigitalOcean is an equal-opportunity employer. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service. A