Roblox

Technology

Principal Software Engineer, GPU Compute

$345–399k San Mateo, California, United States FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Principal Software Engineer, GPU Compute at Roblox. Skills: GPU compute, distributed systems, AI workloads. Serve as the GPU technical leader and partner across teams.”

Industry & Context.

Technology

Problems you'll solve

Root cause analysis; Troubleshooting

What They're Looking For.

Must Have

10+ years of experience, GPU expertise, compute expertise, proficiency in Go

Nice to Have

Familiarity with Kubernetes, familiarity with bare-metal concepts

What You'll Do.

Serve as GPU technical leader

Own GPU host lifecycle

Manage driver lifecycle

Manage firmware lifecycle

Remediate GPU failures

Architect GPU capacity exposure

Integrate with Kubernetes

Drive GPU reliability

Drive GPU performance

Define detection of unhealthy accelerators

Define diagnosis of unhealthy accelerators

Define automated repair of unhealthy accelerators

Evaluate new GPU platforms

Onboard new AI platforms

Onboard new networking topologies

Onboard multi-node patterns

Establish standards for GPU compute

Establish tooling for GPU compute

Establish APIs for GPU compute

How You'll Work.

Team & Collaboration

Partnering across Kubernetes, Machine Bootstrap, Networking, and Cloud

Full Job Description

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators. At Roblox, we’re building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We’re on a mission to connect a billion people with optimism and civility, and looking for amazing talent to help us get there. A career at Roblox means you’ll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone.

As a Principal Software Engineer on the Compute team, you will be the technical anchor for Roblox's GPU and AI accelerator capabilities. This is a battle-tested GPU expert role focused on the machine management layer and above: how GPU hosts are made production-ready, kept healthy, and turned into reliable compute for the workloads that depend on them. You will own the hard problems that show up only at scale, from driver and firmware management to GPU health, reliability, and performance across a rapidly growing fleet of accelerators spanning Roblox data centers and cloud environments. You will set the technical direction for GPU compute and up-level the entire organization's GPU expertise.

You will:

Serve as the GPU technical leader for the Compute team, partnering across Kubernetes, Machine Bootstrap, Networking, and Cloud to drive GPU strategy end to end.

Own the GPU host lifecycle above raw fleet management: driver, firmware, and CUDA stack management, GPU health and telemetry, and remediation of GPU-specific failures (XID errors, ECC, thermal, NVLink and fabric faults).

Architect how GPU capacity is exposed to compute platforms, including scheduling, isolation, and integration with Kubernetes for
