Roblox
Technology
PrincipalSoftwareEngineer,GPUCompute
Neural analysis suggests this role is
optimal for Senior candidates.
“Principal Software Engineer, GPU Compute at Roblox. Skills: GPU Compute, Distributed systems, AI workloads. Serve as GPU technical leader. Partner across teams”
Industry & Context.
Root cause analysis; Troubleshooting
What They're Looking For.
Must Have
10+ years experience, GPU expertise, Compute expertise, Proficiency in Go
Nice to Have
Familiarity with Kubernetes, Bare-metal concepts familiarity
What You'll Do.
Serve as GPU technical leader
Own GPU host lifecycle
Manage driver lifecycle
Manage firmware lifecycle
Remediate GPU failures
Architect GPU capacity exposure
Integrate with Kubernetes
Drive GPU reliability
Drive GPU performance
Define detection of unhealthy accelerators
Define diagnosis of unhealthy accelerators
Define automated repair of unhealthy accelerators
Evaluate new GPU platforms
Onboard new AI platforms
Onboard new networking topologies
Onboard multi-node patterns
Establish standards for GPU compute
Establish tooling for GPU compute
Establish APIs for GPU compute
How You'll Work.
Team & Collaboration
Partnering across Kubernetes; Partnering across Machine Bootstrap; Partnering across Networking; Partnering across Cloud
Full Job Description
Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences– all created by our global community of developers and creators. At Roblox, we’re building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We’re on a mission to connect a billion people with optimism and civility, and looking for amazing talent to help us get there. A career at Roblox means you’ll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone. As a Principal Software Engineer on the Compute team, you will be the technical anchor for Roblox's GPU and AI accelerator capabilities. This is a battle-tested GPU expert role focused on the machine management layer and above: how GPU hosts are made production-ready, kept healthy, and turned into reliable compute for the workloads that depend on them. You will own the hard problems that show up only at scale, from driver and firmware management to GPU health, reliability, and performance across a rapidly growing fleet of accelerators spanning Roblox data centers and cloud environments. You will set the technical direction for GPU compute and up-level the entire organization's GPU expertise. You will: Serve as the GPU technical leader for the Compute team, partnering across Kubernetes, Machine Bootstrap, Networking, and Cloud to drive GPU strategy end to end. Own the GPU host lifecycle above raw fleet management: driver, firmware, and CUDA stack management, GPU health and telemetry, and remediation of GPU-specific failures (XID errors, ECC, thermal, NVLink and fabric faults). Architect how GPU capacity is exposed to compute platforms, including scheduling, isolation, and integration with Kubernetes for
Applying for this Principal Software Engineer, GPU Compute role?
Most applicants get filtered before a human reads their resume. See if yours makes the cut.
ANONYMOUS · UNFILTERED
What do employees actually say about Roblox?
Real rants from real employees. Read before you apply.