NVIDIA

AI Infrastructure

Manager, Distinguished Engineer - DGX Systems Software

$320–489k · Santa Clara, California, United States · FULL TIME · Remote Friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Lead candidates.

The Brief

“Manager, Distinguished Engineer - DGX Systems Software at NVIDIA. Skills: End-to-End Stack Readiness, Platform Firmware Development, Validation Strategy, Platform Bring-Up & Architecture, Customer Deployment & Enablement, Product Delivery Lifecycle, Cross-Org Alignment, Quality & Vendor Management, Team Leadership. End-to-end delivery of every DGX compute system—from firmware through the AI stack to customer deployment. Ensure each DGX product ships as a production-ready system where firmware, O”

What You'll Achieve.

Ensure every DGX platform is ready for the full NVIDIA software stack (firmware, DGX OS, GPU drivers, CUDA toolkit, DCGM, DOCA/OFED, and management tools) as a validated, production-quality product; ensure each DGX product ships as a production-ready system; establish zero ship-stopper discipline

Industry & Context.

AI Infrastructure
Problems you'll solve

RCCA processes for field issues

What They're Looking For.

Must Have

BS or MS in Computer Science, Electrical Engineering, or a related field, or equivalent experience

12+ years overall in systems firmware/software engineering

5+ years in engineering leadership

Deep expertise in server system stack including SBIOS, BMC, OS, applications and system-level integration of complex multi-component products

Proven track record delivering multi-generation server or data center platforms from architecture through customer deployment

Experience managing engineering organizations across multiple geographies in a matrix environment

Understanding of server hardware: CPU, GPU, interconnect, memory, PCIe, power delivery

Experience owning end-to-end product quality, from firmware validation through full-stack system testing to field deployment

Nice to Have

Experience with NVIDIA DGX, or GPU-accelerated server platforms

Track record driving server bring-up for new silicon and system architecture redesigns

Familiarity with DMTF Redfish, OCP standards, and server manageability ecosystems

Experience with AI/DL workload validation and performance optimization at the platform level

Demonstrated ability to operate at VP/SVP level, influencing cross-BU strategic decisions

What You'll Do.

End-to-end delivery of every DGX compute system—from firmware through the AI stack to customer deployment

Ensure each DGX product ships as a production-ready system where firmware, OS, drivers, CUDA, networking, and AI applications work together seamlessly, while driving architecture and roadmap for next-generation platforms

Own the GA SW/FW release process delivering firmware bundles, BaseOS ISOs, and release notes to OEM/OSV partners

Ensure platforms support AI agents like NemoClaw, Hermes agents, NIM microservices, and workloads customers expect out of the box

Lead development of the manageability firmware stack (BMC, BIOS) for all DGX platforms

Ensure firmware from partner teams (GPU, CPU, networking) integrates correctly at system level

Manage 3rd-party vendors and drive platform requirements (NVPOR) across all firmware areas

Define validation strategy proving each DGX platform is production-ready

Establish quality gates and zero ship-stopper discipline

Drive platform bring-up for each new DGX system, coordinating first boot across new silicon (CPU, GPU), board design, and firmware teams

Own architectural strategy for next-generation platforms including firmware update mechanisms, system security posture, and AI application readiness

Ensure firmware release flows meet CSP and enterprise deployment requirements

Represent DGX platform readiness in executive reviews and strategic planning with VP/SVP leadership

Engage with industry standards bodies (DMTF Redfish

Own the complete DGX delivery lifecycle (system architecture, full-stack validation, and customer deployment) for every DGX product

Serve as single point of accountability for DGX platform readiness across NVIDIA, aligning GPU, CPU, networking, security, OS, and AI software teams to deliver on schedule

Own RCCA processes for field issues

Manage external vendor partnerships (AMI for SBIOS, BMC contributors) with clear quality gates and program tracking

Build and lead a world-class engineering organization

Mentor and develop leaders

Foster a culture of technical excellence and customer obsession

How You'll Work.

Team & Collaboration

Aligning GPU, CPU, networking, security, OS, and AI software teams to deliver on schedule; coordinating first boot across new silicon (CPU, GPU), board design, and firmware teams

Communication Scope

Represent DGX platform readiness in executive reviews and strategic planning with VP/SVP leadership

Process & Methodology

Product Delivery Lifecycle, program tracking

Full Job Description

NVIDIA DGX systems are the foundation of the world’s most advanced AI infrastructure: purpose-built servers, workstations, and personal AI computers that bring together GPUs, CPUs, NVLink, NVIDIA Networking, and a fully optimized AI software stack. We are seeking an engineering leader responsible for end-to-end delivery of every DGX compute system, from firmware through the AI stack to customer deployment. You will ensure each DGX product ships as a production-ready system where firmware, OS, drivers, CUDA, networking, and AI applications work together seamlessly, while driving architecture and roadmap for next-generation platforms.

What you’ll be doing:

  • End-to-End Stack Readiness: Ensure every DGX platform is ready for the full NVIDIA software stack (firmware, DGX OS, GPU drivers, CUDA toolkit, DCGM, DOCA/OFED, and management tools) as a validated, production-quality product. Own the GA SW/FW release process delivering firmware bundles, BaseOS ISOs, and release notes to OEM/OSV partners. Ensure platforms support AI agents like NemoClaw, Hermes agents, NIM microservices, and workloads customers expect out of the box.
  • Platform Firmware Development: Lead development of the manageability firmware stack (BMC, BIOS) for all DGX platforms. Ensure firmware from partner teams (GPU, CPU, networking) integrates correctly at system level. Manage 3rd-party vendors and drive platform requirements (NVPOR) across all firmware areas.
  • Validation Strategy: Define validation strategy proving each DGX platform is production-ready: end-to-end system validation including firmware regression, NVQual certification, DL workload performance, OS/CUDA stack testing, multi-user scenarios, power/thermal validation, and field upgrade reliability. Establish quality gates and zero ship-stopper discipline.
  • Platform Bring-Up & Architecture: Drive platform bring-up for each new DGX system, coordinating first boot across new silicon (CPU, GPU), board design, and firmware teams. Own architectural


How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
