NVIDIA
AI Infrastructure
Manager, Distinguished Engineer - DGX Systems Software
“Manager, Distinguished Engineer - DGX Systems Software at NVIDIA. Skills: End-to-End Stack Readiness, Platform Firmware Development, Validation Strategy, Platform Bring-Up & Architecture, Customer Deployment & Enablement, Product Delivery Lifecycle, Cross-Org Alignment, Quality & Vendor Management, Team Leadership. End-to-end delivery of every DGX compute system—from firmware through the AI stack to customer deployment. Ensure each DGX product ships as a production-ready system where firmware, O”
What You'll Achieve.
Ensure every DGX platform is ready for the full NVIDIA software stack—firmware, DGX OS, GPU drivers, CUDA toolkit, DCGM, DOCA/OFED, and management tools—as a validated, production-quality product; Ensure each DGX product ships as a production-ready system; zero ship-stopper discipline
Industry & Context.
RCCA processes for field issues
What They're Looking For.
Must Have
BS or MS in Computer Science, Electrical Engineering, or related field, or equivalent experience
12+ years overall in systems firmware/software engineering
5+ years in engineering leadership
Deep expertise in the server system stack, including SBIOS, BMC, OS, and applications, and in system-level integration of complex multi-component products
Proven track record delivering multi-generation server or data center platforms from architecture through customer deployment
Experience managing engineering organizations across multiple geographies in a matrix environment
Understanding of server hardware: CPU, GPU, interconnect, memory, PCIe, power delivery
Experience owning end-to-end product quality, from firmware validation through full-stack system testing to field deployment
Nice to Have
Experience with NVIDIA DGX or GPU-accelerated server platforms
Track record driving server bring-up for new silicon and system architecture redesigns
Familiarity with DMTF Redfish, OCP standards, and server manageability ecosystems
Experience with AI/DL workload validation and performance optimization at the platform level
Demonstrated ability to operate at VP/SVP level, influencing cross-BU strategic decisions
What You'll Do.
End-to-end delivery of every DGX compute system—from firmware through the AI stack to customer deployment
Ensure each DGX product ships as a production-ready system where firmware, OS, drivers, CUDA, networking, and AI applications work together seamlessly, while driving architecture and roadmap for next-generation platforms
Own the GA SW/FW release process delivering firmware bundles, BaseOS ISOs, and release notes to OEM/OSV partners
Ensure platforms support AI agents like NemoClaw, Hermes agents, NIM microservices, and workloads customers expect out of the box
Lead development of the manageability firmware stack (BMC, BIOS) for all DGX platforms
Ensure firmware from partner teams (GPU, CPU, networking) integrates correctly at system level
Manage 3rd-party vendors and drive platform requirements (NVPOR) across all firmware areas
Define validation strategy proving each DGX platform is production-ready
Establish quality gates and zero ship-stopper discipline
Drive platform bring-up for each new DGX system, coordinating first boot across new silicon (CPU, GPU), board design, and firmware teams
Own architectural strategy for next-generation platforms, including firmware update mechanisms, system security posture, and AI application readiness
Ensure firmware release flows meet CSP and enterprise deployment requirements
Represent DGX platform readiness in executive reviews and strategic planning with VP/SVP leadership
Engage with industry standards bodies (DMTF Redfish
Own the complete DGX delivery lifecycle (system architecture, full-stack validation, and customer deployment) for every DGX product
Serve as single point of accountability for DGX platform readiness across NVIDIA, aligning GPU, CPU, networking, security, OS, and AI software teams to deliver on schedule
Own RCCA processes for field issues
Manage external vendor partnerships (AMI for SBIOS, BMC contributors) with clear quality gates and program tracking
Build and lead a world-class engineering organization
Mentor and develop leaders
Foster a culture of technical excellence and customer obsession
How You'll Work.
Team & Collaboration
Align GPU, CPU, networking, security, OS, and AI software teams to deliver on schedule; coordinate first boot across new silicon (CPU, GPU), board design, and firmware teams
Communication Scope
Represent DGX platform readiness in executive reviews and strategic planning with VP/SVP leadership
Process & Methodology
Product Delivery Lifecycle, program tracking
Full Job Description
NVIDIA DGX systems are the foundation of the world’s most advanced AI infrastructure: purpose-built servers, workstations, and personal AI computers that bring together GPUs, CPUs, NVLink, NVIDIA Networking, and a fully optimized AI software stack. We are seeking an engineering leader responsible for end-to-end delivery of every DGX compute system, from firmware through the AI stack to customer deployment. You will ensure each DGX product ships as a production-ready system where firmware, OS, drivers, CUDA, networking, and AI applications work together seamlessly, while driving architecture and roadmap for next-generation platforms.
What you’ll be doing:
- End-to-End Stack Readiness: Ensure every DGX platform is ready for the full NVIDIA software stack (firmware, DGX OS, GPU drivers, CUDA toolkit, DCGM, DOCA/OFED, and management tools) as a validated, production-quality product. Own the GA SW/FW release process delivering firmware bundles, BaseOS ISOs, and release notes to OEM/OSV partners. Ensure platforms support AI agents like NemoClaw, Hermes agents, NIM microservices, and workloads customers expect out of the box.
- Platform Firmware Development: Lead development of the manageability firmware stack (BMC, BIOS) for all DGX platforms. Ensure firmware from partner teams (GPU, CPU, networking) integrates correctly at system level. Manage 3rd-party vendors and drive platform requirements (NVPOR) across all firmware areas.
- Validation Strategy: Define validation strategy proving each DGX platform is production-ready: end-to-end system validation including firmware regression, NVQual certification, DL workload performance, OS/CUDA stack testing, multi-user scenarios, power/thermal validation, and field upgrade reliability. Establish quality gates and zero ship-stopper discipline.
- Platform Bring-Up & Architecture: Drive platform bring-up for each new DGX system, coordinating first boot across new silicon (CPU, GPU), board design, and firmware teams. Own architectural
How to Apply on Workday
- Workday has a multi-step form — save your progress after every section.
- "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
- Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
- Job requisition numbers are useful when following up with HR by email.