NVIDIA

Technology

Senior Systems Software Engineer, AI Stack and Performance - DGX Station

$224–357k · Santa Clara, California, United States · FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior Systems Software Engineer, AI Stack and Performance - DGX Station at NVIDIA. Skills: AI Stack Performance, GPU Optimization, Systems Software Engineering, Deep Learning Frameworks. Own AI application readiness. Define ‘ready to ship’ criteria.”

Industry & Context.

Technology
Problems you'll solve

Root cause analysis; Troubleshooting

What They're Looking For.

Must Have

BS or MS or equivalent experience, 12+ years systems software engineering, Hands-on AI/ML workload optimization, GPU performance analysis experience, Deep learning infrastructure experience, Proficiency with PyTorch, TensorFlow, or JAX, Experience profiling GPU workloads, Experience optimizing GPU workloads, Ability to read GPU traces, Understanding of GPU architecture, Experience with inference optimization, Proficiency in C/C++, CUDA, and Python, Comfortable reading GPU kernels, Comfortable modifying GPU kernels

Nice to Have

Optimizing LLM training on multi-GPU NVIDIA systems, Optimizing LLM inference on multi-GPU NVIDIA systems, Contributions to open-source AI frameworks, Contributions to CUDA libraries, Contributions to inference engines, Multi-GPU communication optimization experience, NCCL tuning experience, NVLink utilization experience, Collective operations experience, Parallel training strategies experience, Collaborating with compiler teams, Collaborating with hardware architecture teams, Experience shipping AI-powered products

What You'll Do.

Own AI application readiness

Define "ready to ship" criteria

Close performance gaps

Profile LLM workloads

Optimize LLM workloads

Characterize performance

Identify performance regressions

Implement optimizations

Improve kernel fusion

Improve graph execution

Improve operator scheduling

Improve memory management

Translate platform constraints

Validate multi-user scenarios

Validate concurrent workload scenarios

Ensure reliable performance

Validate NVIDIA AI software stack

Ensure version compatibility

Ensure functional correctness

Ensure performance parity

Build benchmarking infrastructure

Maintain benchmarking infrastructure

Make performance data visible

Make performance data actionable

Understand target use cases

Ensure compelling performance

Support customer deployment readiness

Support field critical issues

How You'll Work.

Team & Collaboration

Cross functional teams; Framework teams; Compiler teams; GPU architecture teams; Product management; OEM/OSV partners

Full Job Description

DGX Station (Galaxy) is NVIDIA’s workstation-class AI computer, built on GB300 Blackwell GPUs with NVLink interconnect and delivering data-center-grade AI compute in a deskside form factor. DGX Station is shipped to OEM and OSV partners as a complete SW/FW GA release including firmware bundles, DGX BaseOS, GPU drivers, CUDA toolkit, DCGM, and DOCA/OFED. For DGX Station to deliver on its promise, AI applications like NemoClaw, LLM inference via NIM, Hermes agents, and deep learning frameworks must run production-ready out of the box, optimized for the multi-GPU, high-bandwidth architecture of this platform.

We are looking for a deeply technical systems software engineer who will own AI stack readiness on DGX Station. You will profile workloads, identify bottlenecks across GPU compute, NVLink, memory, and host interconnects, drive optimizations across the full stack (from GPU kernels through frameworks to applications), and work hands-on with framework, compiler, and GPU architecture teams to ensure DGX Station delivers best-in-class performance for real AI workloads in multi-user and multi-GPU configurations.

**What you’ll be doing:**

* AI Application Readiness: Own production readiness of AI applications on DGX Station: NemoClaw, Hermes agents, NIM microservices, and key customer workloads. Define “ready to ship” criteria, run validation, and close every gap between “it runs” and “it runs well” across single-GPU and multi-GPU configurations.
* DL Framework Performance: Work cross functionally with different orgs to profile and optimize LLM and deep learning workloads (PyTorch, TensorFlow, JAX) across training and inference on the GB300 Blackwell multi-GPU architecture. Characterize performance across model sizes, batch sizes, precision modes (FP16, INT8, FP8), and GPU scaling (single-GPU vs. multi-GPU with NVLink) to establish benchmarks and identify regression.
* System-Level Optimization: Identify bottlenecks in GPU compute, NVLink bandwidth, host memory, PCIe, and CPU–GPU

How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
