NVIDIA

Technology

Senior Systems Software Engineer, AI Stack and Performance - DGX Station

$224–357k · Santa Clara, California, United States · FULL TIME

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior Systems Software Engineer, AI Stack and Performance - DGX Station at NVIDIA. Skills: AI Stack Performance, Systems Software Engineering, GPU Optimization, Deep Learning Frameworks. Own AI application readiness. Define "ready to ship" criteria”

What You'll Achieve.

Deliver best-in-class performance; Increase throughput on DGX Station; Make performance data actionable

Industry & Context.

Technology

Problems you'll solve

Root cause analysis; Troubleshooting

What They're Looking For.

Must Have

BS or MS or equivalent experience, 12+ years systems software engineering, Hands-on AI/ML workload optimization, GPU performance analysis experience, Deep learning infrastructure experience, Proficiency with PyTorch, TensorFlow, or JAX, Experience profiling GPU workloads, Experience optimizing GPU workloads, Ability to read GPU traces, Understanding of GPU architecture, Experience with inference optimization, Proficiency in C/C++, CUDA, and Python, Comfortable reading GPU kernels, Comfortable modifying GPU kernels

Nice to Have

Optimizing LLM training, Optimizing LLM inference, Multi-GPU NVIDIA systems experience, Contributions to open-source AI frameworks, Contributions to CUDA libraries, Contributions to inference engines, Multi-GPU communication optimization, NCCL tuning experience, NVLink utilization experience, Collective operations experience, Parallel training strategies experience, Collaborating with compiler teams, Collaborating with hardware architecture teams, Experience shipping AI-powered products

What You'll Do.

Own AI application readiness

Define "ready to ship" criteria

Close performance gaps

Profile LLM workloads

Optimize LLM workloads

Characterize performance

Identify performance regressions

Implement optimizations

Improve kernel fusion

Improve graph execution

Improve operator scheduling

Improve memory management

Translate platform constraints

Validate multi-user scenarios

Validate concurrent workload scenarios

Ensure reliable performance

Validate NVIDIA AI software stack

Ensure version compatibility

Ensure functional correctness

Ensure performance parity

Build benchmarking infrastructure

Maintain benchmarking infrastructure

Make performance data visible

Support customer deployment readiness

Support field critical issues

How You'll Work.

Team & Collaboration

Cross-functional teams; Framework teams; Compiler teams; GPU architecture teams; Product management; OEM/OSV partners

Communication Scope

Performance data visibility

Full Job Description

DGX Station (Galaxy) is NVIDIA’s workstation-class AI computer, built on GB300 Blackwell GPUs with NVLink interconnect and delivering data-center-grade AI compute in a deskside form factor. DGX Station is shipped to OEM and OSV partners as a complete SW/FW GA release including firmware bundles, DGX BaseOS, GPU drivers, CUDA toolkit, DCGM, and DOCA/OFED. For DGX Station to deliver on its promise, AI applications like NemoClaw, LLM inference via NIM, Hermes agents, and deep learning frameworks must run production-ready out of the box, optimized for the multi-GPU, high-bandwidth architecture of this platform.

We are looking for a deeply technical systems software engineer who will own AI stack readiness on DGX Station. You will profile workloads, identify bottlenecks across GPU compute, NVLink, memory, and host interconnects, drive optimizations across the full stack, from GPU kernels through frameworks to applications, and work hands-on with framework, compiler, and GPU architecture teams to ensure DGX Station delivers best-in-class performance for real AI workloads in multi-user and multi-GPU configurations.

**What you’ll be doing:**

* AI Application Readiness: Own production readiness of AI applications on DGX Station, including NemoClaw, Hermes agents, NIM microservices, and key customer workloads. Define “ready to ship” criteria, run validation, and close every gap between “it runs” and “it runs well” across single-GPU and multi-GPU configurations.
* DL Framework Performance: Work cross-functionally with different orgs to profile and optimize LLM and deep learning workloads (PyTorch, TensorFlow, JAX) across training and inference on the GB300 Blackwell multi-GPU architecture. Characterize performance across model sizes, batch sizes, precision modes (FP16, INT8, FP8), and GPU scaling (single-GPU vs. multi-GPU with NVLink) to establish benchmarks and identify regressions.
* System-Level Optimization: Identify bottlenecks in GPU compute, NVLink bandwidth, host memory, PCIe, and CPU–GPU

SKILL SIGNAL 39 detected · ranked by frequency

Systems Software Engineering ×3

GPU kernel optimization ×3

Memory placement ×3

Quantization ×3

Model compilation ×3

Batching strategies ×3

Serving frameworks ×3

Performance benchmarking ×3

Regression tracking ×3

AI Stack Performance ×2

GPU Optimization ×2

Deep Learning Frameworks ×2

CUDA

cuDNN

TensorRT

NCCL

Triton Inference Server

DCGM

DOCA

OFED

PyTorch

TensorFlow

JAX

NVLink

PCIe

Python

AI workload optimization

GPU performance analysis

Deep learning infrastructure

Inference optimization

Kernel tuning

Data pipeline efficiency

Role Details

Seniority: Senior

Experience: 5–10 yrs

Level: Senior

Work Mode: Onsite

Type: FULL TIME

Salary Band: 200k+

AI-Extracted Insights

Domain Areas

ai-compute, deep-learning, llm-inference, multi-gpu-architecture, high-bandwidth-architecture, gpu-compute, nvlink-bandwidth, host-memory

How to Apply on Workday

Workday has a multi-step form — save your progress after every section.
"Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
Job requisition numbers are useful when following up with HR by email.
