NVIDIA

AI

Senior System Software Engineer - Dynamo-Triton Inference Server

$152–288k · Santa Clara, California, United States · Full Time · Remote Friendly

The Brief

“Senior System Software Engineer - Dynamo-Triton Inference Server at NVIDIA. Skills: Rust, C++, deep learning software, high-scale distributed systems, ML systems, Dynamo-Triton Inference Server. Develop world-class GPU-accelerated AI inference serving software. Contribute to feature development and drive broad customer adoption”

What You'll Achieve.

drive broad customer adoption; establish a unified, high-performance inference platform; ensure feature parity and effectively serve both Large Language Model (LLM) and non-LLM workloads; build robust software designed to be deployed in production server or cloud environments; optimize and balance prediction throughput and latency

Industry & Context.

AI
Problems you'll solve

performance analysis

What They're Looking For.

Must Have

MS or PhD in Computer Science or relevant field (or equivalent experience); 5+ years of professional experience working on deep learning software; excellent Rust & C++ skills, familiarity with Python, and strong programming & software design skills including debugging, performance analysis, and test design; experience with high-scale distributed systems and ML systems

Nice to Have

Prior experience with AI frameworks and engines, such as TensorRT, PyTorch, ONNX, OpenVINO, vLLM, or TRT-LLM; knowledge of GPU memory management, cache management, or high-performance networking; experience with distributed systems programming; experience contributing to a large open source project: use of GitHub, bug tracking, branching and merging code, OSS licensing issues, handling patches, etc.

What You'll Do.

Develop world-class GPU-accelerated AI inference serving software

Contribute to feature development and drive broad customer adoption

Drive the convergence of the Triton Inference Server and NVIDIA Dynamo stacks to establish a unified, high-performance inference platform

Ensure feature parity and effectively serve both Large Language Model (LLM) and non-LLM workloads

Be an active member of the open source deep learning software engineering community

Balance a variety of objectives such as building robust software designed to be deployed in production server or cloud environments, optimizing and balancing prediction throughput and latency, and developing and adopting the next generation of inference technologies

How You'll Work.

Team & Collaboration

ability to work in a fast-paced, agile team environment; contributing to a large open source project

Communication Scope

communication skills

Full Job Description

We are looking for a Senior System Software Engineer to work on [Dynamo-Triton Inference Server](https://developer.nvidia.com/dynamo-triton). NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. Academic and commercial groups around the world are using GPUs to power a revolution in AI, enabling breakthroughs in problems from image classification to speech recognition to natural language processing. We are a fast-paced team building a highly-performant AI inference platform to make design and deployment of new AI models easier and accessible to all users.

**What you'll be doing:**

* Develop world-class GPU-accelerated AI inference serving software.
* Contribute to feature development and drive broad customer adoption.
* Drive the convergence of the Triton Inference Server and NVIDIA Dynamo stacks to establish a unified, high-performance inference platform. This platform will ensure feature parity and effectively serve both Large Language Model (LLM) and non-LLM workloads.
* Be an active member of the [open source deep learning software engineering community](https://github.com/triton-inference-server/server).
* Balance a variety of objectives such as building robust software designed to be deployed in production server or cloud environments, optimizing and balancing prediction throughput and latency, and developing and adopting the next generation of inference technologies.

**What we need to see:**

* MS or PhD in Computer Science or relevant field (or equivalent experience).
* 5+ years of professional experience working on deep learning software.
* Excellent Rust & C++ skills, familiarity with Python, and strong programming & software design skills including debugging, performance analysis, and test design.
* Experience with high-scale distributed systems and ML systems.
* Strong communication skills and ability to work in a fast-paced, agile team environment.

**Ways to stand out from the crowd:**

* Prior experience with AI framew

How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
