NVIDIA

Technology

Senior Deep Learning Communication Architect

$184–357k · Santa Clara, California, United States · FULL TIME · Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior Deep Learning Communication Architect at NVIDIA. Skills: Deep Learning Communication Architect, DNNs, distributed deep learning, communication protocols, high-speed interconnects, communication libraries, PyTorch, TensorRT-LLM, vLLM, SGLang, C++, Python, CUDA, InfiniBand. Scale the DNN models and training/inference frameworks to systems with hundreds of thousands of nodes. Optimizing communication performance”

What You'll Achieve.

Enhance the performance and scalability of deep learning systems; validate and deploy new communication strategies.

Industry & Context.

Technology

Problems you'll solve

Identify and eliminate bottlenecks; develop and implement communication algorithms and protocols tailored for deep learning workloads, minimizing communication overhead and latency.

What They're Looking For.

Must Have

Ph.D., Masters, or BS in Computer Science (CS), Electrical Engineering (EE), Computer Science and Electrical Engineering (CSEE), or a closely related field or equivalent experience
6+ years of experience in Building DNNs, Scaling of DNNs, Parallelism of DNN frameworks, or deep learning training and inference workloads
Experience in evaluating, analyzing, and optimizing LLM training and inference performance of state-of-the-art models on cutting-edge hardware
Deep understanding of parallelism techniques, including Data Parallelism, Pipeline Parallelism, Tensor Parallelism, Expert Parallelism, and FSDP
Understanding of the emerging serving architectures like Disaggregated Serving and inference servers like Dynamo and Triton
Proficiency in developing code for one or more deep neural network (DNN) training and inference frameworks, such as PyTorch, TensorRT-LLM, vLLM, SGLang
Programming skills in C++ and Python
Familiarity with GPU computing, including CUDA and OpenCL
Familiarity with InfiniBand and RoCE networks

Nice to Have

Prior contributions to one or more DNN training and inference frameworks as part of your previous work experience
Deep understanding and contributions to the scaling of LLMs on large-scale systems

What You'll Do.

Scale the DNN models and training/inference frameworks to systems with hundreds of thousands of nodes.
Optimizing communication performance: Identify and eliminate bottlenecks in data transfer and synchronization during distributed deep learning training and inference.
Designing efficient communication protocols: Develop and implement communication algorithms and protocols tailored for deep learning workloads, minimizing communication overhead and latency.
Hardware and software co-craft: Collaborate with hardware and software teams to craft systems that effectively apply high-speed interconnects (e.g., NVLink, InfiniBand, SPC-X) and communication libraries (e.g., MPI, NCCL, UCX, UCC, NVSHMEM).
Exploring innovative communication technologies: Research and evaluate new communication technologies and techniques to enhance the performance and scalability of deep learning systems.
Developing and implementing solutions: Build proofs-of-concept, conduct experiments, and perform quantitative modeling to validate and deploy new communication strategies.

How You'll Work.

Team & Collaboration

Collaborate with hardware and software teams

Full Job Description

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

**What You'll Be Doing:**

* The software architecture group at NVIDIA has openings for a Deep Learning Communication Architect. We scale the DNN models and training/inference frameworks to systems with hundreds of thousands of nodes.
* Optimizing communication performance: Identify and eliminate bottlenecks in data transfer and synchronization during distributed deep learning training and inference.
* Designing efficient communication protocols: Develop and implement communication algorithms and protocols tailored for deep learning workloads, minimizing communication overhead and latency.
* Hardware and software co-craft: Collaborate with hardware and software teams to craft systems that effectively apply high-speed interconnects (e.g., NVLink, InfiniBand, SPC-X) and communication libraries (e.g., MPI, NCCL, UCX, UCC, NVSHMEM).
* Exploring innovative communication technologies: Research and evaluate new communication technologies and techniques to enhance the performance and scalability of deep learning systems.
* Developing and implementing solutions: Build proofs-of-concept, conduct experiments, and perform quantitative modeling to validate and deploy new communication strategies.

**What We Need to See:**

* A Ph.D., Masters, or BS in Computer Science (CS), Electrical Engineering (


SKILL SIGNAL 45 detected · ranked by frequency

CUDA ×7

InfiniBand ×7

OpenCL ×5

RoCE networks ×5

NVLink ×5

SPC-X ×5

MPI ×5

NCCL ×5

UCX ×5

UCC ×5

NVSHMEM ×5

PyTorch ×4

TensorRT-LLM ×4

vLLM ×4

SGLang ×4

GPU computing ×4

DNNs ×3

distributed deep learning ×3

Python ×3

Building DNNs ×3

Scaling of DNNs ×3

Parallelism of DNN frameworks ×3

deep learning training and inference workloads ×3

LLM training and inference performance ×3

Data Parallelism ×3

Pipeline Parallelism ×3

Tensor Parallelism ×3

Expert Parallelism ×3

FSDP ×3

Disaggregated Serving ×3

Dynamo ×3

Triton ×3

BEHAVIOURAL

creative, autonomous, passionate

Role Details

Seniority senior

Experience 6–10 yrs

Level Senior

Work Mode No

Type FULL TIME

Education Ph.D., Masters, or BS in Computer Science (CS), Electrical

Salary Band 150k-200k

AI-Extracted Insights

Domain Areas

deep-learning, dnns, distributed-deep-learning-training-and-inference, llm-training-and-inference, parallelism-techniques, serving-architectures, inference-servers, gpu-computing

How to Apply on Workday

Workday has a multi-step form — save your progress after every section.
"Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
Job requisition numbers are useful when following up with HR by email.
