NVIDIA
Artificial Intelligence, High Performance Computing and Visualization
Senior Deep Learning Frameworks CUDA Software Engineer
“Senior Deep Learning Frameworks CUDA Software Engineer at NVIDIA. Skills: Deep Learning Frameworks, CUDA, Distributed Runtime, AI. Integrate new CUDA features and Runtime abstractions in AI frameworks. Perform deep analysis of AI workloads and frameworks”
What You'll Achieve.
Bring advanced CUDA features and Distributed Runtime technologies into AI stacks; Improve productivity and performance of AI applications; Accelerate enabling AI toolkits for the community; Build speed-of-light multi-GPU multi-node solutions; Facilitate building next-gen DL frameworks; Enhance performance and programmability; Ensure exploratory prototypes can smoothly transition into open-source releases, upstream framework integrations, internal tools, or closed-source commercial products
Industry & Context.
Deep analysis of AI workloads and frameworks to identify requirements and opportunities to innovate; Design fault-tolerant and elastic solutions for large-scale or dynamic AI workloads
What They're Looking For.
Must Have
- BS, MS, or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field (or equivalent experience)
- 8+ years of relevant industry experience, or equivalent academic experience after a completed degree
- Development experience with Deep Learning Frameworks such as PyTorch and JAX, and Inference Engines such as TRT-LLM, vLLM, and SGLang
- Rapid prototyping and development with Python, C++, CUDA, or related DSLs
- Solid grasp of AI models, parallelisms, and/or compiler technologies (e.g. torch.compile)
- Experience conducting performance benchmarking on AI clusters
- Familiarity with at least one performance profiler toolchain (PyTorch profiler, NVIDIA Nsight Systems)
- Understanding of HPC/AI communication concepts
- Good understanding of computer system architecture, HW-SW interactions, and operating systems principles (systems software fundamentals)
- Adaptability and passion to learn new frameworks and tools
- Flexibility to work and communicate effectively across different teams and timezones
Nice to Have
- Deep expertise in the performance internals and execution graphs of major deep learning autograd, training, and inference frameworks (e.g., PyTorch, JAX, TensorRT, vLLM, SGLang, NeMo, Megatron, MaxText)
- Hands-on experience with CUDA, specific communication libraries (e.g., NCCL, MPI, UCX), and distributed machine learning techniques (e.g., pipeline parallelism, tensor parallelism)
- Expertise in one or more of these areas: training, distributed inference, MoE, reinforcement learning, kernel authoring (on CUDA, Triton, CuTe, etc.)
- Background in deep learning compilers, both graph-level and codegen (e.g., Triton, XLA, torch.compile)
- Experience programming for compute and communication overlap in a distributed runtime
What You'll Do.
Integrate new CUDA features and Runtime abstractions in AI frameworks
Perform deep analysis of AI workloads and frameworks
Identify requirements and opportunities to innovate in the lower layers of the stack
Own and drive improvements in the AI Compiler-Runtime interface
Design fault-tolerant and elastic solutions for large-scale or dynamic AI workloads
Influence the roadmap of core CUDA
Develop exploratory tools and runtime systems to profile and accelerate new paradigms in deep learning
and maintainable code
How You'll Work.
Team & Collaboration
Collaborate hands-on with teams working on the latest AI models; Collaborate with a very dynamic team across multiple time zones; Collaborate closely with AI researchers, HW and SW architects, kernel and compiler authors and CUDA driver experts; Communicate effectively across different teams and timezones
Communication Scope
Communicate effectively across different teams and timezones
Full Job Description
NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars.

We are looking for a motivated Deep Learning engineer to bring advanced CUDA features and Distributed Runtime technologies into AI stacks, including PyTorch, TRT-LLM, vLLM, SGLang, JAX, etc. You will be working with the team that created core CUDA features and runtimes for scaling Deep Learning and HPC applications. Your customers will have diverse multi-GPU demands, ranging from training on scales up to 100K GPUs to inference down at microsecond latency. CUDA features improve both productivity and performance of AI applications. Your work in AI toolkits will accelerate enabling those for the community. This is an outstanding opportunity for someone with an AI background to advance the state of the art in this space. Are you ready to contribute to the development of innovative technologies and help realize NVIDIA's vision?

**What you will be doing:**

* Integrate new CUDA features and Runtime abstractions in AI frameworks: from PoC to performance analysis to production
* Perform deep analysis of AI workloads and frameworks to identify requirements and opportunities to innovate in the lower layers of the stack. Collaborate hands-on with teams working on the latest AI models.
* Own and drive improvements in the AI Compiler-Runtime interface to build speed-of-light multi-GPU multi-node solutions.
* Design fault-tolerant and elastic solutions for large-scale or dynamic AI workloads.
* Influence the roadmap of core CUDA to facilitate building next-gen DL frameworks.
* Collaborate with a very dynamic team across multiple time zones.
* Colla
How to Apply on Workday
- Workday has a multi-step form — save your progress after every section.
- "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
- Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
- Job requisition numbers are useful when following up with HR by email.