NVIDIA

Technology

Senior Software Engineer, CUTLASS Platform

$152–288k · Santa Clara, California, United States · FULL TIME

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior Software Engineer, CUTLASS Platform at NVIDIA. Skills: GPU hardware features, Compiler backend, High performance kernels. Develop core CUTLASS platform components. Develop Tensor Core MMAs.”

Industry & Context.

Technology

Problems you'll solve

Debugging; Performance evaluation

What They're Looking For.

Must Have

Master's or PhD degree (or equivalent experience), 3+ years relevant industry experience, Proficiency in C++ programming, Software design experience, Debugging experience, Performance evaluation experience, Testing experience, High-performance code generation experience, Knowledge of compiler transformations, Knowledge of compiler optimizations, Deep understanding of computer architecture, Deep understanding of parallel computing programming models

Nice to Have

Experience writing low-level kernels, Hands-on compiler design experience, MLIR experience, Understanding of deep learning models, Understanding of deep learning algorithms, Understanding of deep learning frameworks

What You'll Do.

Develop core CUTLASS platform components

Develop Tensor Core MMAs

Develop synchronization barriers

Develop copies, schedulers, and abstractions for other GPU hardware features

Contribute to MLIR-based backend compiler stack

Design MLIR dialects and associated compiler passes

Author example kernels

Showcase novel GPU hardware features

Collaborate with GPU architecture teams

Collaborate with CUDA teams

Collaborate with NVVM/PTX compiler teams

Provide feedback on programming models

Assess performance of future GPU hardware features

How You'll Work.

Team & Collaboration

GPU architecture teams; CUDA teams; NVVM/PTX compiler teams

Full Job Description

NVIDIA's high-performance computing platforms are powering the AI revolution across many applications and industries. Within our software stack, [_CUTLASS_](https://github.com/NVIDIA/cutlass) stands out as a popular open-source ecosystem dedicated to high-performance linear algebra and Tensor Core primitives. Since 2017, it has provided the community with C++ and Python abstractions to implement custom matrix multiply (GEMM) and related math and deep learning computations on NVIDIA GPUs. If you are passionate about designing abstractions for Tensor Core and related GPU hardware features in MLIR, Python, and C++ that enable writing high-performance kernels, apply to join the CUTLASS team today!

**What you'll be doing:**

* Develop core components of the CUTLASS platform, including Tensor Core MMAs, copies, synchronization barriers, schedulers, and other GPU hardware features, in CUDA C++ and the CUTLASS Python DSL.
* Contribute to the advancement of the MLIR-based backend compiler stack for the CUTLASS Python DSL by designing dialects and associated compiler passes.
* Author example kernels utilizing CUTLASS abstractions to showcase the use of novel GPU hardware features that are crucial for achieving high performance.
* Collaborate with GPU architecture, CUDA, and NVVM/PTX compiler teams to provide feedback on programming models and to assess the performance of future GPU hardware features.

**What we need to see:**

* Master's or PhD degree in Computer Science, Computer Engineering, or a related field (or equivalent experience).
* 3+ years of relevant industry experience.
* Strong proficiency in C++ programming and software design, including debugging, performance evaluation, and testing.
* Experience working with high-performance code generation and knowledge of compiler transformations and optimizations.
* A deep understanding of computer architecture and parallel computing programming models.

**Ways to stand out from the crowd:**

* Experience writing high-performance kerne


SKILL SIGNAL: 17 detected · ranked by frequency

High performance kernels ×5

Compiler design ×3

Deep learning ×3

GPU hardware features ×2

Compiler backend ×2

Python

CUDA C++

MLIR

NVVM

PTX

Software design

Performance evaluation

Code generation

Compiler transformations

Compiler optimizations

Computer architecture

Parallel computing

Role Details

Seniority: Senior

Experience: 3–10 yrs

Level: Senior

Work Mode: Onsite

Type: FULL TIME

Education: Master's

Salary Band: 150k–200k

AI-Extracted Insights

Domain Areas

linear-algebra, tensor-core-primitives, matrix-multiply, deep-learning-computations, gpu-hardware, parallel-computing, deep-learning-models, deep-learning-algorithms

How to Apply on Workday

Workday has a multi-step form — save your progress after every section.
"Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
Job requisition numbers are useful when following up with HR by email.
