Company

Deep Learning Performance Architect, CUTLASS DSL

$285–430k ~AI est. Shanghai, China FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Deep Learning Performance Architect, CUTLASS DSL. Skills: Deep Learning, Performance Architecture, CUTLASS DSL. Design and develop CUTLASS DSL.”

What You'll Achieve.

Deliver performance comparable to CUTLASS C++; Enable efficient hardware-software co-design

Industry & Context.

Problems you'll solve

Performance analysis; Performance optimization

What They're Looking For.

Must Have

MS, PhD, or equivalent experience; 2+ years of relevant work experience; Excellent programming skills in Python; Proficiency in C++; Hands-on experience with DSLs; Hands-on experience with compilers; Hands-on experience with code generation systems; Command of the MLIR/LLVM stack; IR design and pass optimization

Nice to Have

Deep understanding of CUDA; GPU microarchitecture knowledge; Performance analysis techniques; Performance optimization techniques; Familiarity with CuTe ecosystem; Familiarity with high-performance computing abstractions

What You'll Do.

Advance MLIR dialects

Build lowering passes

Advance lowering passes

Build code generation flows

Advance code generation flows

Improve kernel compilation speed

Collaborate with architecture teams

Collaborate with research teams

Collaborate with software product teams

Collaborate with open-source community

How You'll Work.

Team & Collaboration

Architecture teams; Research teams; Software product teams; Open-source community

Full Job Description

Are you passionate about programming languages, compiler technology, and GPU performance? Do you want to help shape the future of high-performance kernel development for AI? We are looking for outstanding engineers to build CUTLASS DSL, a Python-native language for GPU kernel development, along with the MLIR dialects and lowering passes behind it. In this role, you will also help accelerate kernel compilation while delivering performance comparable to CUTLASS C++, enabling efficient hardware-software co-design for NVIDIA's next generation of AI platforms.

**What you'll be doing:**

* Design, develop, and optimize CUTLASS DSL, a Python-native language for high-performance GPU kernel development
* Build and advance the MLIR dialects, lowering passes, and code generation flows that power the CUTLASS DSL stack
* Drive innovations that improve kernel compilation speed while maintaining performance on par with CUTLASS C++
* Collaborate closely with architecture, research, software product teams, and the open-source community to bring cutting-edge optimizations into real products

**What we need to see:**

* MS, PhD, or equivalent experience in Computer Science, Software Engineering, or a related field
* 2+ years of relevant work experience
* Excellent programming skills in Python and strong proficiency in C++
* Hands-on experience with DSLs, compilers, or code generation systems
* Strong command of the MLIR/LLVM stack, including IR design and pass optimization
* Strong communication skills and the ability to thrive in a highly collaborative environment

**Ways to stand out from the crowd:**

* Deep understanding of the CUDA GPU programming model, GPU microarchitecture, and performance analysis and optimization techniques
* Familiarity with key high-performance computing abstractions such as Layout, Tile, MMA, and TMA in the CuTe ecosystem


How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
