NVIDIA

Technology

Senior LLM Agents Architect

$525–785k ~AI est. Yokneam, Israel FULL TIME

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior LLM Agents Architect at NVIDIA. Skills: LLM Agents, GPU Architecture, CUDA Programming, System Design. Design agentic AI systems. Generate GPU compute kernels.”

What You'll Achieve.

Drive significant improvements; Target speed-of-light performance; Rival hand-tuned optimization; Confident iterations; Cost control; Safe improvements

Industry & Context.

Technology

Problems you'll solve

Analysis; Optimization; Identify bottlenecks; Propose mitigations; What-if analysis; Failure recovery

What They're Looking For.

Must Have

8+ years in applied ML/AI, 2+ years crafting agentic LLM applications, B.Sc. in Computer Science / Electrical Engineering, Solid grounding in computer architecture, Familiarity with NVIDIA GPU architecture, Hands-on CUDA programming experience, Writing, profiling, and optimizing GPU kernels, Comfortable with Nsight Compute, Nsight Systems, Proven ownership of an end-to-end agentic system, Software engineering skills in Python, Proficient in tool use, Proficient in RAG pipelines, Proficient in model adaptation techniques, Demonstrated ability to collaborate with HW/SW experts, Translate heuristics into deterministic tools, Translate heuristics into constraints, Translate heuristics into evaluation metrics, Excellence in communication and facilitation, Aligning diverse collaborators, Documenting decisions/assumptions, Influencing without authority, Track record of building observability for AI systems, Dataset/version management, Offline test suites, Online telemetry, Guardrails/safety checks, Rollback plans

Nice to Have

Familiarity with the PyTorch compilation stack, Familiarity with GPU graph compilers, Familiarity with kernel fusion strategies, Familiarity with auto-tuning frameworks, Background in performance engineering, Experience with performance modeling, Experience with hardware simulators, Familiarity with distributed processing, Familiarity with multi-GPU workloads, Familiarity with networking, Familiarity with frontier agentic coding tools, Understanding of tool orchestration, Understanding of context management, Understanding of autonomous task execution, Hands-on experience building a domain-specific coding agent, Experience with agent frameworks

What You'll Do.

Design agentic AI systems

Generate GPU compute kernels

Analyze GPU compute kernels

Optimize GPU compute kernels

Collaborate with GPU architects

Collaborate with performance engineers

Encode domain expertise into agent workflows

Build automated performance forensics agents

Ingest simulation traces

Ingest Nsight profiler data

Propose architectural mitigations

Propose software mitigations

Partner with HW architects

Develop agentic flows for GPU studies

Enable rapid what-if analysis

Explore agentic approaches to HW/SW co-design

Replace graph-compiler functionality

Augment graph-compiler functionality

Rapidly prototype solutions

Thoughtfully integrate with internal services

Utilize GPU capabilities

Deliver fitting solutions

Set up evaluation backbone

Implement safe improvements

How You'll Work.

Team & Collaboration

Hardware architects; Verification engineers; GPU performance experts; Software developers; HW/SW domain experts; Diverse collaborators

Communication Scope

Communication; Facilitation; Documentation; Playbooks

Process & Methodology

Requirements, Architecture, Implementation, Evaluation, Incremental hardening

Full Job Description

We don't just build the hardware and software that powers the AI revolution; we are building the AI that designs the next generation of both. Our team sits at the intersection of inference software and GPU architecture, creating autonomous LLM-driven systems that reason about hardware, write high-performance CUDA, and automate the complex loops of architectural simulation, analysis, and optimization. We are looking for a senior LLM Agents Architect to work hands-on with hardware architects, verification engineers, GPU performance experts, and software developers to build end-to-end agent flows that drive significant improvements in kernel optimization, architectural exploration, and developer efficiency.

**What you'll be doing:**

* Design and build agentic AI systems that generate, analyze, and optimize GPU compute kernels, targeting speed-of-light performance on NVIDIA hardware.
* Collaborate with GPU architects and performance engineers to encode domain expertise (memory hierarchy trade-offs, occupancy tuning, instruction-level reasoning) into agent workflows that rival hand-tuned optimization.
* Build automated performance forensics agents capable of ingesting large-scale simulation traces and Nsight profiler data to identify bottlenecks and propose architectural or software mitigations.
* Partner with HW architects to develop agentic flows for GPU architectural studies, enabling rapid what-if analysis across micro-architecture configurations such as cache sizing, memory controller design, and compute unit scaling.
* Explore agentic approaches to HW/SW co-design challenges, including replacing or augmenting graph-compiler functionality (e.g., TorchInductor) with LLM-driven optimization and code-generation pipelines.
* Rapidly prototype and thoughtfully productize; integrate with internal services, utilize GPU capabilities, remove bottlenecks, and deliver fitting solutions.
* Set up evaluation backbone using offline golden sets and online telemetry for confi

SKILL SIGNAL 77 detected · ranked by frequency

GPU Architecture ×5

CUDA Programming ×5

Kernel optimization ×4

LangChain ×4

LangGraph ×4

CrewAI ×4

System Design ×3

Memory hierarchies ×3

Parallelism models ×3

Pipelining ×3

Cache behavior ×3

Warp scheduling ×3

Shared/global memory ×3

Occupancy reasoning ×3

Performance profiling ×3

Simulation traces ×3

Nsight profiler ×3

Architectural studies ×3

Micro-architecture configurations ×3

Cache sizing ×3

Memory controller design ×3

Compute unit scaling ×3

Graph-compiler functionality ×3

Code-generation pipelines ×3

Offline golden sets ×3

Online telemetry ×3

Agentic AI systems ×3

Automated performance forensics ×3

HW/SW domain experts ×3

Deterministic tools ×3

Constraints ×3

Evaluation metrics ×3

BEHAVIOURAL

Mentoring

Role Details

Seniority senior

Experience 8–10 yrs

Level Senior

Work Mode Onsite

Type FULL TIME

Education Bachelor's

Salary Band 200k+

AI-Extracted Insights

Domain Areas

gpu-architecture, memory-hierarchy, occupancy-tuning, instruction-level-reasoning, hw-sw-co-design, nvidia-hardware

How to Apply on Workday

Workday has a multi-step form — save your progress after every section.
"Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
Job requisition numbers are useful when following up with HR by email.
