NVIDIA
Technology
AI Inference Performance Engineer - New College Grad 2026
Neural analysis suggests this role is best suited to entry-level candidates.
“AI Inference Performance Engineer - New College Grad 2026 at NVIDIA. Skills: AI Inference, Performance Engineering, Deep Learning, GPU Computing. Drive industry benchmark results. Optimize end-to-end pipeline”
Industry & Context.
Bottleneck decomposition; Troubleshooting
What They're Looking For.
Must Have
BS, MS, or PhD; 2+ years of software development experience; Python or C++ programming; software design skills; software engineering skills; expertise with PyTorch or JAX; a record of delivering measurable performance improvements
Nice to Have
Prior LLM framework experience; prior DL compiler experience; prior performance modeling experience; prior profiling experience; prior debugging experience; prior code optimization experience; experience with scale-out inference orchestration; expertise in kernel development; expertise in compiler/runtime paths; architectural knowledge of CPUs, GPUs, and FPGAs; DL GPU programming experience; a track record of leading technical programs
What You'll Do.
Drive industry benchmark results
Optimize the end-to-end pipeline
Implement and integrate optimizations in quantization, scheduling, memory management, and distributed inference
Define and optimize cutting-edge workloads
Identify and shape next-generation inference benchmarks
Identify and shape emerging AI use cases
Push performance on LLM-MoE models, vision-language models, video diffusion models, recommendation workloads, and speech workloads
Design and optimize distributed inference execution
Manage performance across GPU clusters
Apply roofline analysis and systematic profiling
Decompose bottlenecks across CUDA kernels, frameworks, and serving layers
Contribute to TensorRT-LLM and other open-source projects
Raise the technical bar for the team
Drive cross-functional execution
Lead a world-class team
How You'll Work.
Team & Collaboration
Cross-functional execution; Framework teams; Kernel teams; Architecture teams; Compiler teams
Process & Methodology
Tight benchmark timelines
Full Job Description
We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining the industry’s performance standards across language models, video generation, and speech workloads. We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public accountability.

**What You Will Be Doing:**

* Drive industry benchmark results: own the end-to-end optimization pipeline, and implement and integrate optimizations in quantization, scheduling, memory management, and distributed inference across TensorRT-LLM, SGLang, and vLLM.
* Define and optimize cutting-edge workloads: identify and shape next-generation inference benchmarks for multi-turn coding, agentic workflows, and other emerging AI use cases. Collaborate with framework and kernel teams to push performance to the limit on large-scale LLM-MoE models, vision-language models, video diffusion models, and recommendation and speech workloads.
* Architect distributed inference: design and optimize execution from a single GPU up to rack-scale clusters, managing performance across clusters of GPUs.
* Establish performance methodology: apply roofline analysis and systematic profiling to decompose bottlenecks across CUDA kernels, frameworks, and serving layers.
* Influence the ecosystem: contribute to TensorRT-LLM, vLLM, SGLang, and other open-source projects. Partner with architecture, kernel, and compiler teams to shape GPU roadmaps based on real workload data.
* Technical leadership: raise the technical bar for the team, drive cross-functional execution on tight benchmark timelines, and lead a world-class team.

**What We Need To See:**

* BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.
* 2+ years of relevant software development experience.
* Strong Python or C++ programming, software design, and software engineering skills.
* Expertise with a DL fr
How to Apply on Workday
- Workday has a multi-step form — save your progress after every section.
- "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
- Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
- Job requisition numbers are useful when following up with HR by email.