NVIDIA
AI
Senior Deep Learning Algorithms Engineer - BioNeMo
“Senior Deep Learning Algorithms Engineer - BioNeMo at NVIDIA. Skills: Deep Learning Algorithms, Model Optimization, Inference Optimization, TensorRT-LLM, GPU Performance Engineering, Custom GPU Kernel Development. Optimize cutting-edge biology and structural biology models, including LLMs and VLMs, for maximum performance and efficiency on NVIDIA GPUs. Focus on world-class inference for workloads like protein structure prediction and design.”
What You'll Achieve.
maximum performance and efficiency on NVIDIA GPUs; world-class inference; industry-leading, scalable performance; low-latency, high-throughput inference; smooth production transition; strict SLOs; high QPS; multi-GPU/node inference; cost/perf ownership; clear metrics
Industry & Context.
resolving kernel/graph bottlenecks; debugging and extending for novel architectures
What They're Looking For.
Must Have
- MS/PhD in CS, EE, Comp. Eng., or equivalent practical experience.
- 5+ years of professional experience in deep learning/applied ML, with a track record of deploying optimized models/inference paths in production (not research prototypes).
- Strong foundation in transformer/diffusion architectures; direct experience with LLMs, VLMs, or large biology models (e.g., structure prediction).
- Proficient in PyTorch (and/or TensorFlow) for production-grade model building, debugging, and deployment.
- Strong Python/C++; ability to read/modify performance-critical C++/CUDA code for inference stacks and custom ops.
- Practical experience with TensorRT/TensorRT-LLM: model conversion, optimization, deployment, and performance measurement (latency/throughput) under realistic conditions.
- Familiarity with GPU performance engineering: profiling (Nsight), roofline analysis, and optimization of kernels/memory; experience writing/extending custom GPU kernels for model hot paths is required.
Nice to Have
- Led or significantly contributed to large-scale LLM/VLM/biology model serving (strict SLOs, high QPS, multi-GPU/node inference, cost/perf ownership).
- Deep customization of, or substantial contributions to, TensorRT-LLM, vLLM, SGLang, or comparable stacks, including debugging and extending for novel architectures.
- End-to-end ownership of FP8/INT8 (or other formats) quantization, including calibration, regression testing, and documenting accuracy vs. speed tradeoffs on biology workloads.
- Familiarity with protein structure, docking, or diffusion-based design and model families (e.g., OpenFold, Boltz, ESM, RFDiffusion, DiffDock), demonstrated by benchmarks, publications, or open-source work.
- Repeated success taking non-text architectures (geometric, multimodal, structure-centric) from research/checkpoint to optimized, production-ready inference with clear metrics.
- Examples of writing, maintaining, or upstreaming custom kernels or fused ops that produced measurable gains on real models or hardware.
What You'll Do.
- Optimize cutting-edge biology and structural biology models, including LLMs and VLMs, for maximum performance and efficiency on NVIDIA GPUs.
- Focus on world-class inference for workloads like protein structure prediction and design.
- Move next-gen AI models (e.g., Boltz1/2, OpenFold2/3) from research to production serving via TensorRT-LLM and related stacks.
- Ensure industry-leading, scalable performance for scientists and developers.
- Integrate TensorRT-LLM for BioNeMo models (Boltz1–2, OpenFold2–3) and upcoming structural biology models (RFDiffusion, DiffDock, ProteinMPNN, Evo2, ESM3).
- Optimize models for low-latency, high-throughput inference using parallelism, quantization (FP8/INT8), and sparsity/pruning.
- Profile and debug deep learning workloads on GPUs, resolving kernel/graph bottlenecks in training/inference, including custom operators.
- Develop and validate custom GPU kernels (CUDA, Triton) for hot paths, memory-bound ops, and non-standard blocks in structural biology models.
- Collaborate with research to align model architecture and training with deployment constraints for smooth production transition.
How You'll Work.
Team & Collaboration
Collaborate across teams to move next-gen AI models from research to production serving; collaborate with research to align model architecture and training with deployment constraints for smooth production transition.
Full Job Description
Join NVIDIA as a Senior Deep Learning Algorithms Engineer to optimize cutting-edge biology and structural biology models, including LLMs and VLMs, for maximum performance and efficiency on NVIDIA GPUs. Focus on world-class inference for workloads like protein structure prediction and design. As part of BioNeMo, you will collaborate across teams to move next-gen AI models (e.g., Boltz1/2, OpenFold2/3) from research to production serving via TensorRT-LLM and related stacks, ensuring industry-leading, scalable performance for scientists and developers.

**What you will be doing:**

* Integrate TensorRT-LLM for BioNeMo models (Boltz1–2, OpenFold2–3) and upcoming structural biology models (RFDiffusion, DiffDock, ProteinMPNN, Evo2, ESM3).
* Optimize models for low-latency, high-throughput inference using parallelism, quantization (FP8/INT8), and sparsity/pruning.
* Profile and debug deep learning workloads on GPUs, resolving kernel/graph bottlenecks in training/inference, including custom operators.
* Develop and validate custom GPU kernels (CUDA, Triton) for hot paths, memory-bound ops, and non-standard blocks in structural biology models.
* Collaborate with research to align model architecture and training with deployment constraints for smooth production transition.

**What we want to see:**

* MS/PhD in CS, EE, Comp. Eng., or equivalent practical experience.
* 5+ years of professional experience in deep learning/applied ML, with a track record of deploying optimized models/inference paths in production (not research prototypes).
* Strong foundation in transformer/diffusion architectures; direct experience with LLMs, VLMs, or large biology models (e.g., structure prediction).
* Proficient in PyTorch (and/or TensorFlow) for production-grade model building, debugging, and deployment.
* Strong Python/C++; ability to read/modify performance-critical C++/CUDA code for inference stacks and custom ops.
* Practical experience with TensorRT/TensorRT-LLM: model conversion, optimization,
How to Apply on Workday
- Workday has a multi-step form — save your progress after every section.
- "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
- Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
- Job requisition numbers are useful when following up with HR by email.