NVIDIA

AI

Senior Deep Learning Algorithms Engineer - BioNeMo

Ho Chi Minh City, Vietnam FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is
optimal for Senior candidates.

The Brief

“Senior Deep Learning Algorithms Engineer - BioNeMo at NVIDIA. Skills: Deep Learning Algorithms, Model Optimization, Inference Optimization, TensorRT-LLM, GPU Performance Engineering, Custom GPU Kernel Development. Optimize cutting-edge biology and structural biology models, including LLMs and VLMs, for maximum performance and efficiency on NVIDIA GPUs. Focus on world-class inference for workloads like protein structure prediction and design.”

What You'll Achieve.

maximum performance and efficiency on NVIDIA GPUs; world-class inference; industry-leading, scalable performance; low-latency, high-throughput inference; smooth production transition; strict SLOs; high QPS; multi-GPU/node inference; cost/perf ownership; clear metrics

Industry & Context.

AI
Problems you'll solve

resolving kernel/graph bottlenecks; debugging and extending for novel architectures

What They're Looking For.

Must Have

  • MS/PhD in CS, EE, Comp. Eng., or equivalent practical experience.
  • 5+ years professional experience in deep learning/applied ML, with a track record of deploying optimized models/inference paths in production (not research prototypes).
  • Strong foundation in transformer/diffusion architectures; direct experience with LLMs, VLMs, or large biology models (e.g., structure prediction).
  • Proficient in PyTorch (and/or TensorFlow) for production-grade model building, debugging, and deployment.
  • Strong Python/C++; ability to read/modify performance-critical C++/CUDA code for inference stacks and custom ops.
  • Practical experience with TensorRT/TensorRT-LLM: model conversion, optimization, deployment, and performance measurement (latency/throughput) under realistic conditions.
  • Familiarity with GPU performance engineering: profiling (Nsight), roofline analysis, and optimization of kernels/memory.
  • Experience writing/extending custom GPU kernels for model hot paths is required.

Nice to Have

  • Led or significantly contributed to large-scale LLM/VLM/biology model serving (strict SLOs, high QPS, multi-GPU/node inference, cost/perf ownership).
  • Deep customization of, or substantial contributions to, TensorRT-LLM, vLLM, SGLang, or comparable stacks, including debugging and extending for novel architectures.
  • End-to-end ownership of FP8/INT8 (or other formats) quantization, including calibration, regression testing, and documenting accuracy vs. speed tradeoffs on biology workloads.
  • Familiarity with protein structure, docking, or diffusion-based design and model families (e.g., OpenFold, Boltz, ESM, RFDiffusion, DiffDock), demonstrated by benchmarks, publications, or open-source work.
  • Repeated success taking non-text architectures (geometric, multimodal, structure-centric) from research/checkpoint to optimized, production-ready inference with clear metrics.
  • Examples of writing, maintaining, or upstreaming custom kernels or fused ops that produced measurable gains on real models or hardware.

What You'll Do.

  • Optimize cutting-edge biology and structural biology models, including LLMs and VLMs, for maximum performance and efficiency on NVIDIA GPUs.
  • Focus on world-class inference for workloads like protein structure prediction and design.
  • Move next-gen AI models (e.g., Boltz1/2, OpenFold2/3) from research to production serving via TensorRT-LLM and related stacks.
  • Ensure industry-leading, scalable performance for scientists and developers.
  • Integrate TensorRT-LLM for BioNeMo models (Boltz1–2, OpenFold2–3) and upcoming structural biology models (RFDiffusion, DiffDock, ProteinNMN, Evo2, ESM3).
  • Optimize models for low-latency, high-throughput inference using parallelism, quantization (FP8/INT8), and sparsity/pruning.
  • Profile and debug deep learning workloads on GPUs, resolving kernel/graph bottlenecks in training/inference, including custom operators.
  • Develop and validate custom GPU kernels (CUDA, Triton) for hot paths, memory-bound ops, and non-standard blocks in structural biology models.
  • Collaborate with research to align model architecture and training with deployment constraints for smooth production transition.

How You'll Work.

Team & Collaboration

Collaborate across teams to move next-gen AI models from research to production serving; collaborate with research to align model architecture and training with deployment constraints for smooth production transition.

Full Job Description

Join NVIDIA as a Senior Deep Learning Algorithms Engineer to optimize cutting-edge biology and structural biology models, including LLMs and VLMs, for maximum performance and efficiency on NVIDIA GPUs. Focus on world-class inference for workloads like protein structure prediction and design. As part of BioNeMo, you will collaborate across teams to move next-gen AI models (e.g., Boltz1/2, OpenFold2/3) from research to production serving via TensorRT-LLM and related stacks, ensuring industry-leading, scalable performance for scientists and developers.

What you will be doing:

  • Integrate TensorRT-LLM for BioNeMo models (Boltz1–2, OpenFold2–3) and upcoming structural biology models (RFDiffusion, DiffDock, ProteinNMN, Evo2, ESM3).
  • Optimize models for low-latency, high-throughput inference using parallelism, quantization (FP8/INT8), and sparsity/pruning.
  • Profile and debug deep learning workloads on GPUs, resolving kernel/graph bottlenecks in training/inference, including custom operators.
  • Develop and validate custom GPU kernels (CUDA, Triton) for hot paths, memory-bound ops, and non-standard blocks in structural biology models.
  • Collaborate with research to align model architecture and training with deployment constraints for smooth production transition.

What we want to see:

  • MS/PhD in CS, EE, Comp. Eng., or equivalent practical experience.
  • 5+ years professional experience in deep learning/applied ML, with a track record of deploying optimized models/inference paths in production (not research prototypes).
  • Strong foundation in transformer/diffusion architectures; direct experience with LLMs, VLMs, or large biology models (e.g., structure prediction).
  • Proficient in PyTorch (and/or TensorFlow) for production-grade model building, debugging, and deployment.
  • Strong Python/C++; ability to read/modify performance-critical C++/CUDA code for inference stacks and custom ops.
  • Practical experience with TensorRT/TensorRT-LLM: model conversion, optimization,

How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
