NVIDIA

AI

Senior Vision Language Model Engineer

$184–357k Santa Clara, California, United States FULL TIME Remote Friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior Vision Language Model Engineer at NVIDIA. Skills: Vision Language Models (VLMs), Agentic AI workflows, Deep Learning, Multimodal Datasets. Design and build agentic data and training workflows for Autonomous Vehicles, Robotics, and Medical applications. Partner with our researchers to develop and evaluate prototypes of our latest models, such as VLMs and VLAs, for video search, video understanding, and more.”

What You'll Achieve.

Redefine the dataset search and model training capabilities in NVIDIA product offerings; impact the most iconic companies in Physical AI; maximize development velocity.

Industry & Context.

AI

What They're Looking For.

Must Have

  • PhD with 4+ years, MS with 6+ years, or BS (or equivalent experience) with 8+ years of relevant experience in Computer Science, Computer Engineering, or a related technical field
  • Strong background in modern deep learning, including transformer‑based architectures, video modeling, and multimodal VLM/VLA or foundation models
  • Excellent experience training and deploying deep learning models on real‑world datasets: data preprocessing, distributed training, evaluation, debugging, and iterative improvement
  • Excellent experience with Python and at least one deep learning framework
  • Current with the latest research on image and video search in autonomous vehicles, healthcare, robotics, or related physical AI applications
  • Fluent with agentic AI workflows across the full applied research lifecycle, including prototyping novel algorithms and search pipelines, benchmarking, and integrating prototypes in production codebases

Nice to Have

  • Track record of publishing at top-tier conferences such as CVPR, NeurIPS, ICML, and ECCV
  • Patents in video retrieval or a related field
  • Coding architecture skills demonstrated through contributions to large internal or open-source projects
  • Experience in robotic systems such as autonomous vehicles or humanoid robotics

What You'll Do.

  • Design and build agentic data and training workflows for Autonomous Vehicles, Robotics, and Medical applications.
  • Partner with our researchers to develop and evaluate prototypes of our latest models, such as VLMs and VLAs, for video search, video understanding, and more.
  • Enable fundamental advances in autonomous driving, healthcare, and robotics.
  • Design and implement agentic data workflows that automate data discovery, labeling, evaluation, and retraining to maximize development velocity.
  • Build, curate, and maintain high‑quality multimodal datasets (e.g., video, sensor, language/action traces) tailored for end‑to‑end physical AI problems, such as autonomous driving.
  • Explore and productize new data sources including simulation and synthetic data.
  • Use agentic AI workflows across the full applied research lifecycle.
  • Contribute to NVIDIA Cosmos Dataset Search and other core NVIDIA platforms and products.

How You'll Work.

Team & Collaboration

Collaborate with research, model development, performance, and product teams; experience working well in a dynamic, product- and research-focused team

Communication Scope

Clear and effective communication skills

Full Job Description

NVIDIA is the platform upon which every new AI-powered application is built. We are seeking a senior vision language model engineer to design and build agentic data and training workflows for Autonomous Vehicles, Robotics, and Medical applications. The right person for this role brings technical innovation and collaborative culture to change the way NVIDIA builds dataset search platforms for physical AI developers. Our dataset search offerings are easy to use, performant, and scalable. Your work will redefine the dataset search and model training capabilities in NVIDIA product offerings and impact the most iconic companies in Physical AI.

**What you'll be doing:**

* Partner with our researchers to develop and evaluate prototypes of our latest models, such as VLMs and VLAs, for video search, video understanding, and more. Enable fundamental advances in autonomous driving, healthcare, and robotics.
* Design and implement agentic data workflows that automate data discovery, labeling, evaluation, and retraining to maximize development velocity.
* Build, curate, and maintain high‑quality multimodal datasets (e.g., video, sensor, language/action traces) tailored for end‑to‑end physical AI problems, such as autonomous driving.
* Explore and productize new data sources including simulation and synthetic data.
* Use agentic AI workflows across the full applied research lifecycle.
* Collaborate with research, model development, performance, and product teams.
* Contribute to NVIDIA Cosmos Dataset Search and other core NVIDIA platforms and products.

**What we need to see:**

* PhD with 4+ years, MS with 6+ years, or BS (or equivalent experience) with 8+ years of relevant experience in Computer Science, Computer Engineering, or a related technical field
* Strong background in modern deep learning, including transformer‑based architectures, video modeling, and multimodal VLM/VLA or foundation models.
* Excellent experience training and deploying deep learning models on real‑world

How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
