NVIDIA

Senior Full-Stack Lead Engineer

$224–357k Santa Clara, California, United States FULL TIME Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior Full-Stack Lead Engineer at NVIDIA. Skills: Full-stack depth, cloud expertise, containers and orchestration, CI/CD and safe deployment practices, API design, machine learning platforms. Lead the architecture and delivery of high-scale web products across frontend, backend services, and data layers, with clear availability and latency targets (SLOs/SLAs). Own multi-team initiatives end to end: problem discovery, RFCs/design reviews, phased rollouts, and success metrics tied to product and business outcomes.”

What You'll Achieve.

ensure NVIDIA’s AI infrastructure is used efficiently, transparently, and at scale; build a unified, self-service “single pane of glass” portal that enables AI researchers to efficiently manage, monitor, and optimize their use of Managed AI research Superclusters; meet exascale standards for reliability, performance, and observability; reduce complexity, support load and long-term tech debt; accelerate the work of AI researchers; improve code quality, testing, security, and observability; drive adoption of best practices within the team

Industry & Context.

Problems you'll solve

problem-solving ability; GPU cluster debugging; performance triage; root-cause analysis

What They're Looking For.

Must Have

12+ years of software engineering experience delivering production web systems; Bachelor’s degree or higher in Computer Science or a related technical field (or equivalent experience); cross-functional collaboration skills, including active listening, translating complex use cases into clear technical requirements, and designing data models aligned with business logic and outcomes; deep cloud expertise (AWS, GCP, or Azure), infrastructure as code, containers and orchestration (Docker, Kubernetes), and mature CI/CD and safe deployment practices; full-stack depth: modern SPA frameworks (React/Next.js or Vue/Nuxt), JavaScript/TypeScript, and one or more backend languages (Node.js, Python, and/or Golang); proficiency in API design (REST), schema evolution, integration patterns, and automated testing; experience building machine learning platforms or self-service internal infrastructure tools focused on efficiency, resiliency, and observability; clear written and verbal communication skills, problem-solving ability, and a growth mindset; experience leveraging AI-assisted development tools (e.g., Cursor)

Nice to Have

Hands-on ML platform depth (MLE experience or familiarity with DL frameworks such as PyTorch and TensorFlow, and distributed training ecosystems like Ray); datacenter-scale operational experience, including GPU cluster debugging, performance triage, and root-cause analysis across complex distributed systems

What You'll Do.

Lead the architecture and delivery of high-scale web products across frontend, backend services, and data layers, with clear availability and latency targets (SLOs/SLAs)

Own multi-team initiatives end to end: problem discovery, RFCs/design reviews, phased rollouts, and success metrics tied to product and business outcomes

Drive reliability, performance, and observability improvements to meet exascale standards

Establish engineering standards and reusable platforms/design systems to reduce complexity, support load and long-term tech debt

Collaborate with NVIDIA AI Research teams to identify pain points and deliver the next generation user experience that accelerates their work

Mentor and sponsor engineers; improve code quality, testing, security, and observability through reviews, pairing, and coaching

Stay ahead of AI/ML infrastructure trends and drive adoption of best practices within the team

How You'll Work.

Team & Collaboration

cross-functional collaboration skills; Collaborate with NVIDIA AI Research teams

Communication Scope

Clear written and verbal communication skills; active listening

Process & Methodology

Own multi-team initiatives end to end: problem discovery, RFCs/design reviews, phased rollouts, and success metrics tied to product and business outcomes

Full Job Description

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 30 years. Today, we're at the forefront of AI innovation powering breakthroughs in research, autonomous vehicles, robotics, and more. The DGX Cloud team builds and operates the AI infrastructure that fuels this progress.

We’re looking for a Senior Full-Stack Software Engineer to join the AI Hub team within the DGX Cloud AI Infrastructure organization. The AI Hub team accelerates AI research by ensuring NVIDIA’s AI infrastructure is used efficiently, transparently, and at scale. Our primary goal is to build a unified, self-service “single pane of glass” portal that enables AI researchers to efficiently manage, monitor, and optimize their use of Managed AI research Superclusters.

**What You’ll Be Doing:**

* Lead the architecture and delivery of high-scale web products across frontend, backend services, and data layers, with clear availability and latency targets (SLOs/SLAs).
* Own multi-team initiatives end to end: problem discovery, RFCs/design reviews, phased rollouts, and success metrics tied to product and business outcomes.
* Drive reliability, performance, and observability improvements to meet exascale standards.
* Establish engineering standards and reusable platforms/design systems to reduce complexity, support load and long-term tech debt.
* Collaborate with NVIDIA AI Research teams to identify pain points and deliver the next generation user experience that accelerates their work.
* Mentor and sponsor engineers; improve code quality, testing, security, and observability through reviews, pairing, and coaching.
* Stay ahead of AI/ML infrastructure trends and drive adoption of best practices within the team.

**What We Need To See:**

* 12+ years of software engineering experience delivering production web systems.
* Bachelor’s degree or higher in Computer Science or a related technical field (or equivalent experience).
* Strong cross-functional collaboration skill

SKILL SIGNAL 49 detected · ranked by frequency

Full-stack depth ×5

API design ×5

machine learning platforms ×5

AI-assisted development tools ×4

modern SPA frameworks ×3

JavaScript/TypeScript ×3

backend languages ×3

schema evolution ×3

integration patterns ×3

automated testing ×3

self-service internal infrastructure tools ×3

ML platform depth ×3

DL frameworks ×3

distributed training ecosystems ×3

Datacenter-scale operational experience ×3

GPU cluster debugging ×3

performance triage ×3

root-cause analysis ×3

complex distributed systems ×3

cloud expertise ×2

containers and orchestration ×2

CI/CD and safe deployment practices ×2

Docker ×2

Kubernetes ×2

OpenSearch ×2

Prometheus ×2

Grafana ×2

Loki ×2

AWS

GCP

Azure

React

BEHAVIOURAL

active listening; growth mindset; collaboration

Role Details

Seniority senior

Experience 12+ yrs

Level Senior

Work Mode Remote Friendly

Type FULL TIME

Education Bachelor’s degree or higher in Computer Science or a related technical field (or equivalent experience)

Salary Band 200k+

AI-Extracted Insights

Domain Areas

ai-innovation; ai-infrastructure; ai-research; managed-ai-research-superclusters; machine-learning-platforms; ml-platform-depth; dl-frameworks; distributed-training-ecosystems

How to Apply on Workday

Workday has a multi-step form — save your progress after every section.
"Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
Job requisition numbers are useful when following up with HR by email.
