NVIDIA

Enterprise AI

Senior Technical Product Manager – DGX Enterprise Infrastructure and Cloud-Native Operations

$208–380k Santa Clara, California, United States FULL TIME Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior Technical Product Manager – DGX Enterprise Infrastructure and Cloud-Native Operations at NVIDIA. Skills: Product Management, Enterprise AI, DGX Enterprise Infrastructure, Cloud-Native Operations, Kubernetes, On-Premise Infrastructure Management. Set the vision for the Enterprise Operational Gold Standard and define how the world’s most sophisticated companies deploy, manage, and scale their Enterprise AI Factories.”

What You'll Achieve.

Deliver the "NVIDIA Experience" within the customer’s data center; transform raw DGX hardware into a high-availability, self-healing AI Factory; eliminate "management snowflakes" so that every enterprise DGX deployment is standardized, repeatable, and resilient; keep the fleet at peak performance without manual intervention.

Industry & Context.

Enterprise AI

Problems you'll solve

How do you make a 1,000-node private cluster feel as fluid, scalable, and invisible as the public cloud? When a job slows down in a private data center, your framework should provide the "one-click" answer, instantly isolating a thermal throttle, a degraded InfiniBand rail, or a cabling fault.

What They're Looking For.

Must Have

12+ years of demonstrated ability in Product Management, specifically around on-premise infrastructure, private cloud, or large-scale systems management

Bachelor's degree in Computer Science or related field, or equivalent experience

The "Platform-First" Approach: A track record of turning complex hardware operations into software-defined workflows

Cloud-Native Expertise: Expert-level understanding of Kubernetes operators, container orchestration, and how to translate physical hardware constraints into declarative code

Operational Scars: You’ve lived through the challenges of managing large-scale Linux fleets in air-gapped or restricted enterprise environments

Technical Breadth: Deep familiarity with data center networking (InfiniBand/Ethernet), storage architectures, and the firmware-to-OS handshake

Leadership & Evolution: This is a high-visibility role at the intersection of multiple engineering fields, with an explicit expectation to transition into formal people management as the team expands

Nice to Have

Automation Evangelist: You have experience with infrastructure-as-code (Ansible, Terraform, Pulumi) in a bare-metal context

AIOps Pioneer: You have a vision for using AI to manage AI, applying telemetry and machine learning to predict and prevent infrastructure failures

What You'll Do.

Set the vision for the Enterprise Operational Gold Standard

Define how the world’s most sophisticated companies deploy, manage, and scale their Enterprise AI Factories

Productize the On-Prem Lifecycle

Build the "Pit Crew" (Observability)

Bridge Hardware to Kubernetes

Drive Predictive Operations

Full Job Description

NVIDIA is seeking a world-class Senior Product Manager to architect the operational future of Enterprise AI. While the NVIDIA DGX is the undisputed "Gold Standard" for AI performance, the enterprise on-premise environment presents an outstanding challenge: How do you make a 1,000-node private cluster feel as fluid, scalable, and invisible as the public cloud? The mission is to deliver the "NVIDIA Experience" within the customer’s data center. In this role, you will own the software-defined blueprint that transforms raw DGX hardware into a high-availability, self-healing AI Factory.

**What You’ll Be Doing:**

In this role, you will set the vision for the Enterprise Operational Gold Standard. You will define how the world’s most sophisticated companies deploy, manage, and scale their Enterprise AI Factories.

* Productize the On-Prem Lifecycle: Define the "Day 0 through Day 2" experience for DGX SuperPODs. Lead the development of products that handle everything from bare-metal provisioning and network fabric configuration to automated "one-click" firmware rollouts.
* Build the "Pit Crew" (Observability): Develop a definitive telemetry and diagnostic suite. When a job slows down in a private data center, your framework should provide the "one-click" answer, instantly isolating a thermal throttle, a degraded InfiniBand rail, or a cabling fault.
* Bridge Hardware to Kubernetes: Lead the integration of DGX systems into the cloud-native ecosystem. Ensure that enterprise-grade features like GPU partitioning (MIG), multi-node scaling, and niche scheduling are declarative and seamless.
* Standardize at Scale: You aren't just building scripts; you are building APIs and services. Your goal is to eliminate "management snowflakes," ensuring that every enterprise DGX deployment is standardized, repeatable, and resilient.
* Drive Predictive Operations: Move the needle from reactive maintenance to self-healing infrastructure. Thoughtfully define the features for automated health checks that keep the fleet at peak performance without manual intervention.

SKILL SIGNAL 36 detected · ranked by frequency

Product Management ×3

Productize the On-Prem Lifecycle ×3

Define the "Day 0 through Day 2" experience for DGX SuperPODs ×3

Lead the development of products that handle everything from bare-metal provisioning and network fabric configuration to automated "one-click" firmware rollouts ×3

Develop a definitive telemetry and diagnostic suite ×3

Bridge Hardware to Kubernetes ×3

Lead the integration of DGX systems into the cloud-native ecosystem ×3

Ensure that enterprise-grade features like GPU partitioning (MIG), multi-node scaling, and niche scheduling are declarative and seamless ×3

Standardize at Scale ×3

building APIs and Services ×3

eliminate "management snowflakes" ×3

Drive Predictive Operations ×3

Move the needle from reactive maintenance to self-healing infrastructure ×3

Thoughtfully define the features for automated health checks ×3

Enterprise AI ×2

DGX Enterprise Infrastructure ×2

Cloud-Native Operations ×2

Kubernetes ×2

On-Premise Infrastructure Management ×2

Ansible ×2

Terraform ×2

Pulumi ×2

Kubernetes operators

container orchestration

InfiniBand

Ethernet

on-premise infrastructure

private cloud

large-scale systems management

software-defined workflows

enterprise-grade features

declarative code

BEHAVIOURAL

creative, autonomous

Role Details

Seniority manager

Experience 12+ yrs

Level Senior

Type FULL TIME

Education Bachelor's degree in Computer Science or related field, or equivalent experience

Salary Band 200k+

AI-Extracted Insights

Domain Areas

enterprise-ai, enterprise-data-center, on-premise-infrastructure, private-cloud, large-scale-systems-management, cloud-native-ecosystem, linux-fleets-in-air-gapped-or-restricted-enterprise-environments, data-center-networking

How to Apply on Workday

Workday has a multi-step form — save your progress after every section.
"Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
Job requisition numbers are useful when following up with HR by email.
