NVIDIA

Machine Learning Systems Engineer, Networking

$152–288k · Santa Clara, California, United States · Full Time · Remote Friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Machine Learning Systems Engineer, Networking at NVIDIA. Skills: ML algorithms, real-time streaming pipelines, production ML algorithms, anomaly detection, predictive analytics, ML pipelines, data science depth, engineering discipline. Implement production ML algorithms in Go, optimized for real-time streaming pipelines operating at massive scale under strict resource constraints. Design and develop new ML algorithms where needed: anomaly detection, health scoring, and predictive analytics on h…”

What You'll Achieve.

Turn raw, high-volume telemetry into reliable, job-centric insights and automation for GPU fleets, detecting anomalies and surfacing insights across massive-scale infrastructure before they impact AI training and inference.

Industry & Context.

Problems you'll solve

The core challenge of this role is building ML algorithms that are simultaneously accurate and efficient: processing millions of telemetry streams in real time within tight CPU and memory budgets.

What They're Looking For.

Must Have

  • BS (or equivalent experience) and 5+ years of experience, MS and 3+ years, or PhD with 1+ years in Computer Science, Statistics, or a related field
  • Strong mathematical foundation: statistics, probability, linear algebra, and algorithm analysis
  • Proven experience implementing and optimizing ML algorithms in production; this is a coding-first role, and strong implementation skills are required
  • Programming skills in one or more of Go, C/C++, Rust, or Python
  • Familiarity with time-series databases and streaming data architectures
  • Ability to work independently and navigate ambiguity in a fast-paced engineering environment

Nice to Have

  • Data Science background with hands-on experience building and validating ML models
  • Experience implementing ML algorithms directly in systems languages for latency-sensitive or resource-constrained environments
  • Research experience: knowing the latest ML literature and translating advances into practical improvements
  • Experience with Kafka-based streaming pipelines and real-time feature engineering at scale

What You'll Do.

Implement production ML algorithms in Go — optimized for real-time streaming pipelines operating at massive scale under strict resource constraints

Design and develop new ML algorithms where needed: anomaly detection, health scoring, and predictive analytics on high-volume time-series telemetry from GPU and network infrastructure

Improve and extend existing algorithms and experiment with new approaches suited to real-time streaming constraints

Build and maintain end-to-end ML pipelines, from data ingestion and schema design through model inference, optimized for on-premises, latency-sensitive deployments

How You'll Work.

Team & Collaboration

Partner with the Data Science team on algorithm design, prototype evaluation, and translating research findings into platform requirements

Full Job Description

Join our team of innovative engineers who are building an AI Data Center AIOps platform that turns raw, high-volume telemetry into reliable, job-centric insights and automation for GPU fleets. As an ML Engineer on this team, you'll design and implement ML algorithms that run in real-time streaming pipelines, detecting anomalies and surfacing insights across massive-scale infrastructure before they impact AI training and inference.

The core challenge of this role is building ML algorithms that are simultaneously accurate and efficient, processing millions of telemetry streams in real time within tight CPU and memory budgets. You'll need both the data science depth to design and validate algorithms and the engineering discipline to implement them in production at scale.

What you'll be doing:

  • Implement production ML algorithms in Go, optimized for real-time streaming pipelines operating at massive scale under strict resource constraints
  • Design and develop new ML algorithms where needed: anomaly detection, health scoring, and predictive analytics on high-volume time-series telemetry from GPU and network infrastructure
  • Improve and extend existing algorithms and experiment with new approaches suited to real-time streaming constraints
  • Build and maintain end-to-end ML pipelines, from data ingestion and schema design through model inference, optimized for on-premises, latency-sensitive deployments
  • Partner with the Data Science team on algorithm design, prototype evaluation, and translating research findings into platform requirements

What we need to see:

  • A BS (or equivalent experience) and 5+ years of experience, MS and 3+ years, or PhD with 1+ years in Computer Science, Statistics, or a related field
  • Strong mathematical foundation: statistics, probability, linear algebra, and algorithm analysis
  • Proven experience implementing and optimizing ML algorithms in production (this is a coding-first role; strong implementation skills are required)
  • Stron…

How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
