NVIDIA
Artificial Intelligence
Senior Network Reliability Engineer - DGX Cloud
Neural analysis suggests this role is
optimal for Senior candidates.
“Senior Network Reliability Engineer - DGX Cloud at NVIDIA. Skills: Network Reliability Engineering, Cloud and Datacenter Network Infrastructures, TCP/IP, BGP, OSPF, MPLS, IS-IS, VxLAN, EVPN, QoS, GRE, IPsec, DNS, MACsec, AWS, Azure, GCP, OCI, Arista, Fortinet, Juniper, Python, Shell scripting. Support and maintain cloud and datacenter network infrastructures. Remediate critical alerts within defined SLAs.”
What You'll Achieve.
Remediate critical alerts within defined SLAs; drive operational improvements
Industry & Context.
outstanding troubleshooting skills; creative problem-solving abilities; network troubleshooting techniques
24/7 global shift rotations, remote support
What They're Looking For.
Must Have
5+ years of experience in network operations; Bachelor’s degree in Computer Science or a related technical field, or equivalent experience; excellent verbal and written communication skills
Nice to Have
Solid understanding of Mellanox/Cumulus OS and InfiniBand technology; skilled in Unix/Linux system administration, with the ability to write and understand Python/Shell scripts to improve efficiency in hyperscale environments; familiarity with tools such as Netbox/Nautobot, Prometheus, Grafana, and Panoptes to monitor and manage a global network
What You'll Do.
Support and maintain cloud and datacenter network infrastructures
Remediate critical alerts within defined SLAs
Triage production-impacting network incidents
Interact with internal customers on network-related issues
Engage with external vendors to remediate hardware and software issues
Participate in project-related work such as network device upgrades and capacity augmentations
Engage in 24/7 global shift rotations to provide remote support for network repairs and changes
Drive operational improvements in change management and daily operations by following procedures
Manage and operate large scale IP network technologies and infrastructures
Monitor and support the network health of on-premises and cloud infrastructures
Collaborate and develop workflow enhancements
How You'll Work.
Team & Collaboration
collaborating across teams; updating customers on status and ticket information
Communication Scope
Excellent verbal and written communication skills
Full Job Description
NVIDIA is looking for a Senior Network Reliability Engineer to support and maintain our cloud and datacenter network infrastructures. This network serves the needs across the whole software stack for NVIDIA, from Graphics Drivers to Autonomous Vehicles and Artificial Intelligence. In this role, the Senior Network Operations Engineer will remediate critical alerts within defined SLAs, triage production impacting network incidents, and interact with internal customers on network related issues. They will also be responsible for engaging with external vendors to remediate hardware and software issues, and participate in project related work such as network device upgrades and capacity augmentations. An ideal candidate will possess a wide range of skills, including alert monitoring & resolution in large-scale networks and CSP environments, outstanding troubleshooting skills, understanding of L3 underlay networks, and network protocol knowledge in large multi-vendor infrastructures.
**What you will be doing:**
* Engage in 24/7 global shift rotations to provide remote support for network repairs and changes while collaborating across teams and updating customers on status and ticket information.
* Drive operational improvements in change management and daily operations by following procedures.
* Manage and operate large scale IP network technologies and infrastructures.
* Utilize your skills in Peering and Datacenter interconnect technologies: PNI, Transit, Exchange, Passive DWDM, Wave circuits.
* Monitor and support the network health of on-premises and cloud infrastructures.
* Collaborate and develop workflow enhancements while documenting best practices.
**What we need to see:**
* Deep knowledge and experience of TCP/IP, BGP, OSPF, MPLS, IS-IS, VxLAN, EVPN, QoS, GRE, IPsec, DNS, and MACsec.
* 5+ years of experience in network operations.
* Skilled in network troubleshooting techniques and demonstrating creative problem-solving abilities.
* Strong track record of alert
How to Apply on Workday
- Workday has a multi-step form — save your progress after every section.
- "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
- Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
- Job requisition numbers are useful when following up with HR by email.