Nebius

Cloud Infrastructure

IT Infrastructure Engineer – RMA & Hardware Diagnostics

$77–184k Independence, Missouri, United States
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for mid-level candidates.

The Brief

“IT Infrastructure Engineer – RMA & Hardware Diagnostics at Nebius. Skills: Hardware Diagnostics, RMA Lifecycle Management, Root Cause Analysis, Enterprise Server Hardware. Perform advanced firmware and hardware diagnostics. Troubleshoot complex hardware failures”

What You'll Achieve.

Reduce repeat failures; Improve overall hardware quality; Improve hardware reliability; Improve SLA performance; Improve operational scalability; Prevent repeat failures; Drive corrective action; Report metrics such as repeat RMA rates and component reliability; Improve hardware lifecycle processes; Reduce MTTR; Improve fleet-wide reliability

Industry & Context.

Cloud Infrastructure
Problems you'll solve

Advanced hardware troubleshooting; Root cause analysis; Process improvements

Eligibility Requirements

Work on-site in one of our data centers; Valid driver’s license; Applicants must be authorized to work in the country in which they apply

What They're Looking For.

Must Have

5+ years of hands-on experience working with enterprise server hardware in a production data center environment; Deep understanding of x86 server architecture, including CPUs, memory, PCIe devices, storage controllers, GPUs, and power subsystems; Experience performing firmware and BIOS/BMC diagnostics and upgrades; Advanced Linux command-line troubleshooting skills, including log analysis and hardware-level diagnostics; Experience working with remote management interfaces such as IPMI, iDRAC, iLO, or equivalent; Proven experience managing hardware RMA processes and working directly with OEM vendors; Ability to conduct structured root cause analysis and document technical findings clearly; Familiarity with hardware monitoring systems and failure trend analysis; Ownership mindset and ability to operate independently in mission-critical environments; High proficiency in spoken and written English

Nice to Have

Experience performing board-level diagnostics and component-level repair (SMD rework); Familiarity with data center networking equipment and basic network troubleshooting; Experience supporting GPU-dense or high-performance compute environments

What You'll Do.

Perform advanced firmware and hardware diagnostics

Troubleshoot complex hardware failures

Act as primary escalation point

Conduct structured root cause analysis

Own the full RMA lifecycle

Interface directly with OEM vendors

Analyze hardware failure trends

Develop and standardize diagnostic playbooks

Validate replacement components

Contribute to reducing MTTR

How You'll Work.

Team & Collaboration

Collaborate closely with L1/L2 technicians; Collaborate with infrastructure engineers; Collaborate with vendors; Collaborate cross-functionally with data center operations; Collaborate with procurement; Collaborate with engineering teams

Communication Scope

High proficiency in spoken and written English

Full Job Description

About Nebius

Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.

Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.

Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.

The role

We are seeking an IT Infrastructure Engineer – RMA & Hardware Diagnostics to own advanced hardware troubleshooting and RMA lifecycle management within our production data center environments. This role serves as the escalation point for complex server and firmware-related issues that impact system reliability and fleet availability.

You will be responsible for deep diagnostics across enterprise server platforms, performing structured root cause analysis, validating failed components, and managing end-to-end warranty replacement processes with OEM vendors. This is a hands-on technical role with direct impact on hardware reliability, SLA performance, and operational scalability.

You will work on-site in one of our data centers, collaborating closely with L1/L2 technicians, infrastructure engineers, and vendors to reduce repeat failures and improve overall hardware quality across the fleet. You’re welcome to work in our Minnesota location.

Your responsibilities will include:

Perform advanced firmware and hardware diagnostics on enterprise server platforms, including CPU, memory, PCIe devices, GPUs, storage subsystems, and power components

Troubleshoot complex hardware failures using system logs, BMC/IPMI interfaces, BIOS
