Amazon Data Services, Inc.

Technology

Sr Hardware Development Engineer, High Performance AI & ML Servers

$159–215k · Austin, Texas, United States · Full time
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Sr Hardware Development Engineer, High Performance AI & ML Servers at Amazon Data Services, Inc. Skills: hardware development, AI/ML servers, high performance computing. Lead technical solutions. Own end-to-end system reliability.”

What You'll Achieve.

Deliver next-generation infrastructure; Provide infinite capacity; Lowest possible cost

Industry & Context.

Technology

Problems you'll solve

Root causing; Troubleshooting; Analytical skills

What They're Looking For.

Must Have

Developing functional specifications, Design verification plans, Functional test procedures, Server technologies expertise, Bachelor's degree or above, 5+ years Design/Innovation, 5+ years R&D, 5+ years manufacturing, 5+ years process engineering, 5+ years industrial engineering, 5+ years process development, English-language communication skills, Thermal design expertise, Mechanical design expertise, High speed bus design expertise, Signal integrity expertise, Failure analysis expertise, Server components knowledge, BIOS knowledge, BMC knowledge, Networking knowledge

Nice to Have

Master's degree or above, Interdisciplinary teams experience, 10+ years server experience, 10+ years storage experience, 10+ years networking experience, 10+ years distributed systems experience, Define product experience, Bring product to market experience, 5+ years data center engineering, 5+ years data center operations, Linux/RHEL experience, Programming/scripting experience, Analytical skills, Attention to detail, Effective communication abilities, Server validation experience, Issue root causing experience, Leading hardware development teams, Leading software development teams

What You'll Do.

Lead technical solutions

Own end-to-end system reliability

Identify deficiencies

Address system-level issues

Decompose server system problems

Apply expertise across hardware

Apply expertise across software

Apply expertise across system design

Apply expertise across operations

Collaborate with hardware teams

Collaborate with software teams

Collaborate with manufacturing teams

Collaborate with supply chain teams

Collaborate with product management teams

Develop diagnostic tools

Implement diagnostic tools

Develop monitoring solutions

Implement monitoring solutions

Debug complex system failures

Interface with customers

Understand project requirements

Facilitate system development

Solve operational challenges

Improve customer experience

Develop improved systems

Work with ODM/JDM design teams

Manufacture product at scale

How You'll Work.

Team & Collaboration

Interdisciplinary teams; Cross-functional teams; Internal customers; External customers; Hardware teams; Software teams; Network engineers; Supply chain specialists; Security experts; Operations managers; Product management teams; Design teams

Communication Scope

Written communication; Verbal communication; English language

Process & Methodology

Technical Program Managers

Full Job Description

Do you want to shape the future of AI? Join the team building the foundation of the world’s most advanced cloud for AI training and inference, where multi-billion-parameter models come to life at scale. Here, you’ll design, deliver, and operate next-generation infrastructure that powers breakthrough innovation in AI/ML and HPC workloads. If you’re passionate about pushing the limits of performance, efficiency, and scalability in the cloud, this is your opportunity to build the systems that define what’s next for AWS, and for the entire AI industry.

You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.

Key job responsibilities

- Lead technical solutions for complex high performance server and/or accelerator server and rack system architectural challenges
- Own end-to-end system reliability, proactively identifying and resolving deficiencies before customer impact
- Design and implement solutions to address system-level issues at large scale
- Decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features
- Apply expertise across hardware, software, system design, x86 architecture, processes, and operations
- Collaborate with hardware, software, manufacturing, supply chain and product management teams
- Develop and implement diagnostic tools and monitoring solutions for production systems
- Debug complex system failures in time sensitive settings

A day in the life

Your day to day responsibilities will include interfacing with our internal and external customers to understand project requirements and faci
