Amazon Development Center U.S., Inc.

Technology

Sr Hardware Development Engineer, High Performance AI & ML Servers

$159–215k · Austin, Texas, United States · FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Sr Hardware Development Engineer, High Performance AI & ML Servers at Amazon Development Center U.S., Inc. Skills: Hardware Development, AI/ML Servers, High Performance Computing. Lead technical solutions for complex high performance server and rack system architectural challenges. Own end-to-end system reliability.”

What You'll Achieve.

Deliver next-generation infrastructure; Provide seemingly infinite capacity at the lowest possible cost

Industry & Context.

Technology

Problems you'll solve

Root cause analysis; Troubleshooting

What They're Looking For.

Must Have

Developing functional specifications; Design verification plans; Functional test procedures; Server technologies; Thermal design; Mechanical design; Power design; Signal integrity design; Bachelor's degree or above in electrical engineering, computer engineering, or equivalent; 5+ years of design/innovation, research & development, manufacturing, process, industrial engineering, or related experience; 5+ years of process development experience; English-language communication skills, both written and verbal; Expertise in server technologies (CPU, GPU, SSDs, memory, BIOS, BMC, networking)

Nice to Have

Master's degree or above in electrical engineering, computer engineering, or equivalent; Experience working with interdisciplinary teams; Execute product design from concept to production; 10+ years of server, storage, networking, or large-scale distributed systems experience; Experience working with engineering and product teams; Define a product and bring it to market; 5+ years of data center engineering or operations experience; Experience in Linux/RHEL; Experience with programming/scripting; Analytical skills; Attention to detail; Effective communication abilities; Server validation experience; Issue root causing experience; Leading hardware and software development engineering teams

What You'll Do.

Lead technical solutions for complex high performance server and/or accelerator server and rack system architectural challenges

Own end-to-end system reliability

Proactively identify and resolve deficiencies before customer impact

Design and implement solutions to address system-level issues

Decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features

Apply expertise across hardware, software, system design, x86 architecture, processes, and operations

Collaborate with hardware, software, manufacturing, supply chain and product management teams

Develop and implement diagnostic tools

Develop and implement monitoring solutions for production systems

Debug complex system failures in time-sensitive settings

Interface with internal and external customers

Understand project requirements

Facilitate system development on top of your server

Solve operational challenges in the existing fleet

Improve current customer experience

Develop improved systems for future designs

Work directly with vendors

Work with ODM/JDM design teams

Manufacture your product at scale

How You'll Work.

Team & Collaboration

Hardware design engineers; System design engineers; Technical program managers; Software engineers; Network engineers; Supply chain specialists; Security experts; Operations managers; Interdisciplinary teams; Engineering teams; Product teams

Communication Scope

Written communication; Verbal communication

Process & Methodology

Product design; Product development

Full Job Description

Do you want to shape the future of AI? Join the team building the foundation of the world’s most advanced cloud for AI training and inference, where multi-billion-parameter models come to life at scale. Here, you’ll design, deliver, and operate next-generation infrastructure that powers breakthrough innovation in AI/ML and HPC workloads. If you’re passionate about pushing the limits of performance, efficiency, and scalability in the cloud, this is your opportunity to build the systems that define what’s next for AWS and for the entire AI industry.

You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.

Key job responsibilities

- Lead technical solutions for complex high performance server and/or accelerator server and rack system architectural challenges
- Own end-to-end system reliability, proactively identifying and resolving deficiencies before customer impact
- Design and implement solutions to address system-level issues at large scale
- Decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features
- Apply expertise across hardware, software, system design, x86 architecture, processes, and operations
- Collaborate with hardware, software, manufacturing, supply chain and product management teams
- Develop and implement diagnostic tools and monitoring solutions for production systems
- Debug complex system failures in time-sensitive settings

A day in the life

Your day-to-day responsibilities will include interfacing with our internal and external customers to understand project requirements and faci
