Amazon Data Services, Inc.

Systems, Quality, Security Engineering, Cloud Hardware Development, Cloud Computing

Cloud Hardware Development Engineer, Cloud AI/ML/storage server teams

$157–213k Cupertino, California, United States FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Cloud Hardware Development Engineer, Cloud AI/ML/storage server teams at Amazon Data Services, Inc. Skills: Cloud hardware development, Server platforms, AI/ML/storage servers, NPI lifecycle, Fleet health. Own end-to-end NPI lifecycle. Lead technical solutions.”

What You'll Achieve.

Meet performance, reliability, and cost targets; Drive toward zero-touch operations; Resolve issues before customer impact; Improve current customer experience; Develop improved systems for future designs

Industry & Context.

Systems, Quality, Security Engineering, Cloud Hardware Development, Cloud Computing

Problems you'll solve

Root cause analysis; Troubleshooting; Problem decomposition

What They're Looking For.

Must Have

Bachelor's degree or above in electrical engineering, computer engineering, or equivalent; English-language communication skills, both written and verbal; Experience with design & innovation and research & development; Knowledge of operating systems, hardware, storage, network, security, database administration, and cloud infrastructure; Experience in server technologies such as thermal, mechanical, power, SSDs, memory, BIOS, BMC, and networking; Experience developing and executing test procedures for mechanical or electrical systems/components; Experience working with ODMs/manufacturers through the product development and manufacturing lifecycle; Experience building predictive failure detection or proactive remediation systems at fleet scale; Experience with storage/compute/GPU/accelerator platforms, including integration, diagnostics, or performance validation; Familiarity with PCIe topology, NVLink, NVMe, and accelerator interconnects; Experience with large-scale datacenter or cloud environments; Experience in developing functional specifications, design verification plans, and functional test procedures

Nice to Have

Master's degree or above in electrical engineering, computer engineering, or equivalent

What You'll Do.

Own end-to-end NPI lifecycle

Lead technical solutions

Work with ODM/manufacturing partners

Develop functional specifications

Drive qualification and readiness milestones

Identify and resolve technical risks

Design and implement predictive failure detection systems

Drive toward zero-touch operations

Debug complex system failures

Perform root cause analysis

Apply expertise across hardware

Design and implement solutions

Decompose complex server system problems

Collaborate with hardware

Work closely with internal customers

Identify potential problems onboarding new servers

Collaborate across Hardware Engineering

Partner with datacenter operations

How You'll Work.

Team & Collaboration

Interdisciplinary team; Internal customers; ODM partners; Hardware Engineering teams; Component teams; Firmware teams; Test teams; Qualification teams; Integration teams; Datacenter operations

Communication Scope

Written communication; Verbal communication

Process & Methodology

New Product Introduction (NPI)

Full Job Description

As a Cloud Hardware Development Engineer, you will be an end-to-end owner of storage and/or accelerator (AI/ML/GPU) server platforms, from New Product Introduction (NPI) through fleet health in production. You own the full lifecycle: design, development, qualification, launch, and ongoing operational excellence of servers running at scale in the AWS fleet. You will work closely with internal customers to understand their technical needs and business goals, leveraging your experience with server design and the knowledge of various teams to architect solutions we deploy at scale.

To deliver your products, you will work with an interdisciplinary team of component, firmware, power, mechanical, electrical, test, qualification, and manufacturing engineers, and lead our ODMs (design and manufacturing partners) to bring these servers to the data center. After launch, you own the fleet: monitoring quality, driving reliability improvements, and ensuring servers continue to meet customer requirements throughout their operational life.

This role demands deep technical curiosity and the willingness to jump in and personally solve the hardest problems. When a complex system failure occurs, whether during NPI qualification or in a production fleet of hundreds of thousands of servers, you roll up your sleeves, dive into the details across hardware, firmware, software, and physical layers, and drive to root cause. You don't wait for someone else to figure it out.

You will own end-to-end system reliability, proactively identifying deficiencies and driving toward zero-touch operations where automation detects, diagnoses, and resolves issues before customer impact. You will decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features, leading delivery yourself and through others in parallel.

This is a fast-paced, intellectually challenging position. You'll work with thought leaders in multiple technology areas, hold high standards
