Amazon Data Services, Inc.

Hardware Development, Cloud Hardware Development, Cloud Computing

Cloud Hardware Development Engineer, Cloud AI/ML/storage server teams

$136–184k Austin, Texas, United States FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Cloud Hardware Development Engineer, Cloud AI/ML/storage server teams at Amazon Data Services, Inc. Skills: Cloud hardware development, Server platforms, AI/ML/storage servers, Fleet health, Diagnostics, Automation. Own end-to-end NPI lifecycle. Define architecture”

What You'll Achieve.

Drive toward zero-touch operations; Meet performance targets; Meet reliability targets; Meet cost targets; Improve product performance; Improve product quality; Improve product cost

Industry & Context.

Hardware Development, Cloud Hardware Development, Cloud Computing
Problems you'll solve

Root cause analysis; Debugging; Troubleshooting; System failure analysis

What They're Looking For.

Must Have

Experience in developing functional specifications; Experience in design verification plans; Experience in functional test procedures; Bachelor's degree or above in electrical engineering; Bachelor's degree or above in computer engineering; English-language communication skills, both written and verbal; Experience in design & innovation; Experience in research & development; Knowledge of operating systems; Knowledge of hardware; Knowledge of storage; Knowledge of network; Knowledge of security; Knowledge of database administration; Knowledge of cloud infrastructure; Experience in server technologies; Experience with thermal; Experience with mechanical; Experience with power; Experience with signal ping; Experience executing test procedures for mechanical systems; Experience executing test procedures for electrical systems; Experience working with ODMs; Experience working with manufacturer through product development; Experience working with manufacturer through manufacturing lifecycle; Experience building predictive failure detection systems; Experience building proactive remediation systems at fleet scale; Experience with storage platforms; Experience with compute platforms; Experience with GPU platforms; Experience with accelerator platforms; Experience with integration; Experience with diagnostics; Experience with performance validation; Familiarity with PCIe topology; Familiarity with NVLink; Familiarity with NVMe; Familiarity with accelerator interconnects; Experience with large-scale datacenter environments; Experience with cloud environments

Nice to Have

Master's degree or above in electrical engineering, Master's degree or above in computer engineering, Experience building predictive failure detection systems at fleet scale, Experience building proactive remediation systems at fleet scale

What You'll Do.

Own end-to-end NPI lifecycle

Design server platforms

Qualify server platforms

Manufacture server platforms

Launch server platforms

Lead technical solutions

Architect server systems

Work with ODM partners

Develop server products

Validate server products

Manufacture server products

Develop functional specifications

Develop design verification plans

Develop test procedures

Drive qualification milestones

Drive readiness milestones

Meet performance targets

Meet reliability targets

Identify technical risks

Resolve technical risks

Design predictive failure detection systems

Identify hardware issues

Drive zero-touch operations

Build detection systems

Build diagnosis systems

Build remediation systems

Debug complex system failures

Dive deep into failures

Perform root cause analysis

Correlate across layers

Apply expertise across hardware

Apply expertise across software

Apply expertise across system design

Apply expertise across x86 architecture

Apply expertise across processes

Apply expertise across operations

Design solutions for system-level issues

Implement solutions for system-level issues

Decompose server system problems

Lead feature delivery

Collaborate with hardware teams

Collaborate with software teams

Collaborate with manufacturing teams

Collaborate with supply chain teams

Collaborate with product management teams

Work with internal customers

Ensure server hardware meets requirements

Identify potential problems onboarding servers

Collaborate across Hardware Engineering

Collaborate with component teams

Collaborate with firmware teams

Collaborate with test teams

Collaborate with qualification teams

Collaborate with integration teams

Partner with datacenter operations

Close loop between field failures and design improvements

Interface with internal customers

Interface with external customers

Understand product requirements

Facilitate system development

Learn operational challenges

Improve customer experience

Develop improved systems

Review platform designs

Improve product performance

Improve product quality

How You'll Work.

Team & Collaboration

Interdisciplinary team; ODM partners; Internal customers; Hardware teams; Software teams; Manufacturing teams; Supply chain teams; Product management teams; Component teams; Firmware teams; Test teams; Qualification teams; Integration teams; Datacenter operations

Communication Scope

Written communication; Verbal communication

Process & Methodology

NPI lifecycle, Product development, Manufacturing lifecycle, Roadmap planning

Full Job Description

As a Cloud Hardware Development Engineer, you will be an end-to-end owner of storage and/or accelerator (AI/ML/GPU) server platforms, from New Product Introduction (NPI) through fleet health in production. You own the full lifecycle: design, development, qualification, launch, and ongoing operational excellence of servers running at scale in the AWS fleet. You will work closely with internal customers to understand their technical needs and business goals, leveraging your experience with server design and the knowledge of various teams to architect solutions we deploy at scale.

To deliver your products, you will work with an interdisciplinary team of component, firmware, power, mechanical, electrical, test, qualification, and manufacturing engineers, and lead our ODM (design and manufacturing partners) to bring these servers to the data center. After launch, you own the fleet: monitoring quality, driving reliability improvements, and ensuring servers continue to meet customer requirements throughout their operational life.

This role demands deep technical curiosity and the willingness to jump in and personally solve the hardest problems. When a complex system failure occurs, whether during NPI qualification or in a production fleet of hundreds of thousands of servers, you roll up your sleeves, dive into the details across hardware, firmware, software, and physical layers, and drive to root cause. You don't wait for someone else to figure it out.

You will own end-to-end system reliability, proactively identifying deficiencies and driving toward zero-touch operations where automation detects, diagnoses, and resolves issues before customer impact. You will decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features, leading delivery yourself and through others in parallel.

This is a fast-paced, intellectually challenging position. You'll work with thought leaders in multiple technology areas, hold high standards
