Amazon Data Services, Inc.
Hardware Development, Cloud Hardware Development, Cloud Computing
Cloud Hardware Development Engineer, Cloud AI/ML/storage server teams
Neural analysis suggests this role is optimal for Mid+ candidates.
“Cloud Hardware Development Engineer, Cloud AI/ML/storage server teams at Amazon Data Services, Inc. Skills: Cloud hardware development, Server platforms, AI/ML/storage servers, Fleet health, Diagnostics, Automation. Own end-to-end NPI lifecycle. Define architecture”
What You'll Achieve.
Drive toward zero-touch operations; Meet performance targets; Meet reliability targets; Meet cost targets; Improve product performance; Improve product quality; Improve product cost
Industry & Context.
Root cause analysis; Debugging; Troubleshooting; System failure analysis
What They're Looking For.
Must Have
Experience in developing functional specifications, Experience in design verification plans, Experience in functional test procedures, Bachelor's degree or above in electrical engineering, Bachelor's degree or above in computer engineering, English-language communication skills, both written and verbal, Experience in design & innovation, Experience in research & development, Knowledge of operating systems, Knowledge of hardware, Knowledge of storage, Knowledge of network, Knowledge of security, Knowledge of database administration, Knowledge of cloud infrastructure, Experience in server technologies, Experience with thermal, Experience with mechanical, Experience with power, Experience with signal integrity, Experience executing test procedures for mechanical systems, Experience executing test procedures for electrical systems, Experience working with ODMs, Experience working with manufacturer through product development, Experience working with manufacturer through manufacturing lifecycle, Experience building predictive failure detection systems, Experience building proactive remediation systems at fleet scale, Experience with storage platforms, Experience with compute platforms, Experience with GPU platforms, Experience with accelerator platforms, Experience with integration, Experience with diagnostics, Experience with performance validation, Familiarity with PCIe topology, Familiarity with NVLink, Familiarity with NVMe, Familiarity with accelerator interconnects, Experience with large-scale datacenter environments, Experience with cloud environments
Nice to Have
Master's degree or above in electrical engineering, Master's degree or above in computer engineering, Experience building predictive failure detection systems at fleet scale, Experience building proactive remediation systems at fleet scale
What You'll Do.
Own end-to-end NPI lifecycle
Design server platforms
Qualify server platforms
Manufacture server platforms
Launch server platforms
Lead technical solutions
Architect server systems
Work with ODM partners
Develop server products
Validate server products
Manufacture server products
Develop functional specifications
Develop design verification plans
Develop test procedures
Drive qualification milestones
Drive readiness milestones
Meet performance targets
Meet reliability targets
Identify technical risks
Resolve technical risks
Design predictive failure detection systems
Identify hardware issues
Drive zero-touch operations
Build detection systems
Build diagnosis systems
Build remediation systems
Debug complex system failures
Dive deep into failures
Perform root cause analysis
Correlate across layers
Apply expertise across hardware
Apply expertise across software
Apply expertise across system design
Apply expertise across x86 architecture
Apply expertise across processes
Apply expertise across operations
Design solutions for system-level issues
Implement solutions for system-level issues
Decompose server system problems
Lead feature delivery
Collaborate with hardware teams
Collaborate with software teams
Collaborate with manufacturing teams
Collaborate with supply chain teams
Collaborate with product management teams
Work with internal customers
Ensure server hardware meets requirements
Identify potential problems onboarding servers
Collaborate across Hardware Engineering
Collaborate with component teams
Collaborate with firmware teams
Collaborate with test teams
Collaborate with qualification teams
Collaborate with integration teams
Partner with datacenter operations
Close loop between field failures and design improvements
Interface with internal customers
Interface with external customers
Understand product requirements
Facilitate system development
Learn operational challenges
Improve customer experience
Develop improved systems
Review platform designs
Improve product performance
Improve product quality
How You'll Work.
Team & Collaboration
Interdisciplinary team; ODM partners; Internal customers; Hardware teams; Software teams; Manufacturing teams; Supply chain teams; Product management teams; Component teams; Firmware teams; Test teams; Qualification teams; Integration teams; Datacenter operations
Communication Scope
Written communication; Verbal communication
Process & Methodology
NPI lifecycle, Product development, Manufacturing lifecycle, Roadmap planning
Full Job Description
As a Cloud Hardware Development Engineer, you will be an end-to-end owner of storage and/or accelerator (AI/ML/GPU) server platforms, from New Product Introduction (NPI) through fleet health in production. You own the full lifecycle: design, development, qualification, launch, and ongoing operational excellence of servers running at scale in the AWS fleet.
You will work closely with internal customers to understand their technical needs and business goals, leveraging your experience with server design and the knowledge of various teams to architect solutions we deploy at scale. To deliver your products, you will work with an interdisciplinary team of component, firmware, power, mechanical, electrical, test, qualification, and manufacturing engineers, and lead our ODM (design and manufacturing) partners to bring these servers to the data center. After launch, you own the fleet: monitoring quality, driving reliability improvements, and ensuring servers continue to meet customer requirements throughout their operational life.
This role demands deep technical curiosity and the willingness to jump in and personally solve the hardest problems. When a complex system failure occurs, whether during NPI qualification or in a production fleet of hundreds of thousands of servers, you roll up your sleeves, dive into the details across hardware, firmware, software, and physical layers, and drive to root cause. You don't wait for someone else to figure it out.
You will own end-to-end system reliability, proactively identifying deficiencies and driving toward zero-touch operations where automation detects, diagnoses, and resolves issues before customer impact. You will decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features, leading delivery yourself and through others in parallel.
This is a fast-paced, intellectually challenging position. You'll work with thought leaders in multiple technology areas, hold high standards