Amazon Development Center U.S., Inc.
Technology
Sr Hardware Development Engineer, High Performance AI & ML Servers
“Sr Hardware Development Engineer, High Performance AI & ML Servers at Amazon Development Center U.S., Inc. Skills: Hardware Development, AI/ML Servers, High Performance Computing. Lead technical solutions for complex high-performance server and/or accelerator server and rack system architectural challenges. Own end-to-end system reliability.”
What You'll Achieve.
Deliver next-generation infrastructure; Provide seemingly infinite capacity at the lowest possible cost
Industry & Context.
Root cause analysis; Troubleshooting
What They're Looking For.
Must Have
Developing functional specifications, design verification plans, and functional test procedures; Server technologies; Thermal design; Mechanical design; Power design; Signal integrity design; Bachelor's degree or above in electrical engineering, computer engineering, or equivalent; 5+ years of design/innovation, research & development, manufacturing, process, industrial engineering, or related experience; 5+ years of process development experience; English-language communication skills, both written and verbal; Expertise in server technologies (CPU, GPU, SSDs, memory, BIOS, BMC, networking)
Nice to Have
Master's degree or above in electrical engineering, computer engineering, or equivalent; Experience working with interdisciplinary teams; Experience executing product design from concept to production; 10+ years of server, storage, networking, or large-scale distributed systems experience; Experience working with engineering and product teams; Experience defining a product and bringing it to market; 5+ years of data center engineering or operations experience; Experience with Linux/RHEL; Experience with programming/scripting; Analytical skills; Attention to detail; Effective communication abilities; Server validation experience; Issue root-causing experience; Experience leading hardware and software development engineering teams
What You'll Do.
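Lead technical solutions for complex high-performance server and/or accelerator server and rack system architectural challenges
Own end-to-end system reliability
Proactively identify and resolve deficiencies before customer impact
Design and implement solutions to address system-level issues at large scale
Decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features
Apply expertise across hardware, software, system design, x86 architecture, processes, and operations
Collaborate with hardware, software, manufacturing, supply chain, and product management teams
Develop and implement diagnostic tools
Develop and implement monitoring solutions for production systems
Debug complex system failures in time-sensitive settings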
Interface with internal and external customers
Understand project requirements
Facilitate system development on top of your server
Solve operational challenges in the existing fleet
Improve the current customer experience
Develop improved systems for future designs
Work directly with vendors
Work with ODM/JDM design teams
Manufacture your product at scale
How You'll Work.
Team & Collaboration
Hardware design engineers; System design engineers; Technical program managers; Software engineers; Network engineers; Supply chain specialists; Security experts; Operations managers; Interdisciplinary teams; Engineering teams; Product teams
Communication Scope
Written communication; Verbal communication
Process & Methodology
Product design, Product development
Full Job Description
Do you want to shape the future of AI? Join the team building the foundation of the world’s most advanced cloud for AI training and inference, where multi-billion-parameter models come to life at scale. Here, you’ll design, deliver, and operate next-generation infrastructure that powers breakthrough innovation in AI/ML and HPC workloads. If you’re passionate about pushing the limits of performance, efficiency, and scalability in the cloud, this is your opportunity to build the systems that define what’s next for AWS and for the entire AI industry.

You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.

Key job responsibilities
- Lead technical solutions for complex high-performance server and/or accelerator server and rack system architectural challenges
- Own end-to-end system reliability, proactively identifying and resolving deficiencies before customer impact
- Design and implement solutions to address system-level issues at large scale
- Decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features
- Apply expertise across hardware, software, system design, x86 architecture, processes, and operations
- Collaborate with hardware, software, manufacturing, supply chain, and product management teams
- Develop and implement diagnostic tools and monitoring solutions for production systems
- Debug complex system failures in time-sensitive settings

A day in the life
Your day-to-day responsibilities will include interfacing with our internal and external customers to understand project requirements and faci