Amazon Data Services, Inc.
Technology
Sr Hardware Development Engineer, High Performance AI & ML Servers
Neural analysis suggests this role is optimal for Senior candidates.
“Sr Hardware Development Engineer, High Performance AI & ML Servers at Amazon Data Services, Inc. Skills: Hardware development, AI/ML servers, High performance computing. Lead technical solutions. Own end-to-end system reliability.”
What You'll Achieve.
Deliver next-generation infrastructure; Provide infinite capacity; Lowest possible cost
Industry & Context.
Root causing; Troubleshooting; Analytical skills
What They're Looking For.
Must Have
Developing functional specifications, Design verification plans, Functional test procedures, Server technologies expertise, Bachelor's degree or above, 5+ years Design/Innovation, 5+ years R&D, 5+ years manufacturing, 5+ years process engineering, 5+ years industrial engineering, 5+ years process development, English-language communication skills, Thermal design expertise, Mechanical design expertise, High speed bus design expertise, Signal integrity expertise, Failure analysis expertise, Server components knowledge, BIOS knowledge, BMC knowledge, Networking knowledge
Nice to Have
Master's degree or above, Interdisciplinary teams experience, 10+ years server experience, 10+ years storage experience, 10+ years networking experience, 10+ years distributed systems experience, Define product experience, Bring product to market experience, 5+ years data center engineering, 5+ years data center operations, Linux/RHEL experience, Programming/scripting experience, Analytical skills, Attention to detail, Effective communication abilities, Server validation experience, Issue root causing experience, Leading hardware development teams, Leading software development teams
What You'll Do.
Lead technical solutions
Own end-to-end system reliability
Identify deficiencies
Address system-level issues
Decompose server system problems
Apply expertise across hardware
Apply expertise across software
Apply expertise across system design
Apply expertise across operations
Collaborate with hardware teams
Collaborate with software teams
Collaborate with manufacturing teams
Collaborate with supply chain teams
Collaborate with product management teams
Develop diagnostic tools
Implement diagnostic tools
Develop monitoring solutions
Implement monitoring solutions
Debug complex system failures
Interface with customers
Understand project requirements
Facilitate system development
Solve operational challenges
Improve customer experience
Develop improved systems
Work with ODM/JDM design teams
Manufacture product at scale
How You'll Work.
Team & Collaboration
Interdisciplinary teams; Cross-functional teams; Internal customers; External customers; Hardware teams; Software teams; Network engineers; Supply chain specialists; Security experts; Operations managers; Product management teams; Design teams
Communication Scope
Written communication; Verbal communication; English language
Process & Methodology
Technical Program Managers
Full Job Description
Do you want to shape the future of AI? Join the team building the foundation of the world’s most advanced cloud for AI training and inference, where multi-billion-parameter models come to life at scale. Here, you’ll design, deliver, and operate next-generation infrastructure that powers breakthrough innovation in AI/ML and HPC workloads. If you’re passionate about pushing the limits of performance, efficiency, and scalability in the cloud, this is your opportunity to build the systems that define what’s next for AWS and for the entire AI industry.

You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.

Key job responsibilities
- Lead technical solutions for complex high performance server and/or accelerator server and rack system architectural challenges
- Own end-to-end system reliability, proactively identifying and resolving deficiencies before customer impact
- Design and implement solutions to address system-level issues at large scale
- Decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features
- Apply expertise across hardware, software, system design, x86 architecture, processes, and operations
- Collaborate with hardware, software, manufacturing, supply chain and product management teams
- Develop and implement diagnostic tools and monitoring solutions for production systems
- Debug complex system failures in time sensitive settings

A day in the life
Your day to day responsibilities will include interfacing with our internal and external customers to understand project requirements and faci