Amazon Web Services Development Center Germany GmbH
Technology
Principal Engineer - Systems for ML Inference and Training Optimization
Neural analysis suggests this role is optimal for Senior candidates.
“Principal Engineer - Systems for ML Inference and Training Optimization at Amazon Web Services Development Center Germany GmbH. Skills: ML Systems, Performance Engineering, Distributed Systems, Low-level Optimization. Define technical strategy. Drive architectural roadmap”
What You'll Achieve.
Achieve 10x performance improvements; Deliver order-of-magnitude improvements
Industry & Context.
Performance analysis; Root cause analysis; Troubleshooting
What They're Looking For.
Must Have
10+ years software development, Expert C/C++ proficiency, Expert low-level systems programming, Proven order-of-magnitude performance improvements, Extensive CUDA programming, Extensive GPU architecture knowledge, Extensive assembly-level optimization, Extensive kernel development, Lead organization-level technical initiatives, Build consensus on technical decisions, Drive architectural strategy, Define technical roadmaps, Conduct performance analysis, Resource budgeting, Translate system analysis into plans
Nice to Have
Master's degree or higher, 15+ years performance engineering, Optimize ML inference workloads, Optimize ML training workloads, Deep expertise in multiple hardware architectures, Quickly master new hardware platforms, Develop portable high-performance libraries, Develop high-performance tools, Develop high-performance frameworks, Lead large-scale optimization initiatives, Coordinate performance engineering efforts, Establish deep understanding of complex systems, Create performance measurement tools, Create performance analysis tools, Entrepreneurial experience, Startup founding experience, CTO role experience, Drive technical vision in product development
What You'll Do.
Define technical strategy
Drive architectural roadmap
Lead kernel-level optimizations
Architect heterogeneous compute platforms
Architect multi-GPU systems
Architect multi-node training systems
Lead delivery of solutions
Set technical direction
Define standards for CUDA
Optimize assembly-level code
Architect cross-platform acceleration
Design multi-node communication
Invent novel approaches
Achieve 10x performance improvements
Establish standards for optimization
Tackle performance challenges
Address architectural complexity
Solve critical business problems
Solve critical technical problems
Drive design of performance solutions
Drive implementation of performance solutions
Drive delivery of performance solutions
Establish understanding of new SoCs
Establish understanding of new GPUs
Establish understanding of AI accelerators
Derive guidelines for utilization
Influence hardware selection decisions
Set standard for engineering excellence
Create mechanisms for performance measurement
Create tools for performance analysis
Create processes for optimization
Align teams toward strategies
Align teams toward decisions
Drive adoption of optimization approaches
Drive adoption of optimization concepts
Drive adoption of optimization paradigms
Lead technical reviews
Guide career growth of engineers
Mentor senior engineers
Develop performance leaders
Participate in promotion assessments
Grow Principal community
Write critical-path code
Design zero-overhead libraries
Design portable libraries
How You'll Work.
Team & Collaboration
Across multiple teams; Across organizations
Communication Scope
Technical reviews
Process & Methodology
Technical roadmaps, Development plans
Full Job Description
We are seeking an exceptional Principal Engineer specializing in ML Systems, training, and inference optimization to lead our technical strategy and implementation for next-generation AI performance at scale. This role requires deep expertise in performance engineering, distributed systems architecture, and low-level systems optimization, as well as the ability to drive technical excellence across multiple teams. You will set the technical direction for kernel-level optimizations, define architectural strategies for heterogeneous compute platforms, architect multi-GPU and multi-node training systems, and lead the delivery of solutions that fundamentally change how AWS serves ML training and inference workloads.
As a Principal Engineer in DS3, you will be a key technical leader responsible for organization-level architecture and performance strategy spanning the entire ML lifecycle, from distributed training of frontier models to high-throughput inference serving. You will work at the lowest levels of the software stack: defining standards for CUDA kernel development, optimizing assembly-level code (e.g. Nvidia PTX code), architecting cross-platform acceleration strategies including GPUs and AWS Neuron, designing efficient multi-node communication patterns, and inventing novel approaches to achieve 10× or greater performance improvements. Your work will directly influence AWS's competitive position in AI infrastructure and set the standard for ML systems engineering across the industry.
Utility Computing (UC)
AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, inc