Amazon Web Services Development Center Germany GmbH

Technology

Principal Engineer - Systems for ML Inference and Training Optimization

€145–215k ~AI est. Tübingen, Baden-Württemberg, Germany FULL TIME

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Principal Engineer - Systems for ML Inference and Training Optimization at Amazon Web Services Development Center Germany GmbH. Skills: ML Systems, Performance Engineering, Distributed Systems, Low-level Optimization. Define technical strategy. Drive architectural roadmap.”

What You'll Achieve.

Achieve 10x performance improvements; Deliver order-of-magnitude improvements

Industry & Context.

Technology

Problems you'll solve

Performance analysis; Root cause analysis; Troubleshooting

What They're Looking For.

Must Have

10+ years software development, Expert C/C++ proficiency, Expert low-level systems programming, Proven order-of-magnitude performance improvements, Extensive CUDA programming, Extensive GPU architecture knowledge, Extensive assembly-level optimization, Extensive kernel development, Lead organization-level technical initiatives, Build consensus on technical decisions, Drive architectural strategy, Define technical roadmaps, Conduct performance analysis, Resource budgeting, Translate system analysis into plans

Nice to Have

Master's degree or higher, 15+ years performance engineering, Optimize ML inference workloads, Optimize ML training workloads, Deep expertise in multiple hardware architectures, Quickly master new hardware platforms, Develop portable high-performance libraries, Develop high-performance tools, Develop high-performance frameworks, Lead large-scale optimization initiatives, Coordinate performance engineering efforts, Establish deep understanding of complex systems, Create performance measurement tools, Create performance analysis tools, Entrepreneurial experience, Startup founding experience, CTO role experience, Drive technical vision in product development

What You'll Do.

Define technical strategy

Drive architectural roadmap

Lead kernel-level optimizations

Architect heterogeneous compute platforms

Architect multi-GPU systems

Architect multi-node training systems

Lead delivery of solutions

Set technical direction

Define standards for CUDA

Optimize assembly-level code

Architect cross-platform acceleration

Design multi-node communication

Invent novel approaches

Achieve 10x performance improvements

Establish standards for optimization

Tackle performance challenges

Address architectural complexity

Solve critical business problems

Solve critical technical problems

Drive design of performance solutions

Drive implementation of performance solutions

Drive delivery of performance solutions

Establish understanding of new SoCs

Establish understanding of new GPUs

Establish understanding of AI accelerators

Derive guidelines for utilization

Influence hardware selection decisions

Set standard for engineering excellence

Create mechanisms for performance measurement

Create tools for performance analysis

Create processes for optimization

Align teams toward strategies

Align teams toward decisions

Drive adoption of optimization approaches

Drive adoption of optimization concepts

Drive adoption of optimization paradigms

Lead technical reviews

Guide career growth of engineers

Mentor senior engineers

Develop performance leaders

Participate in promotion assessments

Grow Principal community

Write critical-path code

Design zero-overhead libraries

Design portable libraries

How You'll Work.

Team & Collaboration

Across multiple teams; Across organizations

Communication Scope

Technical reviews

Process & Methodology

Technical roadmaps, Development plans

Full Job Description

We are seeking an exceptional Principal Engineer specializing in ML Systems, training, and inference optimization to lead our technical strategy and implementation for next-generation AI performance at scale. This role requires deep expertise in performance engineering, distributed systems architecture, low-level systems optimization, and the ability to drive technical excellence across multiple teams. You will set the technical direction for kernel-level optimizations, define architectural strategies for heterogeneous compute platforms, architect multi-GPU and multi-node training systems, and lead the delivery of solutions that fundamentally change how AWS serves ML training and inference workloads.

As a Principal Engineer in DS3, you will be a key technical leader responsible for organization-level architecture and performance strategy spanning the entire ML lifecycle, from distributed training of frontier models to high-throughput inference serving. You will work at the lowest levels of the software stack: defining standards for CUDA kernel development, optimizing assembly-level code (e.g. Nvidia PTX code), architecting cross-platform acceleration strategies including GPUs and AWS Neuron, designing efficient multi-node communication patterns, and inventing novel approaches to achieve 10× or greater performance improvements. Your work will directly influence AWS's competitive position in AI infrastructure and set the standard for ML systems engineering across the industry.

Utility Computing (UC)

AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, inc

SKILL SIGNAL 27 detected · ranked by frequency

Performance Engineering ×3

Low-level Optimization ×3

Kernel development ×3

Assembly-level code optimization ×3

Multi-GPU systems design ×3

Heterogeneous compute architecture ×3

Distributed training optimization ×3

High-throughput inference serving ×3

Low-level software optimization ×3

Zero-overhead libraries design ×3

Portable libraries design ×3

ML Systems ×2

Distributed Systems ×2

Nvidia PTX ×2

CUDA

AWS Neuron

AI accelerators

SoCs

GPUs

Systems architecture

Technical strategy

Cross-platform acceleration

Hardware-software co-design

Resource budgeting

Performance measurement

Performance analysis

Performance optimization

BEHAVIOURAL

Leadership, Mentoring

Role Details

Experience 5–10 yrs

Level Senior

Work Mode Onsite

Type FULL TIME

Salary Band 100k-150k

AI-Extracted Insights

Domain Areas

ml-inference, ml-training, distributed-systems, low-level-systems, ai-infrastructure, high-performance-computing, ml-lifecycle, frontier-models
