Nebius

AI economy

ML Infrastructure Engineer

Prague, Czech Republic

Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“ML Infrastructure Engineer at Nebius. Skills: ML Infrastructure, GPU performance analysis, Deep learning frameworks, Performance optimization. Lead and support benchmarking of GPU platforms for machine learning and AI workloads. Evaluate and compare GPU performance across different platforms, architectures, and software stacks”

What You'll Achieve.

Enable data-driven decisions for platform optimisation and next-generation hardware development

Industry & Context.

AI economy

Problems you'll solve

Identifying and resolving performance bottlenecks

What They're Looking For.

Must Have

Profound understanding of the theoretical foundations of machine learning

Deep understanding of the performance aspects of large neural network training and inference (data/tensor/context/expert parallelism, offloading, custom kernels, hardware features, attention optimisations, dynamic batching, etc.)

Deep experience with modern deep learning frameworks (PyTorch, JAX, Megatron-LM, TensorRT-LLM)

Good understanding of the GPU stack: CUDA, NCCL, drivers, and relevant libraries

Familiarity with containerized environments (e.g., Docker, Kubernetes)

Communication skills and the ability to work independently

Nice to Have

Familiarity with modern LLM inference frameworks (vLLM, SGLang, TensorRT)

Experience with Python and performance profiling tools (e.g., Nsight, nvprof, perf)

Familiarity with cloud ML platforms such as AWS, GCP, and Azure ML

Contributions to open-source ML benchmarking tools

What You'll Do.

Lead and support benchmarking of GPU platforms for machine learning and AI workloads

Evaluate and compare GPU performance across different platforms

Debug and optimise ML workloads to run efficiently on GPU hardware

Perform acceptance testing for new GPU clusters

Perform experiments across diverse GPU system configurations

Develop tools and dashboards to visualise performance metrics

Contribute to internal tooling

How You'll Work.

Team & Collaboration

Work closely with hardware and development teams

Full Job Description

About Nebius

Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.

Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.

Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.

The role

We are seeking a highly skilled ML/AI Engineer to join our team to lead and support benchmarking of GPU platforms for machine learning and AI workloads. You will play a critical role in evaluating the performance of GPU-based hardware for various deep learning and AI frameworks, enabling data-driven decisions for platform optimisation and next-generation hardware development.

Your responsibilities will include:

Work closely with hardware and development teams to profile and analyse GPU performance at the system and kernel level.

Evaluate and compare GPU performance across different platforms, architectures, and software stacks (e.g., CUDA, ROCm).

Debug and optimise ML workloads to run efficiently on GPU hardware, identifying and resolving performance bottlenecks.

Perform acceptance testing for new GPU clusters, ensuring hardware and software meet performance, stability, and compatibility requirements for AI workloads.

Perform experiments across diverse GPU system configurations to assess the impact of varying interconnect strategies and system-level optimisations on performance and scalability.

Develop tools and dashboards to visualise performance metrics …
