Canva

Computer Software

Machine Learning Engineer (Training Optimization)

Beijing, Beijing, China FULL TIME

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for mid-level candidates.

The Brief

“Machine Learning Engineer (Training Optimization) at Canva. Skills: scale and optimize the training system for our large-scale multimodal and foundation models, design distributed training systems using Megatron-LM, NVIDIA NeMo, FSDP, and Triton, pushing the limits of performance across compute, memory, and communication layers, sit at the intersection of systems and AI research, directly shaping how we train the models that will power Canva’s next generation of products, design, implement, and ”

What You'll Achieve.

scale and optimize the training system; improve all aspects of performance; unlock new levels of scalability; ship research that makes a real impact—from smart editing to AI video tools—at massive scale

Industry & Context.

Computer Software

Problems you'll solve

Excellent problem-solving skills

What They're Looking For.

Must Have

systems-first engineer, deeply familiar with distributed model training at scale, understand the nuances of optimizing compute at every level of the stack, excited by challenges that stretch current boundaries, collaborator who communicates clearly across domains, background in LLMs, multimodal AI, or diffusion models, Proficiency in Python, Deep knowledge of PyTorch or JAX, Deep knowledge of libraries such as Megatron-LM, NeMo, or DeepSpeed, Familiarity with common optimization techniques such as FSDP/ZeRO, gradient checkpointing, or low-precision data types, Hands-on experience writing custom GPU kernels in CUDA or Triton, Excellent communication and problem-solving skills, full proficiency in English

Nice to Have

Familiarity with a systems programming language (e.g. C++ or Rust)

What You'll Do.

design, implement, and optimize large-scale machine learning systems for training

improve all aspects of performance, including GPU utilization, communication overhead, and memory efficiency

partner with research and modeling teams to align systems with algorithmic needs

evaluate and apply best practices for distributed training using industry-leading frameworks

dive deep into low-level optimization, including custom CUDA or Triton kernels, and fine-tune training workflows to unlock new levels of scalability

How You'll Work.

Team & Collaboration

partner with research and modeling teams; collaborate globally; communicates clearly across domains

Communication Scope

Excellent communication; full proficiency in English

Full Job Description

Notice: This position is open to candidates at all experience levels, including experienced candidates, 2025 and 2026 graduates, as well as internship opportunities. The role is based in Beijing. We welcome your application and look forward to having you on board!

At Canva, we're building a future powered by AI that's as magical as it is impactful. As a Research Scientist at Canva, you'll be responsible for advancing the future of AI by experimenting with cutting-edge techniques, as well as improving models for real-world quality and performance.

About the Group/Team

We're the CORE team within the Generative AI supergroup. Our mission is to invent foundational technologies that will power the future of AI-assisted design. From large-scale models to groundbreaking research, our team builds the technical core of Canva’s creative intelligence engine. We collaborate globally to ship research that makes a real impact, from smart editing to AI video tools, at massive scale.

Job Description

About the Role/Specialty

As a Machine Learning Engineer, you’ll lead efforts to scale and optimize the training system for our large-scale multimodal and foundation models. You’ll design distributed training systems using Megatron-LM, NVIDIA NeMo, FSDP, and Triton, pushing the limits of performance across compute, memory, and communication layers. You'll sit at the intersection of systems and AI research, directly shaping how we train the models that will power Canva’s next generation of products.

What you’ll do (responsibilities)

* You’ll design, implement, and optimize large-scale machine learning systems for training.
* You’ll improve all aspects of performance, including GPU utilization, communication overhead, and memory efficiency.
* You’ll partner with research and modeling teams to align systems with algorithmic needs.
* You’ll evaluate and apply best practices for distributed training using industry-leading frameworks.

SKILL SIGNAL 39 detected · ranked by frequency

Triton ×6

PyTorch ×5

JAX ×5

Megatron-LM ×5

DeepSpeed ×5

CUDA ×5

Python ×4

Rust ×4

large-scale machine learning systems for training ×3

distributed training systems ×3

low-level optimization ×3

custom CUDA or Triton kernels ×3

debugging ×3

fine-tuning training workflows ×3

distributed model training at scale ×3

optimizing compute ×3

LLMs ×3

multimodal AI ×3

diffusion models ×3

NeMo ×3

FSDP/ZeRO ×3

gradient checkpointing ×3

low-precision data types ×3

BEHAVIOURAL

collaborator, communicates clearly across domains

Role Details

Experience 5–10 yrs

Level mid

Type FULL TIME

Category Design Generation CN

AI-Extracted Insights

Domain Areas

llms, multimodal-ai, diffusion-models, large-scale-multimodal-and-foundation-models

How to Apply on SmartRecruiters

SmartRecruiters often includes a video screening step — check camera and mic permissions.
Link your GitHub or portfolio directly in the profile section for technical roles.
Applications may be reviewed by AI scoring before reaching a recruiter — use keywords from the job description.
