Company

Technology

Senior ML Engineer (Token Factory)

£95–140k (AI est.) · United Kingdom · Full Time · Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior ML Engineer (Token Factory). Skills: ML Engineering, LLM Optimization, GPU Performance, Distributed Systems. Drive inference optimization efforts. Identify bottlenecks.”

What You'll Achieve.

Improve throughput; Reduce latency; Reduce cost per token

Industry & Context.

Technology

Problems you'll solve

Identify bottlenecks; Identify performance constraints

What They're Looking For.

Must Have

Understanding of machine learning fundamentals, Transformer architectures, Large language models, Hands-on experience profiling GPU workloads, Hands-on experience optimizing GPU workloads, Deep knowledge of GPU architecture, Memory hierarchy trade-offs, Compute vs. memory trade-offs, Familiarity with key LLM concepts, Experience with large-scale deep learning training, Distributed systems, Sharding strategies, Custom kernel development, Advanced proficiency in Python, Modern ML frameworks, Solid understanding of software engineering practices, Version control, CI/CD pipelines, Unit testing, Communication skills

Nice to Have

Speculative decoding techniques, KV-cache optimization, Support for dense models, Support for MoE models, Low-precision training pipelines, Low-precision inference pipelines, FP8 training, FP8 inference, MXFP4 training, MXFP4 inference, Flash Attention, Quantization techniques

What You'll Do.

Drive inference optimization efforts

Implement performance improvements

Reduce cost per token

Contribute to the design and evolution of inference engines

Develop and productionize low-precision training and inference pipelines

Profile and analyze GPU workloads

Identify performance constraints

Guide architectural improvements

Collaborate on scalable distributed training and inference systems

Contribute to engineering best practices, including maintainable production-grade ML systems

How You'll Work.

Team & Collaboration

Highly technical teams; Cross-functional teams

Communication Scope

Technical communication

Full Job Description

## Accountabilities

- Drive inference optimization efforts by identifying bottlenecks and implementing performance improvements across diverse LLM architectures, improving throughput and reducing latency and cost per token.
- Contribute to the design and evolution of inference engines, including techniques such as speculative decoding, KV-cache optimization, and support for dense and MoE models.
- Develop and productionize low-precision training and inference pipelines (e.g., FP8, MXFP4) to maximize efficiency on large GPU clusters.
- Profile and analyze GPU workloads using modern tooling to identify performance constraints and guide architectural improvements.
- Collaborate on scalable distributed training and inference systems, including sharding strategies, custom kernels, and hardware-aware optimizations.
- Contribute to engineering best practices including testing, CI/CD, and maintainable production-grade ML systems.

## Requirements

- Strong understanding of machine learning fundamentals, particularly transformer architectures and large language models.
- Hands-on experience profiling and optimizing GPU workloads using tools such as Nsight or PyTorch Profiler.
- Deep knowledge of GPU architecture, including memory hierarchy and compute vs. memory trade-offs.
- Familiarity with key LLM concepts such as attention mechanisms, RoPE, KV-cache, Flash Attention, and quantization techniques.
- Experience with large-scale deep learning training, including distributed systems, sharding strategies, and custom kernel development.
- Strong software engineering skills, with advanced proficiency in Python and modern ML frameworks.
- Solid understanding of software engineering practices such as version control, CI/CD pipelines, and unit testing.
- Strong communication skills with the ability to collaborate effectively in highly technical, cross-functional teams.

## Benefits

- Competitive compensation package
- Strong career development and continuous learning opportunities
- Flexible work environment with high aut

SKILL SIGNAL 34 detected · ranked by frequency

Distributed Systems ×3

Transformer architectures ×3

Large language models ×3

GPU architecture ×3

Memory hierarchy ×3

Compute trade-offs ×3

Memory trade-offs ×3

Attention mechanisms ×3

RoPE ×3

KV-cache ×3

Flash Attention ×3

Quantization techniques ×3

Deep learning training ×3

Sharding strategies ×3

Custom kernel development ×3

Software engineering ×3

Version control ×3

CI/CD pipelines ×3

Unit testing ×3

ML Engineering ×2

LLM Optimization ×2

GPU Performance ×2

Python

ML frameworks

LLM architectures

GPU workloads

Custom kernels

Inference optimization

Performance improvements

Engineering best practices

Scalable distributed training

Scalable distributed inference

Role Details

Seniority Senior

Experience 5–10 yrs

Level Senior

Work Mode Flexible

Type FULL TIME

Category software

Salary Band 75k-100k

AI-Extracted Insights

Domain Areas

llm-architectures, gpu-architecture, distributed-systems, ai-systems

How to Apply on Lever

Lever uses a streamlined one-page form — apply in under 5 minutes.
LinkedIn import works well; review parsed data before submitting.
The cover letter field is optional but visible to reviewers — use it to differentiate.
Referral codes from employees can significantly boost visibility of your application.
