Cast AI

Cloud-native and AI infrastructure

Senior ML Engineer - Kimchi (LLM Inference Optimization)

European Union · Remote Friendly

The Brief

“Senior ML Engineer - Kimchi (LLM Inference Optimization) at Cast AI. Skills: LLM Inference Optimization, vLLM, SGLang, TensorRT-LLM. Push throughput. Cut latency.”

What You'll Achieve.

Customers get faster, cheaper inference; our margins improve; shows depth in inference or training infrastructure; instrument before you optimize; tell the difference between a real win and a benchmark artifact; the difference between a model that scales and one that doesn't

Industry & Context.

Cloud-native and AI infrastructure

Problems you'll solve

autonomous decision-making; continuous automation

Eligibility Requirements

A background check may be conducted. Cast AI does not provide any form of visa sponsorship or work permit.

What They're Looking For.

Must Have

5+ years building real ML systems; Python - production services; hands-on experience with at least one of vLLM, SGLang, or TensorRT-LLM; working mental model of why an inference engine performs the way it does on a given GPU; fluency with quantization tradeoffs; comfort with distributed systems; bias toward measurement; self-direction

What You'll Do.

Get more out of KV cache

Quantize without regressing quality

Shrink cold starts and memory footprint

Set the technical direction

How You'll Work.

Team & Collaboration

Collaborate with a global team; Bring the team along with writeups and reproducible experiments

Full Job Description

Why Cast AI?

Cast AI is an automation platform that operates cloud-native and AI infrastructure at scale. By embedding autonomous decision-making directly into Kubernetes and cloud environments, Cast AI continuously optimizes performance, reliability, and efficiency in production.

The old way doesn't work. As Kubernetes and AI environments grow, manual decisions don’t. Cast AI replaces tickets, alerts, and manual tuning with continuous automation that adapts infrastructure as conditions change. Efficiency and cost savings follow naturally from that automation.

Over 2,100 companies already rely on Cast AI, including Akamai, BMW, Cisco, FICO, HuggingFace, NielsenIQ, Swisscom, and TGS.

Global team, diverse perspectives

We're headquartered in Miami, but our impact is international. We take a global and intentional approach to diversity. Today, Cast AI operates across 34 countries spanning Europe, North America, Latin America, and APAC, bringing a wide range of perspectives into how we build and lead.

Unicorn momentum

In January 2026, we achieved unicorn status with a strategic investment from Pacific Alliance Ventures, the corporate venture arm of Shinsegae Group (a $50+ billion Korean conglomerate). Our valuation now exceeds $1 billion, and we're just getting started. Join us as we build the future of autonomous infrastructure.

About the role

Throughput. Latency. KV cache utilization. Move those three numbers in the right direction, and two things happen: customers get faster, cheaper inference, and our margins improve. That's the entire thesis of this role. Every kernel you tune, every quantization scheme you ship, every scheduler tweak you land shows up directly in a customer's p99 and on our P

vLLM; SGLang; TensorRT-LLM; PyTorch; CUDA-adjacent tooling; Kubernetes; gRP; ClickHouse; PostgreSQL; GCP Pub/Sub; AWS / GCP / Azure; GitLab CI; ArgoCD; Prometheus; Grafana; Loki; Tempo.

Requirements: 5+ years building real ML systems, with a portfolio that shows depth in infe

SKILL SIGNAL 62 detected · ranked by frequency

LLM Inference Optimization ×5

vLLM ×4

SGLang ×4

TensorRT-LLM ×4

kernel tuning ×3

quantization scheme ×3

scheduler tweak ×3

KV cache utilization ×3

continuous batching ×3

speculative decoding ×3

chunked prefill ×3

kernel-level tuning ×3

TTFT ×3

TPOT ×3

compute bottleneck ×3

memory bandwidth ×3

scheduling bottleneck ×3

networking bottleneck ×3

Paged attention ×3

prefix caching ×3

eviction policies ×3

cache reuse ×3

quantized KV ×3

INT8 ×3

INT4 ×3

FP8 ×3

weight quantization ×3

activation quantization ×3

KV quantization ×3

quality regression measurement ×3

cold start reduction ×3

memory footprint reduction ×3

Role Details

Experience 5–10 yrs

Level Senior

Work Mode remote-first

Category technology

AI-Extracted Insights

Domain Areas

cloud-native-infrastructure; ai-infrastructure; kubernetes; llm-inference
