Cast AI
Cloud-native and AI infrastructure
Senior ML Engineer - Kimchi (LLM Inference Optimization)
“Senior ML Engineer - Kimchi (LLM Inference Optimization) at Cast AI. Skills: LLM Inference Optimization, vLLM, SGLang, TensorRT-LLM. Push throughput. Cut latency”
What You'll Achieve.
customers get faster, cheaper inference; our margins improve; shows depth in inference or training infrastructure; instrument before you optimize; tell the difference between a real win and a benchmark artifact; the difference between a model that scales and one that doesn't
Industry & Context.
autonomous decision-making; continuous automation
background check may be conducted; Cast AI does not provide any form of visa sponsorship/work permit
What They're Looking For.
Must Have
5+ years building real ML systems, Python - production services, Hands-on experience with at least one of vLLM, SGLang, or TensorRT-LLM, working mental model of why an inference engine performs the way it does on a given GPU, Fluency with quantization tradeoffs, Comfort with distributed systems, bias toward measurement, Self-direction
What You'll Do.
Get more out of KV cache
Quantize without regressing quality
Shrink cold starts and memory footprint
Set the technical direction
How You'll Work.
Team & Collaboration
Collaborate with a global team; Bring the team along with writeups and reproducible experiments
Full Job Description
Why Cast AI?
Cast AI is an automation platform that operates cloud-native and AI infrastructure at scale. By embedding autonomous decision-making directly into Kubernetes and cloud environments, Cast AI continuously optimizes performance, reliability, and efficiency in production.
The old way doesn't work. As Kubernetes and AI environments grow, manual decisions don't. Cast AI replaces tickets, alerts, and manual tuning with continuous automation that adapts infrastructure as conditions change. Efficiency and cost savings follow naturally from that automation.
Over 2,100 companies already rely on Cast AI, including Akamai, BMW, Cisco, FICO, HuggingFace, NielsenIQ, Swisscom, and TGS.
Global team, diverse perspectives
We're headquartered in Miami, but our impact is international. We take a global and intentional approach to diversity. Today, Cast AI operates across 34 countries spanning Europe, North America, Latin America, and APAC, bringing a wide range of perspectives into how we build and lead.
Unicorn momentum
In January 2026, we achieved unicorn status with a strategic investment from Pacific Alliance Ventures, the corporate venture arm of Shinsegae Group (a $50+ billion Korean conglomerate). Our valuation now exceeds $1 billion, and we're just getting started. Join us as we build the future of autonomous infrastructure.
About the role
Throughput. Latency. KV cache utilization. Move those three numbers in the right direction, and two things happen: customers get faster, cheaper inference, and our margins improve. That's the entire thesis of this role. Every kernel you tune, every quantization scheme you ship, every scheduler tweak you land shows up directly in a customer's p99 and on our P
vLLM; SGLang; TensorRT-LLM; PyTorch; CUDA-adjacent tooling; Kubernetes; gRP; ClickHouse; PostgreSQL; GCP Pub/Sub; AWS / GCP / Azure; GitLab CI; ArgoCD; Prometheus; Grafana; Loki; Tempo.
Requirements: 5+ years building real ML systems, with a portfolio that shows depth in infe