Mirantis
Cloud Infrastructure
Product Manager - AI Inference & Model Serving
“Product Manager - AI Inference & Model Serving at Mirantis. Skills: AI inference, model serving, cloud-native infrastructure, distributed systems, performance engineering. Own product strategy, roadmap, and lifecycle for inference and model serving, including serverless inference, dedicated endpoints, autoscaling, routing, KV cache management, and the related observability. Lead deep technical discovery with NeoClouds, sovereign clouds, and enterprise platform teams, and translate findings into prioritized requirements and architecture direction.”
What You'll Achieve.
Improve latency, throughput, utilization, reliability, cost, and operational control; define positioning grounded in measurable outcomes: latency distributions, throughput per GPU, utilization, tail reliability, and cost per token
Industry & Context.
Reasoning across the full stack; identifying performance bottlenecks; evaluating system design trade-offs
What They're Looking For.
Must Have
- 7+ years in product management, technical product management, or a senior technical role owning AI/ML and inference product(s)
- Understanding of production AI inference, including model serving, serverless execution, dedicated endpoints, autoscaling, routing, workload placement, observability, and reliability
- Proven capability to reason about performance trade-offs across GPU, network, storage, orchestration, and runtime layers, and to translate low-level technical capability into business value such as TTFT, throughput per GPU, and TCO
- Working knowledge of modern inference runtimes (vLLM, SGLang, TensorRT-LLM, Dynamo, Triton) and the optimization patterns that matter in production: continuous batching, KV cache management, cold starts, prefill versus decode, disaggregated serving, and multi-model serving
- Credibility with engineering leaders and infrastructure operators, including comfort in production architecture reviews and technical commercial conversations with platform engineering buyers
What You'll Do.
- Own product strategy, roadmap, and lifecycle for inference and model serving, including serverless inference, dedicated endpoints, autoscaling, routing, KV cache management, and the related observability
- Lead deep technical discovery with NeoClouds, sovereign clouds, and enterprise platform teams, and translate findings into prioritized requirements and architecture direction
- Partner with engineering on system design trade-offs across runtime integration, including disaggregated serving and multi-model serving
- Define positioning grounded in measurable outcomes: latency distributions, throughput per GPU, utilization, tail reliability, and cost per token
- Drive go-to-market execution: pricing and packaging, reference architectures, and direct engagement with customers, analysts, and ecosystem partners
How You'll Work.
Team & Collaboration
Partner with engineering on system design trade-offs; Direct engagement with customers, analysts, and ecosystem partners; Collaborate with a world-class, distributed team
Communication Scope
translating technical insight into clear product requirements, architecture direction, and customer-facing solutions; technical commercial conversations with platform engineering buyers; Shape the product narrative
Process & Methodology
Own product strategy, roadmap, and lifecycle
How to Apply on SmartRecruiters
- SmartRecruiters often includes a video screening step — check camera and mic permissions.
- Link your GitHub or portfolio directly in the profile section for technical roles.
- Applications may be reviewed by AI scoring before reaching a recruiter — use keywords from the job description.