Mirantis

Cloud Infrastructure

Product Manager - AI Inference & Model Serving

Austin, Texas, United States FULL TIME Remote Friendly

The Brief

“Product Manager - AI Inference & Model Serving at Mirantis. Skills: AI inference, model serving, cloud-native infrastructure, distributed systems, performance engineering. Own product strategy, roadmap, and lifecycle for inference and model serving, including serverless inference, dedicated endpoints, autoscaling, routing, KV cache management, and the related observability. Lead deep technical discovery with NeoClouds, sovereign clouds, and enterprise platform teams, and translate findings into ”

What You'll Achieve.

Improve latency, throughput, utilization, reliability, cost, and operational control; Define positioning grounded in measurable outcomes: latency distributions, throughput per GPU, utilization, tail reliability, and cost per token

Industry & Context.

Cloud Infrastructure

Problems you'll solve

reasoning across the full stack; identifying performance bottlenecks; evaluating system design trade-offs

What They're Looking For.

Must Have

7+ years in product management, technical product management, or a senior technical role owning AI/ML and inference product(s)
Understanding of production AI inference, including model serving, serverless execution, dedicated endpoints, autoscaling, routing, workload placement, observability, and reliability
Proven capability to reason about performance trade-offs across GPU, network, storage, orchestration, and runtime layers, and to translate low-level technical capability into business value such as TTFT, throughput per GPU, and TCO
Working knowledge of modern inference runtimes (vLLM, SGLang, TensorRT-LLM, Dynamo, Triton) and the optimization patterns that matter in production: continuous batching, KV cache management, cold starts, prefill versus decode, disaggregated serving, and multi-model serving
Credibility with engineering leaders and infrastructure operators, including comfort in production architecture reviews and technical commercial conversations with platform engineering buyers

What You'll Do.

Own product strategy, roadmap, and lifecycle for inference and model serving, including serverless inference, dedicated endpoints, autoscaling, routing, KV cache management, and the related observability

Lead deep technical discovery with NeoClouds, sovereign clouds, and enterprise platform teams, and translate findings into prioritized requirements and architecture direction

Partner with engineering on system design trade-offs across runtime integration, including disaggregated serving and multi-model serving

Define positioning grounded in measurable outcomes: latency distributions, throughput per GPU, utilization, tail reliability, and cost per token

Drive go-to-market execution: pricing and packaging, reference architectures, and direct engagement with customers, analysts, and ecosystem partners

How You'll Work.

Team & Collaboration

Partner with engineering on system design trade-offs; Direct engagement with customers, analysts, and ecosystem partners; Collaborate with a world-class, distributed team

Communication Scope

translating technical insight into clear product requirements, architecture direction, and customer-facing solutions; technical commercial conversations with platform engineering buyers; Shape the product narrative

Process & Methodology

Own product strategy, roadmap, and lifecycle

Skill Signal 31 detected

Core

AI inference ×6

model serving ×6

cloud-native infrastructure ×6

distributed systems ×6

performance engineering ×6

vLLM ×5

SGLang ×5

TensorRT-LLM ×5

Dynamo ×5

Triton ×5

Required

KV cache management ×3

disaggregated serving ×3

multi-model serving ×3

Nice to have

commercially driven

product strategy

solution development

go-to-market execution

pricing and packaging

reference architectures

sizing guides

PoC playbooks

technical commercial conversations

Behavioural

deeply technical

reasoning across the full stack

identifying performance bottlenecks

evaluating system design trade-offs

translating technical insight into clear product requirements

collaboration

openness

technical excellence

Role Details

Type

FULL TIME

Experience

5–10 yrs

How to Apply on SmartRecruiters

SmartRecruiters often includes a video screening step — check camera and mic permissions.
Link your GitHub or portfolio directly in the profile section for technical roles.
Applications may be reviewed by AI scoring before reaching a recruiter — use keywords from the job description.
