Mirantis
Computer Software
Senior AI Infrastructure & Platform Operations Engineer
“Senior AI Infrastructure & Platform Operations Engineer at Mirantis. Skills: AI Infrastructure, Platform Operations, Kubernetes, NVIDIA GPUs. Lead incident investigation. Resolve infrastructure incidents”
Industry & Context.
Troubleshooting; Root cause analysis; Analytical skills
What They're Looking For.
Must Have
7+ years experience, Expert Linux administration, Networking expertise, Kubernetes production experience, Large-scale production infrastructure experience, Lead technical investigations, Manage complex incidents, Perform root cause analysis, Drive operational improvements, Observability understanding, Monitoring understanding, Service reliability practices understanding, Excellent troubleshooting skills, Analytical skills
Nice to Have
NVIDIA GPU infrastructure experience, Accelerated computing platforms experience, InfiniBand networking experience, NVIDIA UFM experience, AI infrastructure environments experience, HPC environments experience, Platform Engineering experience, Site Reliability Engineering experience, Large-scale Kubernetes operations experience, Infrastructure automation technologies experience, Infrastructure-as-Code practices experience, Observability platforms experience, Performance analysis experience, Distributed infrastructure platforms optimisation experience, Technical leadership experience, Mentoring experience, Team lead responsibilities
What You'll Do.
Lead incident investigation
Resolve infrastructure incidents
Resolve networking incidents
Resolve platform incidents
Act as senior escalation point
Support NVIDIA GPU infrastructure
Support high-performance networking
Troubleshoot Linux issues
Troubleshoot Kubernetes issues
Troubleshoot networking issues
Troubleshoot storage issues
Troubleshoot hardware issues
Analyze platform performance
Analyze platform capacity
Analyze platform stability
Analyze platform reliability
Identify risks proactively
Lead root cause analysis
Drive long-term corrective actions
Collaborate with engineering teams
Collaborate with hardware vendors
Collaborate with datacenter personnel
Resolve complex technical challenges
Participate in incident management
Participate in service restoration
Provide technical leadership
Drive platform reliability improvements
Drive observability improvements
Drive monitoring improvements
Drive operational process improvements
Contribute to readiness reviews
Contribute to infrastructure changes
Contribute to infrastructure upgrades
Contribute to service introductions
Support AI-powered services
Support operational capabilities
Evaluate emerging technologies
Evaluate operational practices
Improve service delivery
Improve platform resilience
Share technical knowledge
Develop operational standards
Maintain troubleshooting guides
Maintain best practices
Define operational processes
Define escalation paths
Define service reliability standards
Act as technical advisor
How You'll Work.
Team & Collaboration
Engineering teams; Hardware vendors; Datacenter personnel
Full Job Description
Mirantis helps organizations ship code faster on public and private clouds. The company provides a public cloud experience on any infrastructure from the data center to the edge. With Lens and the Mirantis Cloud Native Platform, Mirantis empowers a new breed of Kubernetes developers by removing infrastructure and operations complexity and providing one cohesive cloud experience for complete app and devops portability, a single pane of glass, and automated full-stack lifecycle management with continuous updates. Mirantis serves many of the world’s leading enterprises, including Adobe, DocuSign, Liberty Mutual, PayPal, Reliance Jio, Societe Generale, Splunk, and Volkswagen. Learn more at [www.mirantis.com](http://www.mirantis.com/).
About the Role
We are building a European AI Infrastructure & Platform Operations team responsible for operating large-scale AI infrastructure environments powered by NVIDIA GPUs, high-performance networking, Kubernetes, and next-generation platform technologies.
As a Senior AI Infrastructure & Platform Operations Engineer, you will serve as a technical leader within the operations organization, providing deep expertise across infrastructure, networking, platform operations, and service reliability. You will be responsible for driving operational excellence across complex production environments while acting as a key escalation point for critical incidents and challenging technical issues.
This role combines hands-on technical operations with technical leadership, helping shape operational standards, reliability practices, automation initiatives, and the future evolution of AI-powered operational services through platforms such as k0rdent AI.
Responsibilities:
Technical Operations & Service Reliability
- Lead the investigation and resolution of complex infrastructure, networking, and platform-related incidents.
- Act as a senior escalation point for operational teams during critical service-impacting events.
- Support large-scale NVIDIA GP
How to Apply on SmartRecruiters
- SmartRecruiters often includes a video screening step — check camera and mic permissions.
- Link your GitHub or portfolio directly in the profile section for technical roles.
- Applications may be reviewed by AI scoring before reaching a recruiter — use keywords from the job description.