Lightning AI
Platform Support Engineer (APAC)
Neural analysis suggests this role is optimal for mid-level candidates.
“Platform Support Engineer (APAC) at Lightning AI. Skills: Platform Support, ML Infrastructure, Distributed Systems, Customer Support. Support ML engineers running large-scale training and inference workloads across cloud infrastructure, Kubernetes, and GPU platforms in production environments. Diagnose and resolve complex distributed systems and ML infrastructure issues.”
What You'll Achieve.
Take ideas from research to production with less friction; teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in
Industry & Context.
Diagnose failures; improve reliability; guide customers through complex distributed systems problems; debug ML infrastructure
Thursday–Sunday schedule, working hours from 7:00 AM to 5:00 PM local time (UTC+8)
What They're Looking For.
Must Have
supporting ML engineers running large-scale training and inference workloads across cloud infrastructure, Kubernetes, and GPU platforms in production environments; technical partner to ML teams - helping diagnose failures, improve reliability, and guide customers through complex distributed systems problems; diagnose and resolve complex distributed systems and ML infrastructure issues; technical advisor during high-impact incidents and platform degradation events; translate infrastructure-level issues into actionable guidance for ML engineers; build credibility with customers through technical reasoning and clear communication; debug ML infrastructure
Nice to Have
Kubernetes scheduling, GPU orchestration, distributed PyTorch failures, inference latency, networking bottlenecks, storage performance, platform reliability
What You'll Do.
Support ML engineers running large-scale training and inference workloads across cloud infrastructure, Kubernetes, and GPU platforms in production environments
Diagnose and resolve complex distributed systems and ML infrastructure issues
Act as a technical advisor during high-impact incidents and platform degradation events
Translate infrastructure-level issues into actionable guidance for ML engineers
Build credibility with customers through technical reasoning and clear communication
Debug ML Infrastructure
How You'll Work.
Team & Collaboration
Work directly with ML engineers; partner directly with customer engineering teams; collaborate as a team
Communication Scope
clear communication
Full Job Description
Who We Are
Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems, designed to take ideas from research to production with less friction. Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in. We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.
Our Values
- Move Fast: We act with speed and precision, breaking down big challenges into achievable steps.
- Focus: We complete one goal at a time with care, collaborating as a team to deliver features with precision.
- Balance: Sustained performance comes from rest and recovery. We ensure a healthy work-life balance to keep you at your best.
- Craftsmanship: Innovation through excellence. Every detail matters, and we take pride in mastering our craft.
- Minimal: Simplicity drives our innovation. We eliminate complexity through discipline and focus on what truly matters.
What We’re Looking For
Lightning AI is looking to hire a Platform Support Engineer to join our APAC Customer Experience team, supporting ML engineers running large-scale training and inference workloads across cloud infrastructure, Kubernetes, and GPU platforms in production environments.
This role is not a ticket router or traditional support engineer. You are a technical partner to ML teams - helping diagnose failures, improve reliability, and guide customers through complex distributed systems problems. The problems range from Kubernetes scheduling and GPU orchestration to distributed PyTorch failures, inference latency, networking bottlenecks, storage
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.