Lightning AI

AI

Platform Support Engineer (APAC)

Remote Friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for mid-level candidates.

The Brief

“Platform Support Engineer (APAC) at Lightning AI. Skills: Platform Support, ML Infrastructure, Distributed Systems, Customer Support. The role supports ML engineers running large-scale training and inference workloads across cloud infrastructure, Kubernetes, and GPU platforms in production environments, and involves diagnosing and resolving complex distributed systems and ML infrastructure issues.”

What You'll Achieve.

Take ideas from research to production with less friction. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.

Industry & Context.

AI

Problems you'll solve

Diagnose failures; improve reliability; guide customers through complex distributed systems problems; debug ML infrastructure

Eligibility Requirements

Thursday–Sunday schedule, working hours from 7:00 AM to 5:00 PM local time (UTC+8)

What They're Looking For.

Must Have

  • Support ML engineers running large-scale training and inference workloads across cloud infrastructure, Kubernetes, and GPU platforms in production environments.
  • Be a technical partner to ML teams, helping diagnose failures, improve reliability, and guide customers through complex distributed systems problems.
  • Diagnose and resolve complex distributed systems and ML infrastructure issues.
  • Act as a technical advisor during high-impact incidents and platform degradation events.
  • Translate infrastructure-level issues into actionable guidance for ML engineers.
  • Build credibility with customers through technical reasoning and clear communication.
  • Debug ML infrastructure.

Nice to Have

Kubernetes scheduling, GPU orchestration, distributed PyTorch failures, inference latency, networking bottlenecks, storage performance, platform reliability

What You'll Do.

Support ML engineers running large-scale training and inference workloads across cloud infrastructure, Kubernetes, and GPU platforms in production environments

Diagnose and resolve complex distributed systems and ML infrastructure issues

Act as a technical advisor during high-impact incidents and platform degradation events

Translate infrastructure-level issues into actionable guidance for ML engineers

Build credibility with customers through technical reasoning and clear communication

Debug ML Infrastructure

How You'll Work.

Team & Collaboration

Work directly with ML engineers; partner directly with customer engineering teams; collaborate as a team

Communication Scope

clear communication

Full Job Description

Who We Are

Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems, designed to take ideas from research to production with less friction. Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in. We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.

Our Values

  • Move Fast: We act with speed and precision, breaking down big challenges into achievable steps.
  • Focus: We complete one goal at a time with care, collaborating as a team to deliver features with precision.
  • Balance: Sustained performance comes from rest and recovery. We ensure a healthy work-life balance to keep you at your best.
  • Craftsmanship: Innovation through excellence. Every detail matters, and we take pride in mastering our craft.
  • Minimal: Simplicity drives our innovation. We eliminate complexity through discipline and focus on what truly matters.

What We’re Looking For

Lightning AI is looking to hire a Platform Support Engineer to join our APAC Customer Experience team, supporting ML engineers running large-scale training and inference workloads across cloud infrastructure, Kubernetes, and GPU platforms in production environments. This role is not a ticket router or traditional support engineer. You are a technical partner to ML teams, helping diagnose failures, improve reliability, and guide customers through complex distributed systems problems. The problems range from Kubernetes scheduling and GPU orchestration to distributed PyTorch failures, inference latency, networking bottlenecks, storage

How to Apply on Greenhouse

  • Create a Greenhouse profile before applying — it saves time across multiple applications.
  • Upload your resume as a PDF; the parser handles it better than Word.
  • Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
  • Enable email notifications to track application status in real time.
