Moonlite

AI infrastructure

SeniorSoftwareEngineer,ComputePlatform

$165–225k United States Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is
optimal for Senior candidates.

The Brief

“Senior Software Engineer, Compute Platform at Moonlite. Skills: Compute Orchestration, Kubernetes, GPU Platform Engineering, Bare-Metal Management. Design compute orchestration platforms. Build scalable compute orchestration platforms”

What You'll Achieve.

Enable researchers and engineers to programmatically access high-performance compute resources; Maximize GPU utilization; Maintain performance guarantees; Enable automated deployment; Enable resource scheduling; Enable workload orchestration; Ensure research workloads achieve near-bare-metal efficiency; Enable safe sharing of GPU infrastructure

Industry & Context.

AI infrastructure

Problems you'll solve

Solve complex performance and scalability challenges; Problem-solving

What They're Looking For.

Must Have

5+ years in software engineering, building compute platforms, container orchestration systems, distributed compute infrastructure for production environments, building compute orchestration, resource scheduling, workload management systems at scale, Kubernetes architecture, container orchestration concepts, deploying workloads in Kubernetes, Linux in production environments, systems for programming, performance optimization, low-level resource management, virtualization technologies (KVM, Xen), container runtimes, orchestration platforms, GPU architectures, CUDA programming, GPU resource management, bare-metal provisioning, out-of-band management systems, hardware abstraction layers, solve complex performance and scalability challenges, balancing pragmatic shipping with good long-term architecture, navigating ambiguity, defining requirements collaboratively, communicating technical discussions through clear documentation, Growth mindset

Nice to Have

provisioning or managing research computing environments (Kubernetes, SLURM, or HPC clusters), GPU virtualization technologies (SR-IOV, NVIDIA vGPU), multi-tenant GPU sharing, container orchestration platforms with custom scheduling or resource management, high-performance networking for GPU communication (InfiniBand, RDMA, NVLink, NVSwitch), AI/ML training frameworks (PyTorch, TensorFlow), distributed training patterns, multi-node GPU coordination, infrastructure for research institutions, labs, or technical computing environments, financial services or other regulated industry infrastructure

What You'll Do.

Design compute orchestration platforms

Build scalable compute orchestration platforms

Implement workload scheduling algorithms

Implement resource allocation algorithms

Implement optimization algorithms

Design research cluster provisioning systems

Implement research cluster provisioning systems

Develop GPU platform capabilities

Build automation for bare-metal server lifecycle

Build tooling for bare-metal server lifecycle

Optimize compute platform components

Implement monitoring systems

Implement telemetry systems

Build multi-tenant compute isolation

Build security boundaries

Build resource quotas

How You'll Work.

Team & Collaboration

Working closely with product; Working with platform team members; Working with infrastructure specialists; Collaborate with experts; Collaborate with seasoned engineers; Collaborate with industry professionals

Communication Scope

Communicating technical discussions through clear documentation

Process & Methodology

Defining requirements collaboratively, Balancing pragmatic shipping with good long-term architecture

Full Job Description

Moonlite delivers high-performance AI infrastructure for organizations running intensive computational research, large-scale model training, and demanding data processing workloads.We provide infrastructure deployed in our facilities or co-located in yours, delivering flexible on-demand or reserved compute that feels like an extension of your existing data center. Our team of AI infrastructure specialists combines bare-metal performance with cloud-native operational simplicity, enabling research teams and enterprises to deploy demanding AI workloads with enterprise-grade reliability and compliance. Your Role: You will be instrumental in building out our GPU-accelerated compute platform that powers distributed AI training and inference, large-scale simulations, and computational research workloads. Working closely with product, your platform team members, and infrastructure specialists, you’ll design and implement the compute orchestration layer that manages GPU clusters, bare-metal provisioning, and resource scheduling-enabling researchers and engineers to programmatically access high-performance compute resources with cloud-like simplicity. Job Responsibilities Compute Orchestration Systems: Design and build scalable compute orchestration platforms that manage GPU clusters, bare-metal server provisioning, and resource allocation across co-located infrastructure environments. Resource Management & Scheduling: Implement intelligent workload scheduling, resource allocation, and optimization algorithms that maximize GPU utilization while maintaining performance guarantees for research and training workloads. Research Cluster Provisioning: Design and implement systems for provisioning and managing research computing environments including Kubernetes and SLURM clusters, enabling automated deployment, resource scheduling, and workload orchestration for distributed AI training and HPC workloads. GPU Platform Engineering: Develop platform capabilities for managing latest-ge

Free ATS check

Applying for this Senior Software Engineer, Compute Platform role?

Most applicants get filtered before a human reads their resume. See if yours makes the cut.

Should you apply? AI reads your resume vs this job — match score, gaps to address, ATS keywords.

SKILL SIGNAL 51 detected · ranked by frequency

Kubernetes ×7

Go ×4

C/C++ ×4

Python ×4

Rust ×4

GPU Platform Engineering ×3

Linux systems programming ×3

Container orchestration ×3

Virtualization ×3

GPU computing ×3

Bare-metal provisioning ×3

API development ×3

SDK development ×3

Monitoring systems ×3

Telemetry systems ×3

Compute Orchestration ×2

Bare-Metal Management ×2

KVM ×2

Xen ×2

Docker ×2

NVIDIA GPUDirect ×2

SR-IOV ×2

NVIDIA vGPU ×2

CUDA ×2

InfiniBand ×2

RDMA ×2

NVLink ×2

NVSwitch ×2

Terraform ×2

FastAPI ×2

gRPC ×2

SLURM ×2

BEHAVIOURAL

AutonomyCommunication

Role Details

Experience 5–10 yrs

Level Senior

Work Mode Remote

Category engineering

Salary Band 150k-200k

AI-Extracted Insights

Domain Areas

ai-infrastructurehigh-performance-computinglarge-scale-model-trainingdata-processing-workloadsgpu-accelerated-computedistributed-ai-traininginferencelarge-scale-simulations

How to Apply on Greenhouse

Create a Greenhouse profile before applying — it saves time across multiple applications.
Upload your resume as a PDF; the parser handles it better than Word.
Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
Enable email notifications to track application status in real time.

ANONYMOUS · UNFILTERED

What do employees actually say about Moonlite?

Real rants from real employees. Read before you apply.

Read Company Rants →