Moonlite
AI infrastructure
Senior Software Engineer, Compute Platform
“Senior Software Engineer, Compute Platform at Moonlite. Skills: Compute Orchestration, Kubernetes, GPU Platform Engineering, Bare-Metal Management. Design and build scalable compute orchestration platforms.”
What You'll Achieve.
Enable researchers and engineers to programmatically access high-performance compute resources; Maximize GPU utilization; Maintain performance guarantees; Enable automated deployment; Enable resource scheduling; Enable workload orchestration; Ensure research workloads achieve near-bare-metal efficiency; Enable safe sharing of GPU infrastructure
Industry & Context.
Solve complex performance and scalability challenges; Problem-solving
What They're Looking For.
Must Have
5+ years in software engineering
Building compute platforms, container orchestration systems, distributed compute infrastructure for production environments
Building compute orchestration, resource scheduling, workload management systems at scale
Kubernetes architecture, container orchestration concepts, deploying workloads in Kubernetes
Linux in production environments, systems programming, performance optimization, low-level resource management
Virtualization technologies (KVM, Xen), container runtimes, orchestration platforms
GPU architectures, CUDA programming, GPU resource management
Bare-metal provisioning, out-of-band management systems, hardware abstraction layers
Solving complex performance and scalability challenges
Balancing pragmatic shipping with good long-term architecture
Navigating ambiguity, defining requirements collaboratively
Communicating technical discussions through clear documentation
Growth mindset
Nice to Have
Provisioning or managing research computing environments (Kubernetes, SLURM, or HPC clusters)
GPU virtualization technologies (SR-IOV, NVIDIA vGPU), multi-tenant GPU sharing
Container orchestration platforms with custom scheduling or resource management
High-performance networking for GPU communication (InfiniBand, RDMA, NVLink, NVSwitch)
AI/ML training frameworks (PyTorch, TensorFlow), distributed training patterns, multi-node GPU coordination
Infrastructure for research institutions, labs, or technical computing environments
Financial services or other regulated industry infrastructure
What You'll Do.
Design compute orchestration platforms
Build scalable compute orchestration platforms
Implement workload scheduling algorithms
Implement resource allocation algorithms
Implement optimization algorithms
Design research cluster provisioning systems
Implement research cluster provisioning systems
Develop GPU platform capabilities
Build automation for bare-metal server lifecycle
Build tooling for bare-metal server lifecycle
Optimize compute platform components
Implement monitoring systems
Implement telemetry systems
Build multi-tenant compute isolation
Build security boundaries
Build resource quotas
How You'll Work.
Team & Collaboration
Working closely with product; Working with platform team members; Working with infrastructure specialists; Collaborate with experts; Collaborate with seasoned engineers; Collaborate with industry professionals
Communication Scope
Communicating technical discussions through clear documentation
Process & Methodology
Defining requirements collaboratively; Balancing pragmatic shipping with good long-term architecture
Full Job Description
Moonlite delivers high-performance AI infrastructure for organizations running intensive computational research, large-scale model training, and demanding data processing workloads. We provide infrastructure deployed in our facilities or co-located in yours, delivering flexible on-demand or reserved compute that feels like an extension of your existing data center. Our team of AI infrastructure specialists combines bare-metal performance with cloud-native operational simplicity, enabling research teams and enterprises to deploy demanding AI workloads with enterprise-grade reliability and compliance.
Your Role: You will be instrumental in building out our GPU-accelerated compute platform that powers distributed AI training and inference, large-scale simulations, and computational research workloads. Working closely with product, your platform team members, and infrastructure specialists, you’ll design and implement the compute orchestration layer that manages GPU clusters, bare-metal provisioning, and resource scheduling, enabling researchers and engineers to programmatically access high-performance compute resources with cloud-like simplicity.
Job Responsibilities
Compute Orchestration Systems: Design and build scalable compute orchestration platforms that manage GPU clusters, bare-metal server provisioning, and resource allocation across co-located infrastructure environments.
Resource Management & Scheduling: Implement intelligent workload scheduling, resource allocation, and optimization algorithms that maximize GPU utilization while maintaining performance guarantees for research and training workloads.
Research Cluster Provisioning: Design and implement systems for provisioning and managing research computing environments including Kubernetes and SLURM clusters, enabling automated deployment, resource scheduling, and workload orchestration for distributed AI training and HPC workloads.
GPU Platform Engineering: Develop platform capabilities for managing latest-ge
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.