Nscale
Technology
Principal Systems Engineer
“Principal Systems Engineer at Nscale. Skills: GPU Supercluster Bringup, Network Fabric Architecture, Performance Optimization, Deployment Strategy. Define technical standards. Lead GPU cluster deployments”
What You'll Achieve.
Superclusters brought online quickly, predictably, and at peak performance; deployment processes that scale; infrastructure that becomes a competitive advantage; a defined technical blueprint
Industry & Context.
Performance debugging; Root cause analysis
What They're Looking For.
Must Have
10+ years of experience, Large-scale infrastructure experience, HPC environments experience, Bringing up large GPU clusters, High-speed networking expertise, Server architecture understanding, Debugging performance issues, Automation experience, Systems-level thinking
Nice to Have
Scaling AI training clusters, Liquid cooling experience, Ultra-high-density deployments experience, Defining infrastructure standards
What You'll Do.
Define technical standards
Lead GPU cluster deployments
Architect network fabrics
Establish acceptance criteria
Establish validation frameworks
Tune and validate NCCL
Validate collective operations
Identify performance bottlenecks
Eliminate performance bottlenecks
Drive congestion control
Drive fabric optimization
Define performance benchmarking
Design repeatable deployment models
Build automation frameworks
Establish deployment SLAs
Establish quality gates
Establish operational readiness
Reduce time-to-capacity
Serve as escalation point
Mentor senior engineers
Shape infrastructure best practices
Influence hardware selection
Influence rack topology
Influence data center design
Partner on infrastructure strategy
How You'll Work.
Team & Collaboration
Cross-functional leadership; Partner with executive leadership
Full Job Description
Principal Systems Engineer – GPU Supercluster Bringup
About Us
We are building AI infrastructure for frontier-scale workloads. Our platform is designed for high-density, high-performance GPU clusters that push the limits of power, networking, and distributed compute. As a startup, we move fast, operate with ownership, and expect technical leaders to define standards, not just follow them.
The Role
We are hiring a Principal Deployment Engineer to architect and lead the bringup of large-scale GPU clusters (hundreds to thousands of GPUs). This is a technical leadership role responsible for defining how we deploy, validate, and scale AI superclusters across sites. You will own the full lifecycle of deployment, from rack design and fabric architecture to cluster validation frameworks and production readiness standards. You will set the bar for performance, reliability, and operational excellence. This role combines deep hands-on expertise with system-level thinking and cross-functional leadership.
What You’ll Do
End-to-End Supercluster Bringup Ownership
- Define the technical standards for node, rack, and full-cluster bringup.
- Lead large-scale GPU cluster deployments (multi-rack, multi-pod environments).
- Architect high-performance network fabrics (IB, RoCE, Ethernet) optimized for AI workloads.
- Establish cluster-level acceptance criteria and validation frameworks.
Performance & Fabric Architecture
- Tune and validate NCCL, RDMA, GPUDirect, and collective operations at scale.
- Identify and eliminate performance bottlenecks across hardware, topology, and firmware layers.
- Drive congestion control and fabric optimization strategies.
- Define performance benchmarking methodology for AI training workloads.
Deployment Strategy & Scalability
- Design repeatable deployment models for multi-site expansion.
- Build automation frameworks for provisioning and cluster validation.
- Establish deployment SLAs, quality gates, and operational readiness standards.
- Reduce time-to-capacity while increasing
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.