Nscale
Technology
Senior Back-End Network Engineer - AI Infrastructure Operations
Industry & Context.
Troubleshooting
On-call rotation
What They're Looking For.
Must Have
5+ years of network engineering; 3+ years with HPC/AI interconnect networks; hands-on Infiniband/RoCE operational experience; expert understanding of RDMA concepts; data centre networking fundamentals; troubleshooting complex network issues; Linux-based tooling; fabric diagnostics; Python, Go, or shell scripting; experience in 24/7 operational environments
Nice to Have
Experience with NVIDIA/Mellanox Spectrum switches; experience with NVIDIA ConnectX NICs; familiarity with AI/ML training workflows; experience with network observability systems; experience with telemetry systems; knowledge of GPU communication libraries
What You'll Do.
Own operational health
Own configuration consistency
Own performance tuning
Lead incident diagnosis
Lead incident resolution
Drive blameless postmortems
Implement preventative fixes
Define automation requirements
Define tooling requirements
Contribute to network provisioning
Contribute to network validation
Contribute to monitoring systems
Collaborate with Network Architecture
Collaborate with Network Engineering
Validate fabric designs
Enforce routing standards
Enforce congestion control standards
Enforce firmware baselines
Monitor fabric utilisation
Monitor fabric performance
Tune for predictable latency
Act as subject matter expert
Support mission-critical infrastructure
How You'll Work.
Team & Collaboration
Cross-functional teams; Network Architecture teams; Network Engineering teams; SREs
Full Job Description
About Nscale
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.
We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you’ll be contributing to building the technology that powers the future.
About The Role
Within Nscale, the Network Operations team is responsible for the performance and reliability of the high-speed interconnect fabrics that underpin our AI and HPC platforms. These networks are critical to distributed training and inference workloads and demand a deep operational focus.
We’re looking for a Senior Network Engineer – AI Infrastructure to join our Network Operations team. In this role, you will be responsible for the day-to-day health, stability, and performance of Nscale’s large-scale Infiniband and RDMA over Converged Ethernet (RoCE) fabrics. You’ll bring deep operational expertise from high-performance or hyperscale environments and play a key role in incident response, performance tuning, and continuous improvement of latency-sensitive AI networking systems.
What You'll be Doing
- Owning the operational health, configuration consistency, and performance tuning of large-scale Infiniband and RoCE fabrics supporting AI and HPC workloads
- Leading the diagnosis and resolution of complex network incidents (P0/P1), spanning firmware, kernel drivers, switch hardware, and application or middleware layers
- Driving blameless pos
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.