Nscale

AI Infrastructure

Principal Back-End Network Engineer - AI Infrastructure Operations

$150–215k AMER Remote Friendly

The Brief

“Principal Back-End Network Engineer - AI Infrastructure Operations at Nscale. Skills: Infiniband, RoCE fabrics, network operations, AI infrastructure. Own the technical direction and operational strategy for AI interconnect networks.”

What You'll Achieve.

Improve fabric reliability; Improve performance predictability; Improve operational maturity; Reduce incident count

Industry & Context.

AI Infrastructure

Problems you'll solve

Troubleshoot network incidents; Resolve cross-layer issues

What They're Looking For.

Must Have

10+ years network engineering experience; Deep focus on HPC, AI, or hyperscale data centre networking; Expert-level operational and architectural experience with Infiniband and/or large-scale RoCE fabrics; Deep understanding of RDMA internals; Expertise in modern data centre routing and control planes; Proven ability to lead complex technical initiatives across teams without direct authority; Systems-level mindset

Nice to Have

Extensive experience with NVIDIA/Mellanox networking platforms; Deep familiarity with distributed training frameworks; Deep familiarity with GPU communication patterns; Experience designing network observability systems; Prior experience influencing platform or infrastructure strategy at scale

What You'll Do.

Own technical direction for AI interconnect networks

Own operational strategy for AI interconnect networks

Design Infiniband fabric architectures

Review Infiniband fabric architectures

Evolve Infiniband fabric architectures

Act as senior escalation point for network incidents

Guide deep technical investigations

Drive initiatives to improve fabric reliability

Drive initiatives to improve fabric performance predictability

Drive initiatives to improve operational maturity

Define standards for hardware configuration

Define standards for congestion control

Define standards for routing

Define standards for firmware lifecycle management

Define standards for change safety

Partner with SRE teams

Partner with Compute Platform teams

Partner with Network Architecture teams

Influence end-to-end system design

Mentor senior network engineers

Mentor mid-level network engineers

Raise bar for operational rigor

Raise bar for technical excellence

How You'll Work.

Team & Collaboration

Cross-team initiatives; Partner with SRE; Partner with Compute Platform; Partner with Network Architecture

Full Job Description

About Nscale

Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you’ll be contributing to building the technology that powers the future.

About The Role

The Network Operations and Engineering teams at Nscale operate some of the most demanding networking environments in the industry, supporting tightly coupled GPU clusters where network performance directly impacts customer outcomes.

We’re looking for a Principal Network Engineer – AI Infrastructure to provide technical leadership across Nscale’s high-speed networking domain. This role is focused on owning the reliability, scalability, and long-term evolution of our Infiniband and RDMA-based network fabrics. You will operate as a technical authority, influencing architecture, standards, and operational practices across teams while tackling the most complex network challenges in the platform.

What You'll Be Doing

  • Owning the technical direction and operational strategy for Nscale’s AI interconnect networks
  • Designing, reviewing, and evolving large-scale Infiniband and RoCE fabric architectures to support future growth and workload demands
  • Acting as the senior escalation point for the most complex network incidents, guiding deep technical investigations and systemic fixes
  • Driving cross-team initiatives to improve fabric reliability, p


How to Apply on Greenhouse

  • Create a Greenhouse profile before applying — it saves time across multiple applications.
  • Upload your resume as a PDF; the parser handles it better than Word.
  • Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
  • Enable email notifications to track application status in real time.
