Crusoe

Technology

Senior Staff Network Engineer, Operations

$225–275k · San Francisco, California, United States · FULL TIME

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior Staff Network Engineer, Operations at Crusoe. Skills: Network Operations, Incident Response, Root Cause Analysis, Production Reliability. Own production reliability. Serve as senior technical owner”

What You'll Achieve.

Prevent recurrence of incidents; Track remediation plans to closure; Reduce toil; Accelerate mean time to resolution

Industry & Context.

Technology

Problems you'll solve

Root cause analysis; Problem-solving

Eligibility Requirements

24/7 on-call responsibility

What They're Looking For.

Must Have

12+ years production network engineering, Hands-on experience with streaming telemetry, Hands-on experience operating RDMA/RoCE, Proven track record owning production reliability, Comfort operating 10K+ device fleets, Expert hands-on knowledge of BGP, Expert knowledge of Arista (EOS), Expert knowledge of Juniper (Junos), Expert knowledge of NVIDIA/Mellanox platforms, Proficiency in Python for auto-remediation, Experience defining and owning network reliability metrics, Bachelor's degree in Computer Science, Equivalent practical experience in hyperscale environments

Nice to Have

GPU cluster interconnects, AI infrastructure, Cloud services, Data center construction, Edge network, Backbone network, Data center fabric, Hyperscale AI infrastructure, Network monitoring stack, Streaming telemetry, SNMP, NetFlow, Arbor, Python-based auto-remediation tooling, Post-incident learning, Operational excellence, Large-scale operations, Incident response, Reliability in hyperscale environments, Internet-scale environments, sFlow, Kentik, Grafana, Prometheus, ThousandEyes, RoCE v1, RoCE v2, lossless fabrics, PFC tuning, ECN tuning, DCQCN tuning, Systemic change, Operational standards, Multi-region environments, 24/7 on-call responsibility, EVPN-VXLAN, IS-IS, OSPF, MPLS, QoS, TCP/IP, CLOS architectures, Multi-vendor environments, Diagnostic tooling, Operational workflows, Service level objectives, Engineering leadership, Product leadership

What You'll Do.

Own production reliability

Serve as senior technical owner

Lead incident response

Own end-to-end response

Mitigate network events

Communicate with stakeholders

Lead RCAs for incidents

Identify systemic issues

Author remediation plans

Track remediation plans

Define network reliability metrics

Define service level objectives

Create real-time dashboards

Maintain escalation playbooks

Improve network monitoring stack

Drive continuous improvement

Write Python auto-remediation tooling

Provide technical guidance

Mentor Staff engineers

Mentor Senior engineers

Build culture of operational excellence

How You'll Work.

Team & Collaboration

Partner with Architecture; Partner with Site Reliability; Work with engineering leadership; Work with product leadership

Communication Scope

Stakeholder communication

Full Job Description

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack, from electrons to tokens, to power the world's most ambitious AI workloads.

When you join Crusoe, you join a team that is building the future, faster. We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.

We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved: people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.

If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

About this Role

Crusoe Cloud is seeking a Senior Staff Network Operations Engineer to own production reliability across our global network, including edge, backbone, data center fabric, and GPU cluster interconnects. You will drive incident response, root cause analysis, and the operational excellence initiatives that keep our hyperscale AI infrastructure healthy at scale.

This is a senior production ownership role, not architecture, not pre-sales, not purely automation. You will set operational standards, define SLIs and SLOs, mentor Staff and Senior engineers, and serve as the senior escalation point during high-severity events. This is the role that keeps the network up.

What You'll Be Working On

- Own Production Reliability: Serve as the senior technical owner for uptime of Crusoe's global edge, backbone, data


SKILL SIGNAL 46 detected · ranked by frequency

Incident Response ×6

Root Cause Analysis ×6

SLI/SLO definition ×4

Network monitoring ×4

BGP ×4

EVPN-VXLAN ×4

IS-IS ×4

OSPF ×4

MPLS ×4

QoS ×4

TCP/IP ×4

PFC ×4

ECN ×4

DCQCN ×4

SNMP ×4

NetFlow ×4

sFlow ×4

Streaming telemetry ×4

Network engineering ×3

Auto-remediation ×3

Python scripting ×3

RDMA/RoCE ×3

Network Operations ×2

Production Reliability ×2

Python

RDMA

RoCE

Production ownership

Operational excellence

Escalation point

Stakeholder communication

Remediation plans

BEHAVIOURAL

Problem-solving; Mentoring

Role Details

Experience 12+ yrs

Level Senior

Type FULL TIME

Education Bachelor's

Category cloud-availability

Salary Band 200k+

AI-Extracted Insights

Domain Areas

ai-infrastructure; energy-abundance; ai-workloads; ai-compute; power-bottleneck; energy-first-approach; ai-strategies; edge-network

How to Apply on Ashby

Ashby is a fast modern ATS — most applications take under 3 minutes.
The resume parser is strong; verify parsed experience dates and job titles.
Custom screening questions are often scored algorithmically — answer completely.
Location field affects geo-based screening; use your actual metro area.
