CoreWeave

AI Cloud

HPC Engineer

£79–105k London, England, United Kingdom

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“HPC Engineer at CoreWeave. Skills: NVLink/NVSwitch platform deployment, operation, and troubleshooting, Linux system administration, Networking fundamentals, Production debugging, Hardware troubleshooting, Scripting/Automation (Python, Go, Bash), GPU interconnect technologies. Deploy, operate, and support NVLink/NVSwitch platforms across large data center environments. Troubleshoot Linux, networking, hardware, firmware, performance, and stability issues in production”

What You'll Achieve.

Improve runbooks, dashboards, alerts, and lifecycle workflows; contribute to reliable workflows that scale across regions, platforms, and fleet growth

Industry & Context.

AI Cloud

Problems you'll solve

Production troubleshooting of Linux, networking, hardware, firmware, performance, and stability issues; root cause analysis

Eligibility Requirements

On-call participation; a basic criminal record check is required for successful applicants

What They're Looking For.

Must Have

Linux system administration and troubleshooting skills

Networking fundamentals and common troubleshooting tools

Production debugging experience using logs, metrics, and command-line tools

Server, network, GPU, or data center hardware troubleshooting experience

Practical scripting or automation experience in Python, Go, Bash, or similar

Clear communication, documentation, collaboration, and on-call readiness

Curiosity to learn specialized GPU interconnect technologies such as NVLink, NVSwitch, and InfiniBand

Nice to Have

Ansible or other infrastructure automation tooling

Kubernetes application development or operations experience

Grafana, Prometheus, PromQL, or similar observability systems

Large fleet operations across Linux systems, network devices, GPUs, or infrastructure components

InfiniBand, RDMA, HPC networking, or low-latency/high-bandwidth fabrics

BMC, Redfish, IPMI, firmware lifecycle management, or hardware management APIs

NVLink, NVSwitch, NVIDIA GPU platforms, NVUE, SONiC, or network operating systems

Prior NVLink experience

What You'll Do.

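Deploy, operate, and support NVLink/NVSwitch platforms across large data center environments

Troubleshoot Linux, networking, hardware, firmware, performance, and stability issues in production

Build automation and improve runbooks, dashboards, alerts, and lifecycle workflows

Participate in on-call, incident response, root cause analysis, and follow-up improvements

Contribute to reliable workflows that scale across regions, platforms, and fleet growth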

How You'll Work.

Team & Collaboration

Collaborate with teams across CoreWeave, external vendors, and customer-facing stakeholders

Communication Scope

Clear communication

Process & Methodology

Drive assigned work to completion with clear communication, thoughtful prioritization, and early visibility into risks or blockers

Full Job Description

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com. We're proud to be a Living Wage accredited Employer.

CoreWeave is building and operating some of the largest GPU infrastructure in the world. The Metal Net team owns the high-bandwidth GPU interconnect platforms that make large-scale AI and HPC workloads possible, including NVLink and NVSwitch-based systems. We are looking for an HPC Engineer to deploy, operate, troubleshoot, and improve these platforms across our global data center footprint. This role is a strong fit for engineers who enjoy production troubleshooting, hardware-adjacent systems work, automation, observability, and learning specialized infrastructure deeply. Prior NVLink experience is helpful, but not required.

What You Will Do

Deploy, operate, and support NVLink/NVSwitch platforms across large data center environments.

Troubleshoot Linux, networking, hardware, firmware, performance, and stability issues in production.

Build automation and improve runbooks, dashboards, alerts, and lifecycle workflows.

Collaborate with teams across CoreWeave, external vendors, and customer-facing stakeholders.

Drive assigned work to completion with clear communication, thoughtful prioritization, and early visibility into risks or blockers.

Participate in on-call, incident response, root cause analysis, and follow-up improvements.

Contribute to reliable workflows that scale across regions, platforms, and fleet growth, with ownership calibrated by level.

What We Are Looking For

Strong Linux system …


SKILL SIGNAL 57 detected · ranked by frequency

Production debugging ×3

Deploying NVLink/NVSwitch platforms ×3

Operating NVLink/NVSwitch platforms ×3

Supporting NVLink/NVSwitch platforms ×3

Troubleshooting Linux ×3

Troubleshooting networking ×3

Troubleshooting hardware ×3

Troubleshooting firmware ×3

Troubleshooting performance ×3

Troubleshooting stability ×3

Building automation ×3

Improving runbooks ×3

Improving dashboards ×3

Improving alerts ×3

Scripting ×3

Infrastructure automation ×3

Application development ×3

Operations ×3

Large fleet operations ×3

Firmware lifecycle management ×3

Hardware management APIs ×3

NVLink/NVSwitch platform deployment, operation, and troubleshooting ×2

Linux system administration ×2

Networking fundamentals ×2

Hardware troubleshooting ×2

Scripting/Automation (Python, Go, Bash) ×2

GPU interconnect technologies ×2

Grafana ×2

Prometheus ×2

PromQL ×2

BMC ×2

Redfish ×2

BEHAVIOURAL

Collaboration, Clear communication, Thoughtful prioritization, Early visibility into risks or blockers, Curiosity

Role Details

Work Mode hybrid

Category technology

Salary Band 75k-100k

AI-Extracted Insights

Domain Areas

ai-cloud-infrastructure, gpu-infrastructure, high-bandwidth-gpu-interconnect-platforms, nvlink, nvswitch, infiniband, rdma, hpc-networking
