Alembic

Technology

Senior Network & Site Reliability Engineer

$210–240k · San Francisco, California, United States · Full Time
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior Network & Site Reliability Engineer at Alembic. Skills: Network architecture, Reliability engineering, Automation, Observability. Architect and operate scalable network architecture.”

Industry & Context.

Technology

Problems you'll solve

Solve deep infrastructure problems; Debug complex network issues; Debug complex system issues

Eligibility Requirements

On-call rotations

What They're Looking For.

Must Have

8+ years of network engineering, 5+ years of datacenter operations, 5+ years of systems administration, 5+ years of network administration, Network security background, Network architecture background, Network design background, Network operations background, Extensive hands-on experience with network devices, Extensive hands-on experience with large-scale architectures, Extensive hands-on experience with network protocols, Experience designing datacenter network fabrics, Experience operating datacenter network fabrics, Experience with network automation tooling, Experience with IaC tooling, Experience with IPAM platforms, Experience with DCIM platforms, WAN engineering experience, Carrier circuit provisioning experience, External network peering experience, Kubernetes networking experience, Linux production infrastructure experience, Experience with monitoring stacks, Experience with observability stacks, Solid Python scripting, Solid Bash scripting, Excellent cross-functional communication

Nice to Have

Experience with NVIDIA networking technologies, Cumulus Linux experience, InfiniBand experience, Spectrum-X experience, BlueField DPU experience, Familiarity with data-intensive platforms, Familiarity with storage network protocols, Experience with security practices, Experience in high-compliance environments, Experience in SOC 2 environments

What You'll Do.

Architect scalable network architecture

Operate scalable network architecture

Architect secure network architecture

Operate secure network architecture

Architect network architecture for ML workloads

Operate network architecture for ML workloads

Own network device configuration management

Ensure configuration consistency

Ensure configuration reliability

Improve system reliability

Improve network reliability

Improve performance through automation

Improve performance through observability

Improve performance through capacity planning

Implement complex network protocols

Manage complex network protocols

Implement complex network connectivity

Manage complex network connectivity

Implement WAN circuits

Implement external peering

Manage external peering

Build incident response

Manage on-call rotations

Drive post-incident analysis

Drive continuous improvement

Ensure operational readiness

Partner across engineering

Partner across data science

Drive culture of performance

Drive culture of reliability

How You'll Work.

Team & Collaboration

Partner across engineering; Partner across data science

Communication Scope

Cross-functional communication

Full Job Description

ABOUT US

Alembic is the pioneering Causal AI platform. We help the world's largest enterprises move past correlation to prove what actually drives business outcomes: the question marketing and growth teams have never been able to answer with confidence. Fortune 100 companies including Nvidia, Delta Air Lines, and Mars use Alembic to make multimillion-dollar decisions on trusted, causal evidence.

We're backed by a $145M Series B from WndrCo (founded by Jeffrey Katzenberg), Jensen Huang, Joe Montana, Prysm Capital, and Accenture. Our models run on our own NVIDIA DGX SuperPOD built on Grace Blackwell infrastructure, one of the fastest private supercomputers in the world. (We've melted GPUs getting here.)

ABOUT THE ROLE

We're building infrastructure that has to perform under real-world scale, reliability, and security demands, and we're looking for an engineer who wants to own the foundation it runs on. This isn't a traditional "keep the lights on" role. You'll design and operate the global network and reliability layer behind one of the world's fastest private supercomputers: the fabric powering distributed compute, ML workloads, real-time analytics, and mission-critical enterprise systems.

You'll work across networking, systems, automation, observability, and reliability engineering to scale a platform where performance genuinely matters, with real influence over architecture decisions. It's a strong fit if you like solving deep infrastructure problems, building resilient systems, automating everything repetitive, and owning architecture rather than just maintaining it.

WHAT YOU'LL DO

- Architect and operate scalable, secure network architecture for high-security requirements and large-scale machine learning workloads.
- Own network device configuration management end to end, ensuring consistency and reliability across the fleet.
- Improve system and network reliability and performance through automation, observability, and proactive capacity planning.
- Implement

How to Apply on Ashby

  • Ashby is a fast modern ATS — most applications take under 3 minutes.
  • The resume parser is strong; verify parsed experience dates and job titles.
  • Custom screening questions are often scored algorithmically — answer completely.
  • Location field affects geo-based screening; use your actual metro area.
