Alembic
Technology
Senior Network & Site Reliability Engineer
“Senior Network & Site Reliability Engineer at Alembic. Skills: Network architecture, Reliability engineering, Automation, Observability. Architect and operate scalable network architecture.”
Industry & Context.
Solve deep infrastructure problems; Debug complex network and system issues
On-call rotations
What They're Looking For.
Must Have
8+ years network engineering, 5+ years datacenter operations, 5+ years systems administration, 5+ years network administration, Network security background, Network architecture background, Network design background, Network operations background, Extensive hands-on experience with network devices, Extensive hands-on experience with large-scale architectures, Extensive hands-on experience with network protocols, Experience designing datacenter network fabrics, Experience operating datacenter network fabrics, Experience with network automation tooling, Experience with IaC tooling, Experience with IPAM platforms, Experience with DCIM platforms, WAN engineering experience, Carrier circuit provisioning experience, External network peering experience, Kubernetes networking experience, Linux production infrastructure experience, Experience with monitoring stacks, Experience with observability stacks, Solid Python scripting, Solid Bash scripting, Excellent cross-functional communication
Nice to Have
NVIDIA networking technologies, Cumulus Linux experience, InfiniBand experience, Spectrum-X experience, BlueField DPUs experience, Data-intensive platforms familiarity, Storage network protocols familiarity, Security practices experience, High-compliance environments experience, SOC 2 environments experience
What You'll Do.
Architect scalable network architecture
Operate scalable network architecture
Architect secure network architecture
Operate secure network architecture
Architect network architecture for ML workloads
Operate network architecture for ML workloads
Own network device configuration management
Ensure configuration consistency
Ensure configuration reliability
Improve system reliability
Improve network reliability
Improve performance through automation
Improve performance through observability
Improve performance through capacity planning
Implement complex network protocols
Manage complex network protocols
Implement complex network connectivity
Manage complex network connectivity
Implement WAN circuits
Implement external peering
Manage external peering
Build incident response
Manage on-call rotations
Drive post-incident analysis
Drive continuous improvement
Ensure operational readiness
Partner across engineering
Partner across data science
Drive culture of performance
Drive culture of reliability
How You'll Work.
Team & Collaboration
Partner across engineering; Partner across data science
Communication Scope
Cross-functional communication
Full Job Description
ABOUT US
Alembic is the pioneering Causal AI platform. We help the world's largest enterprises move past correlation to prove what actually drives business outcomes: the question marketing and growth teams have never been able to answer with confidence. Fortune 100 companies including Nvidia, Delta Air Lines, and Mars use Alembic to make multimillion-dollar decisions on trusted, causal evidence.
We're backed by a $145M Series B from WndrCo (founded by Jeffrey Katzenberg), Jensen Huang, Joe Montana, Prysm Capital, and Accenture. Our models run on our own NVIDIA DGX SuperPOD built on Grace Blackwell infrastructure, one of the fastest private supercomputers in the world. (We've melted GPUs getting here.)
ABOUT THE ROLE
We're building infrastructure that has to perform under real-world scale, reliability, and security demands, and we're looking for an engineer who wants to own the foundation it runs on.
This isn't a traditional "keep the lights on" role. You'll design and operate the global network and reliability layer behind one of the world's fastest private supercomputers: the fabric powering distributed compute, ML workloads, real-time analytics, and mission-critical enterprise systems. You'll work across networking, systems, automation, observability, and reliability engineering to scale a platform where performance genuinely matters, with real influence over architecture decisions.
It's a strong fit if you like solving deep infrastructure problems, building resilient systems, automating everything repetitive, and owning architecture rather than just maintaining it.
WHAT YOU'LL DO
- Architect and operate scalable, secure network architecture for high-security requirements and large-scale machine learning workloads.
- Own network device configuration management end to end, ensuring consistency and reliability across the fleet.
- Improve system and network reliability and performance through automation, observability, and proactive capacity planning.
- Implement
How to Apply on Ashby
- Ashby is a fast modern ATS — most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically — answer completely.
- Location field affects geo-based screening; use your actual metro area.