HPC/ML Infrastructure Engineer
“HPC/ML Infrastructure Engineer. Skills: HPC infrastructure, ML infrastructure, System administration. Lead bringup on AI training cluster. Administer AI training cluster”
What You'll Achieve.
Train the best anime image model
Industry & Context.
Troubleshoot cluster issues
Work on-site in San Francisco or Tokyo; physical hardware is in the Bay Area
What They're Looking For.
Must Have
5+ years of HPC infrastructure experience, Linux sysadmin skills, Familiarity with the modern HPC software landscape, Experience with SLURM, Experience with parallel filesystems, Experience with networking, Experience training anime models, Experience with LDAP, Experience with dmesg, Experience with physical computers
Nice to Have
Experience with Slinky on K8s, Experience with Warewulf/MAAS/Ansible, Experience with WEKA/VAST/Ceph, Experience with Tailscale, Experience with Grafana/Prometheus stack, Experience with setting sticky bits
What You'll Do.
Lead bringup on AI training cluster
Administer AI training cluster
Operate AI training cluster
Serve as bridge between researchers and GPU machines
Ensure SLURM jobs are running
Ensure parallel filesystems are serving
Ensure network is transmitting
Ensure anime models are training
Manage cluster provisioning
Manage cluster filesystems
Manage cluster networking
Manage cluster monitoring
Set sticky bits on directories
How You'll Work.
Team & Collaboration
Small, fast-paced teams; Directly help AI researchers
Full Job Description
We’re looking for an experienced HPC infrastructure engineer to lead bringup, administration, and operations on what is probably the largest anime AI training cluster in the world. You’ll serve as the bridge between our researchers and the bare GPU machines, helping to make sure that SLURM jobs are running, parallel filesystems are serving, network is transmitting, and that the anime models are training.
YOU MAY BE A GOOD FIT IF:
YOU LOVE ANIME AND THE ANIME AESTHETIC.
This is probably one of the only jobs in the world where you will get to combine your love of anime and large-scale GPU systems.
YOU’RE FAMILIAR WITH THE MODERN HPC SOFTWARE LANDSCAPE
Once upon a time, our team could install SLURM on a few bare metal nodes and get away with it. Now the landscape has become unbelievably complex, with SLURM deploys through Slinky on K8s, provisioning through Warewulf/MAAS/Ansible, filesystems through WEKA/VAST/Ceph, VPN and access through Tailscale, and monitoring via the Grafana/Prometheus stack. We’re looking for someone with relevant experience up and down the stack (and maybe a papercut or two to show for it!)
AS WELL AS THE TRADITIONAL SYSADMIN LANDSCAPE
Bringing up and managing a cluster still requires good old Linux sysadmin skills, including wrangling LDAP, triaging dmesg, and setting sticky bits on directories for misbehaving users and tools.
YOU'RE NOT AFRAID OF PHYSICAL COMPUTERS
We’re building out edge datacenters and our CEO is still personally racking, stacking, and provisioning HGX-based nodes in our living room. Also his VLAN design sucks and he’s bad at fiber routing. Please send help.
AND YOU'RE COMFORTABLE WORKING ON SMALL, FAST-PACED TEAMS.
We currently have a very tiny research team, and you’ll be directly helping some of the AI researchers in the world train the best anime image model in the world.
We also believe in