Bespoke Labs

Applied AI Research

Infrastructure Engineer

Mountain View, California, United States · Full Time · Remote Friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Infrastructure Engineer at Bespoke Labs. Skills: Environment Execution, Performance & Scale, Environment Platform. Own the sandboxing and execution layer. Build systems to snapshot and restore state.”

What You'll Achieve.

Make agents reliable; keep environments coherent; build systems that run reliably in production; run far more environment rollouts per dollar

Industry & Context.

Applied AI Research

Problems you'll solve

hard systems problem; systematic approach

What They're Looking For.

Must Have

  • Track record building production systems or research infrastructure at scale: distributed systems, execution engines, container/sandboxing infrastructure
  • Deep comfort with the systems layer: containers and isolation, filesystems, process and state management
  • Experience making systems fast and cheap: profiling, scheduling, resource utilization, cost optimization at scale
  • Cloud platforms (GCP, AWS) and distributed computing
  • Engineering fundamentals, with a systematic approach to testing, validation, and reliability
  • Comfort operating in ambiguity
  • Excellent communication skills
  • Ability to translate between research needs and infrastructure requirements
  • Comfortable presenting technical work

Nice to Have

  • Python, plus comfort in a systems language (Rust, Go, or C++)
  • Experience with RL training or evaluation infrastructure
  • Experience with checkpoint/snapshot-restore systems, CRIU, or distributed state management
  • Background in high-throughput, low-latency execution systems
  • Contributions to widely-used infrastructure, datasets, benchmarks, or open-source systems
  • Previous experience in a research engineering or infrastructure role at an AI or systems-heavy company

What You'll Do.

Own sandboxing and execution layer

Build systems to snapshot and restore state

Develop machinery to detect failure modes

Extend execution to long-horizon environments

Own platform performance characteristics

Drive utilization and scheduling

Profile and remove bottlenecks

Build and maintain framework for environments

Create tooling for debugging

Scale prototypes into production systems

Write documentation and tools

How You'll Work.

Team & Collaboration

Work closely with research and data teams, and directly with frontier labs and enterprise customers; translate between research needs and infrastructure requirements; present technical work to diverse audiences

Communication Scope

Excellent communication skills; Ability to translate between research needs and infrastructure requirements; Comfortable presenting technical work to diverse audiences

Full Job Description

About Bespoke Labs

Bespoke Labs is an applied AI research lab pioneering data and RL environment curation for training and evaluating agents. Recently, we curated Open Thoughts, one of the best open reasoning datasets used by multiple frontier labs, trained SOTA specialized models such as Bespoke-MiniChart-7B and Bespoke-MiniCheck, and built the environment infrastructure that frontier labs and enterprises use to make their agents reliable. Bespoke is uniquely positioned to capture a large share of data and RL environment curation.

About the Role

We're looking for an Infrastructure Engineer to own the execution layer beneath our RL environments: the systems that let an agent operate inside a realistic, multi-tool world coherently for hours or days.

This is a hard systems problem disguised as an AI job. As the tasks agents can complete keep lengthening, the environments that train them have to stay coherent across far longer horizons than anything that exists today. That means sandboxing and isolation you can trust, execution that's fast and cheap enough to run at training scale, and the ability to snapshot, restore, inspect, and branch a running environment instead of treating every rollout as one-shot. You'll build the platform that makes all of this possible.

You'll work closely with our research and data teams, and directly with frontier labs and enterprise customers, to turn environment designs into infrastructure that runs reliably in production.

What You'll Do

1. Environment Execution & Sandboxing:

- Design and own the sandboxing and execution layer that environments run inside. Build systems to snapshot and restore environment state (disk, process, and where relevant memory and accelerator state) so runs can be paused, resumed, inspected, and branched rather than executed once.
- Develop the machinery to detect failure modes early in a rollout (reward hacks, infra faults, fairness issues) and to revert to a known-good state, patch, and continue.
- Extend exec

How to Apply on Ashby

  • Ashby is a fast modern ATS — most applications take under 3 minutes.
  • The resume parser is strong; verify parsed experience dates and job titles.
  • Custom screening questions are often scored algorithmically — answer completely.
  • Location field affects geo-based screening; use your actual metro area.
