Preference Model

AI

Reinforcement Learning Environments Engineer - Cybersecurity

$180–300k · San Francisco, California, United States · Full time · Remote friendly


What You'll Achieve.

Help advance the goal of automating every role at a hypothetical AI research lab by improving models' understanding of cybersecurity. You will teach LLMs to reason about and solve real-world cybersecurity problems, from finding vulnerabilities in production codebases to generating working exploits and patching them safely. You will also contribute directly to the data layer that powers frontier LLM capability in security, producing clean, learnable signals for frontier models on offensive and defensive security tasks.

Industry & Context.

AI
Problems you'll solve

The team is looking for problem solvers who take ownership and drive solutions end to end.

Eligibility Requirements

Visa sponsorship & relocation support available

What They're Looking For.

Must Have

  • Strong security fundamentals and broad interests across both offensive and defensive work
  • Hands-on experience finding, exploiting, or patching real vulnerabilities through CTFs, bug bounty work, security research, red/blue team engagements, or shipped security work in industry
  • Proficiency in Python and systems programming
  • Working comfort in at least one low-level language (C, C++, Rust)
  • Working comfort in at least one web/application stack
  • Ability to meet throughput expectations and respond quickly to feedback

Nice to Have

  • Published security research, CVEs, or notable bug bounty findings
  • CTF background or competitive results at events like DEF CON CTF or similar
  • Deep expertise in a specific area: binary exploitation, kernel security, browser/V8 internals, hypervisor security, cryptographic implementation, web application security, or cloud/container security
  • Experience building or contributing to fuzzing infrastructure, vulnerability scanners, or automated program analysis tools
  • Experience with ML for code or security
  • Experience building complex interactive RL environments, agent harnesses, or sandboxed evaluation infrastructure

What You'll Do.

  • Design and build RL environments and reward functions that produce clean, learnable signals for frontier models on offensive and defensive security tasks across diverse programming languages
  • Build environments covering the full vulnerability lifecycle: discovery in source code, exploitation, and patching
  • Build environments for reverse engineering tasks across binaries, bytecode, and obfuscated code
  • Construct verifiable reward signals using fuzzers, sanitizers, symbolic execution, static analyzers, exploit-success checks, and patch-correctness validation

How You'll Work.

Team & Collaboration

Collaborate with others to brainstorm and create new ideas and tools to improve the environment building process

Full Job Description

ABOUT US

Preference Model is building automated ML research engineering. Existing frontier models are brittle when applied to real-world ML tasks, and the present bottleneck is the lack of high-quality RL training environments. Our first step is to build RL environments that reflect real-world complexity, with diverse tasks and robust reward functions. Our founding team has previous experience on Anthropic’s data team, building the data infrastructure and datasets behind Claude. We are partnering with leading AI labs to push AI closer to achieving its transformative potential.

ABOUT THE ROLE

As part of our goal to automate every role at a hypothetical AI research lab, one important capability we care about is models' understanding of cybersecurity. We're hiring experienced Security Engineers to design and build reinforcement learning environments that teach LLMs to reason about and solve real-world cybersecurity problems, from finding vulnerabilities in production codebases to generating working exploits and patching them safely. You'll join a small, high-ownership team and contribute directly to the data layer that powers frontier LLM capability in security.

WHAT YOU WILL DO

- Design and build RL environments and reward functions that produce clean, learnable signals for frontier models on offensive and defensive security tasks across diverse programming languages.
- Build environments covering the full vulnerability lifecycle: discovery in source code, exploitation, and patching.
- Build environments for reverse engineering tasks across binaries, bytecode, and obfuscated code.
- Construct verifiable reward signals using fuzzers, sanitizers, symbolic execution, static analyzers, exploit-success checks, and patch-correctness validation.
- Collaborate with others to brainstorm and create new ideas and tools to improve the environment building process.

WHAT WE ARE LOOKING FOR

- Strong security fundamentals and broad interests across both offensive and defensive work. You read ad


