Preference Model
AI
Reinforcement Learning Environments Engineer - Cybersecurity
Neural analysis suggests this role is optimal for Mid candidates.
Skills: Reinforcement Learning Environments Engineering, Cybersecurity, Python, systems programming, low-level languages (C, C++, Rust), web/application stack, security tooling.
What You'll Achieve.
- Work toward the goal of automating every role at a hypothetical AI research lab
- Improve models' understanding of cybersecurity
- Teach LLMs to reason about and solve real-world cybersecurity problems, from finding vulnerabilities in production codebases to generating working exploits and patching them safely
- Contribute directly to the data layer that powers frontier LLM capability in security
- Produce clean, learnable signals for frontier models on offensive and defensive security tasks
Industry & Context.
Problem solvers who take ownership and drive solutions end-to-end
Visa sponsorship & relocation support available
What They're Looking For.
Must Have
- Strong security fundamentals and broad interests across both offensive and defensive work
- Hands-on experience finding, exploiting, or patching real vulnerabilities through CTFs, bug bounty work, security research, red/blue team engagements, or shipped security work in industry
- Proficiency in Python and systems programming
- Working comfort in at least one low-level language (C, C++, Rust)
- Working comfort in at least one web/application stack
- Ability to meet throughput expectations and respond quickly to feedback
Nice to Have
- Published security research, CVEs, or notable bug bounty findings
- CTF background or competitive results at events like DEF CON CTF or similar
- Deep expertise in a specific area: binary exploitation, kernel security, browser/V8 internals, hypervisor security, cryptographic implementation, web application security, or cloud/container security
- Experience building or contributing to fuzzing infrastructure, vulnerability scanners, or automated program analysis tools
- Experience with ML for code or security, or having built complex interactive RL environments, agent harnesses, or sandboxed evaluation infrastructure
What You'll Do.
- Design and build RL environments and reward functions that produce clean, learnable signals for frontier models on offensive and defensive security tasks across diverse programming languages
- Build environments covering the full vulnerability lifecycle: discovery in source code, exploiting, patching
- Build environments for reverse engineering tasks across binaries, bytecode, and obfuscated code
- Construct verifiable reward signals using fuzzers, sanitizers, symbolic execution, static analyzers, exploit-success checks, and patch-correctness validation
How You'll Work.
Team & Collaboration
Collaborate with others to brainstorm and create new ideas and tools to improve the environment building process
Full Job Description
ABOUT US
Preference Model is building automated ML research engineering. Existing frontier models are brittle when applied to real-world ML tasks. The present bottleneck is the lack of high-quality RL training environments. Our first step is to build RL environments that reflect real-world complexity, with diverse tasks and robust reward functions. Our founding team has previous experience on Anthropic’s data team building data infrastructure and datasets behind Claude. We are partnering with leading AI labs to push AI closer to achieving its transformative potential.

ABOUT THE ROLE
As part of our goal to automate every role at a hypothetical AI research lab, one important capability we care about is models' understanding of cybersecurity. We're hiring experienced Security Engineers to design and build reinforcement learning environments that teach LLMs to reason about and solve real-world cybersecurity problems, from finding vulnerabilities in production codebases to generating working exploits and patching them safely. You'll join a small, high-ownership team and contribute directly to the data layer that powers frontier LLM capability in security.

WHAT YOU WILL DO
- Design and build RL environments and reward functions that produce clean, learnable signals for frontier models on offensive and defensive security tasks across diverse programming languages.
- Build environments covering the full vulnerability lifecycle: discovery in source code, exploiting, patching.
- Build environments for reverse engineering tasks across binaries, bytecode, and obfuscated code.
- Construct verifiable reward signals using fuzzers, sanitizers, symbolic execution, static analyzers, exploit-success checks, and patch-correctness validation.
- Collaborate with others to brainstorm and create new ideas and tools to improve the environment building process.

WHAT WE ARE LOOKING FOR
- Strong security fundamentals and broad interests across both offensive and defensive work. You read ad
How to Apply on Ashby
- Ashby is a fast, modern ATS; most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically — answer completely.
- Location field affects geo-based screening; use your actual metro area.