Baton Corporation

crypto

Reinforcement Learning Engineer

$400–800k · New York, New York, United States; San Francisco, California, United States · FULL TIME · Remote Friendly


The Brief

“Reinforcement Learning Engineer at Baton Corporation. Skills: Reinforcement Learning, production systems, risk management. Own and ship an RL-driven trading agent using real capital to increase trading volume and user participation in a memecoin ecosystem. Design reward functions and policies aligned with product goals while enforcing strict downside risk constraints.”

What You'll Achieve.

Increase trading volume and user participation; enforce strict downside risk constraints; minimize reliance on live sequential testing; ship fast and see real-world impact immediately.

Industry & Context.

crypto

Eligibility Requirements

Hours can be long and unconventional, the pace is intense, expectations are high, and impact is immediate.

What They're Looking For.

Must Have

- Previously put an autonomous learning system into production that directly controlled capital, pricing, traffic, or resources
- Personally designed and enforced hard risk limits (capital caps, loss bounds, circuit breakers) in a live system
- Built a policy evaluation loop from scratch (simulators, replay, counterfactuals, shadow deployments) before trusting live rollout
- Operated as the single owner of a complex ML system in a small team, with no safety net of research orgs, infra teams, or “ML platforms”

Nice to Have

Can make and defend uncomfortable tradeoffs (e.g., heuristic > RL, bandit > deep RL) based on empirical results instead of ideology

What You'll Do.

Own and ship an RL-driven trading agent using real capital to increase trading volume and user participation in a memecoin ecosystem

Design reward functions and policies aligned with product goals while enforcing strict downside risk constraints

Build evaluation and validation frameworks (simulation, offline analysis) to minimize reliance on live sequential testing

Safely transition an existing heuristic-based production system toward learning-based approaches

Take end-to-end ownership and technical leadership as the sole RL expert, from data and modeling through deployment, monitoring, and safeguards

Full Job Description

WHO WE ARE

Baton Corporation is the development company that builds and operates the entire technology stack behind pump.fun (http://pump.fun), the largest memecoin launchpad in production today. The systems are low-latency and high-throughput, run under constant load, and break if you get them wrong.

WHAT YOU’LL DO

As our Reinforcement Learning Engineer, you will own a production trading system that directly deploys real capital. This is not a research role; it’s about building learning systems that are robust, measurable, and safe under real-world constraints.

- Own and ship an RL-driven trading agent using real capital to increase trading volume and user participation in a memecoin ecosystem
- Design reward functions and policies aligned with product goals while enforcing strict downside risk constraints
- Build evaluation and validation frameworks (simulation, offline analysis) to minimize reliance on live sequential testing
- Safely transition an existing heuristic-based production system toward learning-based approaches
- Take end-to-end ownership and technical leadership as the sole RL expert, from data and modeling through deployment, monitoring, and safeguards

WHO YOU ARE

- You have previously put an autonomous learning system into production that directly controlled capital, pricing, traffic, or resources, and you can explain what broke and how you fixed it
- You have personally designed and enforced hard risk limits (capital caps, loss bounds, circuit breakers) in a live system, not just talked about “risk-aware objectives.”
- You have built a policy evaluation loop from scratch (simulators, replay, counterfactuals, shadow deployments) before trusting live rollout
- You can make and defend uncomfortable tradeoffs (e.g., heuristic > RL, bandit > deep RL) based on empirical results instead of ideology
- You have operated as the single owner of a complex ML system in a small team, with no safety net of research orgs, infra teams, or “ML platforms.”

WHAT IT'S LIKE TO WORK HERE

- We



Role Details

Experience 5–10 yrs

Level Senior

Work Mode in person

Type FULL TIME

Category engineering

Salary Band 200k+

AI-Extracted Insights

Domain Areas

memecoin-ecosystem, crypto-scale

How to Apply on Ashby

Ashby is a fast, modern ATS; most applications take under 3 minutes.
The resume parser is strong; verify parsed experience dates and job titles.
Custom screening questions are often scored algorithmically — answer completely.
Location field affects geo-based screening; use your actual metro area.
