Nuance Labs

Member of Technical Staff (RL)

$300k–$400k | Seattle, Washington, United States

The Brief

“Member of Technical Staff (RL) at Nuance Labs. Skills: RL, post-training, foundation models, omni models. Own RL and post-training, grounded in a solid understanding of modern post-training methods.”

What You'll Achieve.

Improve models after pretraining, particularly their interactive behavior: timing, interruption handling, emotional response, audiovisual coherence, and real-time conversational quality. Optimize the end-to-end post-training loop, including rollout throughput, serving latency, GPU utilization, policy update efficiency, queueing, checkpoint overhead, and research iteration speed.

Industry & Context.

Problems you'll solve

Debugging failure modes; Reasoning about model behavior; Reasoning about training dynamics

What They're Looking For.

Must Have

Hands-on experience with RL, RLHF, RLAIF, post-training, alignment, and large-scale fine-tuning; Understanding of RL/post-training methods and the ability to reason about model behavior; Practical experience building RL/post-training pipelines; Experience with large-scale training and inference systems; Understanding of omni post-training; Strong software engineering fundamentals; Curiosity; Adaptability

Nice to Have

Prior 0→1 experience building post-training systems, RL pipelines, agent training systems, evaluation platforms, or large-scale model improvement loops; Experience with PPO, GRPO, DPO, online RL, RLHF/RLAIF, reward modeling, preference data, synthetic data generation, and model-based data improvement; Experience with omni or multimodal post-training, long-context systems, real-time interactive systems, and scaling mixed training/inference workloads; Experience with distributed pretraining, data infrastructure, inference serving, simulation, human/AI feedback collection, and evaluation infrastructure; Publications or substantial open-source contributions in RL, post-training, alignment, evaluation, ML systems, or model behavior

What You'll Do.

Own RL and post-training for large-scale omni models

Understand modern post-training methods

Build the infrastructure to run these methods at scale

Develop reward models

Perform policy optimization

Build data feedback loops

Manage distributed execution

Build Nuance's RL/post-training stack from 0→1

Scale the RL/post-training stack from 1→10

Turn rapidly evolving research ideas into reliable training systems

Define system abstractions

Build evaluation loops

Make the system fast for researchers

Improve interactive behavior

Improve emotional response

Improve audiovisual coherence

Improve real-time conversational quality

Improve models after pretraining

Build rollout generation, policy optimization, and reward/reference model serving

Develop and scale post-training methods

Design system abstractions

Connect research ideas to production-scale RL runs

Build rollout workers

Build experience buffers

Build checkpoint promotion

Build evaluation loops for omni behavior

Improve instruction following

Improve real-time interaction quality

Optimize end-to-end post-training loop

Optimize rollout throughput

Optimize serving latency

Optimize GPU utilization

Optimize policy update efficiency

Optimize checkpoint overhead

Optimize research iteration speed

How You'll Work.

Team & Collaboration

Cross-functional teams; Research teams

Full Job Description

About Nuance Labs

Nuance Labs is building photorealistic, real-time AI avatars with emotional intelligence: a full-duplex audiovisual system that can listen, speak, react, interrupt, and respond like a real person. We're a Series A company ($60M raised) backed by Lightspeed, Accel, South Park Commons, NVentures, and Define Ventures, with PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins, and industry experience from Apple, Meta, Amazon AGI, and Discord. The team is small, the work is real, and the problems are unsolved.

How Nuance Differentiates

Most conversational AI avatars today are hacks: a face slapped onto a speech-to-speech pipeline, stuck in the uncanny valley, emotionless, mechanical, and limited to one turn at a time. Current systems take 2–5 seconds to respond, while natural conversation requires under 500 ms. That's up to a 10x improvement, and it demands rethinking the entire stack.

That rethinking starts with full-duplex: an AI that listens and speaks simultaneously, perceives emotion in real time, and responds with a face that actually reflects it. It's an extremely hard problem, and we're developing foundation models designed for it from the ground up.

About the Role

We're looking for a deeply technical Member of Technical Staff to own RL and post-training for large-scale omni models. This role is broader than a traditional RL algorithm role: you will be expected to understand modern post-training methods and to build the infrastructure needed to run them at scale. The work spans RL method development, rollout generation, reward modeling, policy optimization, evaluation, data feedback loops, serving, observability, and distributed execution.

You will build Nuance's RL/post-training stack from 0→1 and scale it from 1→10. That means turning rapidly evolving research ideas into reliable training systems: defining the abstractions, choosing or modifying frameworks, wiring together rollout workers and trainers, building reward/evaluation loops, debugging failure modes, and making the syst

Role Details

Work Mode: Onsite

Category: Research

Salary Band: 200k+

AI-Extracted Insights

Domain Areas

conversational-ai, foundation-models, full-duplex-communication, real-time-interaction, audiovisual-processing, emotional-intelligence, uncanny-valley, speech-to-speech-pipeline
