Nuance Labs

Technology

Member of Technical Staff — RL Research

Seattle, Washington, United States FULL TIME

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for entry-level candidates.

The Brief

“Member of Technical Staff — RL Research at Nuance Labs. Skills: RL Research, Post-training methods, System design, ML/RL PhD, Omni models. Own RL and post-training for large-scale omni models, including method development, rollout generation, reward modeling, policy optimization, evaluation, data feedback loops, serving, observability, and distributed execution. Build and scale the RL/post-training stack from 0 to 1 and 1 to 10.”

What You'll Achieve.

Improve interactive behavior, timing, interruption, emotional response, audiovisual coherence, and real-time conversational quality of AI models.

Industry & Context.

Technology

Problems you'll solve

system design; distributed execution

Eligibility Requirements

Visa sponsorship available from day one.

What They're Looking For.

Must Have

PhD (completed, or in its final stretch) in ML, RL, or a related field, with research depth shown through publications, a lab/advisor, or substantial open-source work.
Solid understanding of RL/post-training methods: policy optimization, reward modeling, preference optimization, rejection sampling, KL control, evaluation, and data feedback loops.
Ability to reason about model behavior and training dynamics: reward hacking, unstable rewards, distribution shift, stale policies, mode collapse, over-optimization, noisy preferences, and evaluation mismatch.
Exposure to RL/post-training pipelines through research, internships, or open-source work, with frameworks such as verl, ms-swift, OpenRLHF, or equivalent, and familiarity with rollout serving systems such as vLLM. You don’t need to have run these at production scale; you need to learn fast and go deep.
Software engineering fundamentals and the appetite to build real systems, not just prototypes.
Curiosity and adaptability toward new RL algorithms, model architectures, serving systems, evaluation methods, and research ideas.

Nice to Have

Hands-on experience with omni or multimodal post-training for audio-video-language models, especially long-context or real-time interactive systems.
Experience with PPO, GRPO, DPO, online RL, RLHF/RLAIF, reward modeling, preference data, synthetic data generation, or model-based data improvement.
Prior 0→1 experience building post-training systems, RL pipelines, agent training systems, evaluation platforms, or model improvement loops.
Experience with adjacent areas such as distributed pretraining, data infrastructure, inference serving, simulation, human/AI feedback collection, or evaluation infrastructure.
Publications or substantial open-source contributions in RL, post-training, alignment, evaluation, ML systems, or model behavior.

What You'll Do.

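Own RL and post-training for large-scale omni models, including method development, rollout generation, reward modeling, policy optimization, evaluation, data feedback loops, serving, observability, and distributed execution.

Build and scale the RL/post-training stack from 0 to 1 and 1 to 10.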

How You'll Work.

Team & Collaboration

Believes in the compounding value of working shoulder-to-shoulder.

Full Job Description

About Nuance Labs

Nuance Labs is building photorealistic, real-time AI avatars with emotional intelligence: a full-duplex audiovisual system that can listen, speak, react, interrupt, and respond like a real person. We're a Series A company ($60M raised) backed by Lightspeed, Accel, South Park Commons, NVentures, and Define Ventures, with PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins, and industry experience from Apple, Meta, Amazon AGI, and Discord. The team is small, the work is real, and the problems are unsolved.

How Nuance Differentiates

Most conversational AI avatars today are hacks: a face slapped on a speech-to-speech pipeline, stuck in the uncanny valley, emotionless, mechanical, one-turn-at-a-time. Current systems take 2–5 seconds to respond; natural conversation requires sub-500ms. That's a 10x improvement, and it demands rethinking the entire stack.

That rethinking starts with full-duplex: an AI that listens and speaks simultaneously, perceives emotion in real time, and responds with a face that actually reflects it. It's an extremely hard problem, and we're developing foundation models designed for it from the ground up.

About the Role

We’re looking for a deeply technical Member of Technical Staff to own RL and post-training for large-scale omni models. This posting is aimed at researchers who are completing, or have recently completed, a PhD and want to do their best work at a fast-moving frontier lab.

This role is broader than a traditional RL algorithm role. You’ll be expected to understand modern post-training methods and help build the infrastructure needed to run them at scale. The work spans RL method development, rollout generation, reward modeling, policy optimization, evaluation, data feedback loops, serving, observability, and distributed execution.

You’ll help build Nuance’s RL/post-training stack from 0→1 and scale it from 1→10. That means turning rapidly evolving research ideas into reliable training systems: defining the abstraction

SKILL SIGNAL 13 detected · ranked by frequency

PhD in ML, RL, or a related field ×3

Solid understanding of RL/post-training methods ×3

Ability to reason about model behavior and training dynamics ×3

Exposure to RL/post-training pipelines ×3

software engineering fundamentals ×3

RL Research, Post-training methods, System design, ML/RL PhD, Omni models ×2

audio

video

language

Conversational AI, photorealistic AI avatars, emotional intelligence, full-duplex audiovisual systems.

verl, ms-swift, OpenRLHF, vLLM

BEHAVIOURAL

Curiosity, adaptability

Role Details

Seniority Entry

Experience 0–3 yrs

Level Entry

Work Mode Onsite

Type FULL TIME

Education PhD

Category data

How to Apply on Greenhouse

Create a Greenhouse profile before applying — it saves time across multiple applications.
Upload your resume as a PDF; the parser handles it better than Word.
Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
Enable email notifications to track application status in real time.
