Nuance Labs
Technology
Member of Technical Staff — RL Research
Neural analysis suggests this role is optimal for entry-level candidates.
“Member of Technical Staff — RL Research at Nuance Labs. Skills: RL Research, Post-training methods, System design, ML/RL PhD, Omni models. Own RL and post-training for large-scale omni models, including method development, rollout generation, reward modeling, policy optimization, evaluation, data feedback loops, serving, observability, and distributed execution. Build and scale the RL/post-training stack from 0 to 1 and 1 to 10.”
What You'll Achieve.
Improve interactive behavior, timing, interruption, emotional response, audiovisual coherence, and real-time conversational quality of AI models.
Industry & Context.
system design; distributed execution
Visa sponsorship available from day one.
What They're Looking For.
Must Have
- A PhD (completed or in its final stretch) in ML, RL, or a related field, with research depth shown through publications, a lab/advisor, or substantial open-source work.
- Solid understanding of RL/post-training methods: policy optimization, reward modeling, preference optimization, rejection sampling, KL control, evaluation, and data feedback loops.
- Ability to reason about model behavior and training dynamics: reward hacking, unstable rewards, distribution shift, stale policies, mode collapse, over-optimization, noisy preferences, and evaluation mismatch.
- Exposure to RL/post-training pipelines through research, internships, or open-source work, with frameworks such as verl, ms-swift, OpenRLHF, or equivalent, and familiarity with rollout serving systems such as vLLM. You don’t need to have run these at production scale; you need to learn fast and go deep.
- Software engineering fundamentals and the appetite to build real systems, not just prototypes.
- Curiosity and adaptability toward new RL algorithms, model architectures, serving systems, evaluation methods, and research ideas.
Nice to Have
- Hands-on experience with omni or multimodal post-training for audio-video-language models, especially long-context or real-time interactive systems.
- Experience with PPO, GRPO, DPO, online RL, RLHF/RLAIF, reward modeling, preference data, synthetic data generation, or model-based data improvement.
- Prior 0→1 experience building post-training systems, RL pipelines, agent training systems, evaluation platforms, or model improvement loops.
- Experience with adjacent areas such as distributed pretraining, data infrastructure, inference serving, simulation, human/AI feedback collection, or evaluation infrastructure.
- Publications or substantial open-source contributions in RL, post-training, alignment, evaluation, ML systems, or model behavior.
What You'll Do.
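Own RL and post-training for large-scale omni models, including method development, rollout generation, reward modeling, policy optimization, evaluation, data feedback loops, serving, observability, and distributed execution. Build and scale the RL/post-training stack from 0 to 1 and 1 to 10.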
How You'll Work.
Team & Collaboration
The team believes in the compounding value of working shoulder-to-shoulder.
Full Job Description
About Nuance Labs
Nuance Labs is building photorealistic, real-time AI avatars with emotional intelligence: a full-duplex audiovisual system that can listen, speak, react, interrupt, and respond like a real person. We're a Series A company ($60M raised) backed by Lightspeed, Accel, South Park Commons, NVentures, and Define Ventures, with PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins, and industry experience from Apple, Meta, Amazon AGI, and Discord. The team is small, the work is real, and the problems are unsolved.
How Nuance Differentiates
Most conversational AI avatars today are hacks, a face slapped on a speech-to-speech pipeline, stuck in the uncanny valley: emotionless, mechanical, one-turn-at-a-time. Current systems take 2–5 seconds to respond; natural conversation requires sub-500ms. That's a 10x improvement, and it demands rethinking the entire stack. That rethinking starts with full-duplex: an AI that listens and speaks simultaneously, perceives emotion in real time, and responds with a face that actually reflects it. It's an extremely hard problem, and we're developing foundation models designed for it from the ground up.
About the Role
We’re looking for a deeply technical Member of Technical Staff to own RL and post-training for large-scale omni models. This posting is aimed at researchers who are completing, or have recently completed, a PhD and want to do their best work at a fast-moving frontier lab.
This role is broader than a traditional RL algorithm role. You’ll be expected to understand modern post-training methods and help build the infrastructure needed to run them at scale. The work spans RL method development, rollout generation, reward modeling, policy optimization, evaluation, data feedback loops, serving, observability, and distributed execution.
You’ll help build Nuance’s RL/post-training stack from 0→1 and scale it from 1→10. That means turning rapidly evolving research ideas into reliable training systems: defining the abstraction
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.