Nuance Labs
Member of Technical Staff – RL
“Member of Technical Staff – RL at Nuance Labs. Skills: RL, post-training, foundation models, omni models. Own RL and post-training; understand modern post-training methods.”
What You'll Achieve.
Improve interactive behavior, timing, interruption handling, emotional response, audiovisual coherence, and real-time conversational quality; Improve models after pretraining; Optimize the end-to-end post-training loop, rollout throughput, serving latency, GPU utilization, policy update efficiency, queueing, checkpoint overhead, and research iteration speed
Industry & Context.
Debug failure modes; Reason about model behavior; Reason about training dynamics
What They're Looking For.
Must Have
Hands-on experience with RL, RLHF, RLAIF, post-training, alignment, and large-scale fine-tuning; Understanding of RL/post-training methods; Ability to reason about model behavior; Practical experience building RL/post-training pipelines; Experience with large-scale training and inference systems; Understanding of omni post-training; Software engineering fundamentals; Curiosity; Adaptability
Nice to Have
Prior 0→1 experience building post-training systems, RL pipelines, agent training systems, evaluation platforms, or large-scale model improvement loops; Experience with PPO, GRPO, DPO, online RL, RLHF/RLAIF, reward modeling, preference data, synthetic data generation, or model-based data improvement; Experience with omni or multimodal post-training, long-context systems, or real-time interactive systems; Experience scaling mixed training/inference workloads; Experience with distributed pretraining, data infrastructure, inference serving, simulation, human/AI feedback collection, or evaluation infrastructure; Publications or substantial open-source contributions in RL, post-training, alignment, evaluation, ML systems, or model behavior
What You'll Do.
Own RL and post-training
Understand modern post-training methods
Build infrastructure to run these methods at scale
Develop reward models
Perform policy optimization
Build data feedback loops
Manage distributed execution
Build the RL/post-training stack from 0→1
Scale the RL/post-training stack from 1→10
Turn research ideas into training systems
Define system abstractions
Build evaluation loops
Make the system fast for researchers
Improve interactive behavior
Improve emotional response
Improve audiovisual coherence
Improve real-time conversational quality
Improve models after pretraining
Build rollout generation pipelines
Build policy optimization components
Build reward/reference model serving
Develop post-training methods
Scale post-training methods
Design system abstractions
Connect research ideas to production-scale RL runs
Build rollout workers
Build experience buffers
Build checkpoint promotion
Build evaluation loops for omni behavior
Improve instruction following
Improve real-time interaction quality
Optimize the end-to-end post-training loop
Optimize rollout throughput
Optimize serving latency
Optimize GPU utilization
Optimize policy update efficiency
Optimize checkpoint overhead
Optimize research iteration speed
How You'll Work.
Team & Collaboration
Cross-functional teams; Research teams
Full Job Description
About Nuance Labs

Nuance Labs is building photorealistic, real-time AI avatars with emotional intelligence: a full-duplex audiovisual system that can listen, speak, react, interrupt, and respond like a real person. We're a Series A company ($60M raised) backed by Lightspeed, Accel, South Park Commons, NVentures, and Define Ventures, with PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins, and industry experience from Apple, Meta, Amazon AGI, and Discord. The team is small, the work is real, and the problems are unsolved.

How Nuance Differentiates

Most conversational AI avatars today are hacks (a face slapped on a speech-to-speech pipeline), stuck in the uncanny valley: emotionless, mechanical, one-turn-at-a-time. Current systems take 2–5 seconds to respond; natural conversation requires sub-500 ms. That's a 10x improvement, and it demands rethinking the entire stack. That rethinking starts with full-duplex: an AI that listens and speaks simultaneously, perceives emotion in real time, and responds with a face that actually reflects it. It's an extremely hard problem, and we're developing foundation models designed for it from the ground up.

About the Role

We’re looking for a deeply technical Member of Technical Staff to own RL and post-training for large-scale omni models. This role is broader than a traditional RL algorithm role. You will be expected to understand modern post-training methods and build the infrastructure needed to run them at scale. The work spans RL method development, rollout generation, reward modeling, policy optimization, evaluation, data feedback loops, serving, observability, and distributed execution.

You will build Nuance’s RL/post-training stack from 0→1 and scale it from 1→10. That means turning rapidly evolving research ideas into reliable training systems: defining the abstractions, choosing or modifying frameworks, wiring together rollout workers and trainers, building reward/evaluation loops, debugging failure modes, and making the syst