Nuance Labs
Technology
Member of Technical Staff, Model Optimization and Inference (New Grad)
Neural analysis suggests this role is optimal for entry-level candidates.
“Member of Technical Staff — Model Optimization and Inference (New Grad) at Nuance Labs. Skills: Model Optimization, Inference Optimization, Machine Learning, Deep Learning. Optimize machine learning models. Optimize deep learning models”
What You'll Achieve.
Improve model performance; Reduce inference latency; Reduce model size; Deploy optimized models
Industry & Context.
Model optimization; Inference optimization; Model debugging
What They're Looking For.
Must Have
Bachelor's, Master's, or PhD degree in Computer Science; experience with machine learning, deep learning, natural language processing, computer vision, large language models, model optimization, inference optimization, Python, C++, PyTorch, TensorFlow, ONNX, TensorRT, model compression, quantization, pruning, and knowledge distillation
Nice to Have
Experience with distributed training, high-performance computing, GPU programming, CUDA, cloud platforms, ML frameworks, model deployment, and production environments
What You'll Do.
Optimize machine learning models
Optimize deep learning models
Optimize natural language processing models
Optimize computer vision models
Optimize large language models
Optimize model inference
Develop novel optimization techniques
Implement model compression techniques
Implement quantization techniques
Implement pruning techniques
Implement knowledge distillation techniques
Collaborate with research scientists
Collaborate with engineering teams
Deploy optimized models
Evaluate model performance
Benchmark model performance
Document optimization processes
Stay current with research
How You'll Work.
Team & Collaboration
Research scientists; Engineering teams
Communication Scope
Technical documentation
Full Job Description
About Nuance Labs
Nuance Labs is building photorealistic, real-time AI avatars with emotional intelligence: a full-duplex audiovisual system that can listen, speak, react, interrupt, and respond like a real person. We're a Series A company ($60M raised) backed by Lightspeed, Accel, South Park Commons, NVentures, and Define Ventures, with PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins, and industry experience from Apple, Meta, Amazon AGI, and Discord. The team is small, the work is real, and the problems are unsolved.
How Nuance Differentiates
Most conversational AI avatars today are hacks: a face slapped on a speech-to-speech pipeline, stuck in the uncanny valley, emotionless, mechanical, one-turn-at-a-time. Current systems take 2–5 seconds to respond; natural conversation requires sub-500ms. That's up to a 10x improvement, and it demands rethinking the entire stack. That rethinking starts with full-duplex: an AI that listens and speaks simultaneously, perceives emotion in real time, and responds with a face that actually reflects it. It's an extremely hard problem, and we're developing foundation models designed for it from the ground up.
About the Role
We can train a great model. The next problem is making it fast enough to actually use in a real-time conversation, and that gap is enormous. A model that responds in 3 seconds is a demo. A model that responds in under 500ms is a product. We're looking for someone who's excited about taking trained models and squeezing every last millisecond out of them. You understand (or want to deeply understand) the full stack from model weights to serving infrastructure: quantization, KV cache optimization, kernel-level acceleration, batching strategies. You've worked with vLLM, SGLang, or similar frameworks (through coursework, research, internships, or open-source) and have opinions about where they fall short. This posting is aimed at early-career engineers finishing or recently finished with a BS, MS, or PhD. We don't requi
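The latency figures quoted in the description (2 to 5 seconds for current systems, under 500 ms for natural conversation) can be turned into a speedup factor with a single division. The snippet below is only a minimal sketch of that arithmetic, assuming 500 ms as the target bound; the helper name `speedup` is hypothetical and does not come from the posting.

```python
def speedup(current_s: float, target_s: float) -> float:
    """Return how many times faster the target latency is than the current one."""
    if current_s <= 0 or target_s <= 0:
        raise ValueError("latencies must be positive")
    return current_s / target_s


# Assumption: "sub-500ms" is treated as a 0.5 s upper bound on response time.
TARGET_S = 0.5

# The posting's stated range for current systems: 2 to 5 seconds.
for current_s in (2.0, 5.0):
    factor = speedup(current_s, TARGET_S)
    print(f"{current_s:.0f} s -> {TARGET_S * 1000:.0f} ms: {factor:.0f}x")
```

Under these assumptions the required speedup ranges from 4x at the fast end of today's systems to 10x at the slow end.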
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.