Nuance Labs

Technology

Member of Technical Staff — Model Optimization and Inference (New Grad)

$200–300k · Seattle, Washington, United States · FULL TIME

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for entry-level candidates.

The Brief

“Member of Technical Staff — Model Optimization and Inference (New Grad) at Nuance Labs. Skills: Model Optimization, Inference Optimization, Machine Learning, Deep Learning. Optimize machine learning models. Optimize deep learning models.”

What You'll Achieve.

Improve model performance; Reduce inference latency; Reduce model size; Deploy optimized models

Industry & Context.

Technology

Problems you'll solve

Model optimization; Inference optimization; Model debugging

What They're Looking For.

Must Have

Bachelor's, Master's, or PhD degree in Computer Science

Experience with machine learning, deep learning, natural language processing, computer vision, and large language models

Experience with model optimization and inference optimization

Experience with Python and C++

Experience with PyTorch, TensorFlow, ONNX, and TensorRT

Experience with model compression, quantization, pruning, and knowledge distillation

Nice to Have

Experience with distributed training and high-performance computing

Experience with GPU programming and CUDA

Experience with cloud platforms and ML frameworks

Experience with model deployment and production environments

What You'll Do.

Optimize machine learning models

Optimize deep learning models

Optimize natural language processing models

Optimize computer vision models

Optimize large language models

Optimize model inference

Develop novel optimization techniques

Implement model compression techniques

Implement quantization techniques

Implement pruning techniques

Implement knowledge distillation techniques

Collaborate with research scientists

Collaborate with engineering teams

Deploy optimized models

Evaluate model performance

Benchmark model performance

Document optimization processes

Stay current with research

How You'll Work.

Team & Collaboration

Research scientists; Engineering teams

Communication Scope

Technical documentation

Full Job Description

About Nuance Labs

Nuance Labs is building photorealistic, real-time AI avatars with emotional intelligence: a full-duplex audiovisual system that can listen, speak, react, interrupt, and respond like a real person. We're a Series A company ($60M raised) backed by Lightspeed, Accel, South Park Commons, NVentures, and Define Ventures, with PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins, and industry experience from Apple, Meta, Amazon AGI, and Discord. The team is small, the work is real, and the problems are unsolved.

How Nuance Differentiates

Most conversational AI avatars today are hacks: a face slapped on a speech-to-speech pipeline, stuck in the uncanny valley, emotionless, mechanical, one-turn-at-a-time. Current systems take 2–5 seconds to respond; natural conversation requires sub-500ms. That's a 10x improvement, and it demands rethinking the entire stack. That rethinking starts with full-duplex: an AI that listens and speaks simultaneously, perceives emotion in real time, and responds with a face that actually reflects it. It's an extremely hard problem, and we're developing foundation models designed for it from the ground up.

About the Role

We can train a great model. The next problem is making it fast enough to actually use in a real-time conversation, and that gap is enormous. A model that responds in 3 seconds is a demo. A model that responds in under 500ms is a product. We're looking for someone who's excited about taking trained models and squeezing every last millisecond out of them. You understand (or want to deeply understand) the full stack from model weights to serving infrastructure: quantization, KV cache optimization, kernel-level acceleration, batching strategies. You've worked with vLLM, SGLang, or similar frameworks (through coursework, research, internships, or open-source) and have opinions about where they fall short. This posting is aimed at early-career engineers finishing or recently finished with a BS, MS, or PhD. We don't requi

SKILL SIGNAL 26 detected · ranked by frequency

Model Optimization ×6

Inference Optimization ×6

Model compression ×4

Quantization ×4

Pruning ×4

Knowledge distillation ×4

Machine Learning ×3

Deep Learning ×3

PyTorch ×2

TensorFlow ×2

ONNX ×2

TensorRT ×2

Natural language processing

Computer vision

Large language models

Python

Distributed training

High-performance computing

GPU programming

CUDA

Cloud platforms

ML frameworks

Model deployment

Production environments

System design

Algorithm design

Role Details

Seniority: Entry

Experience: 0–2 yrs

Level: Entry

Work Mode: Onsite

Type: FULL TIME

Category: software

Salary Band: 200k+

AI-Extracted Insights

Domain Areas

machine-learning; deep-learning; natural-language-processing; computer-vision; large-language-models; model-optimization; inference-optimization; model-compression

How to Apply on Greenhouse

Create a Greenhouse profile before applying — it saves time across multiple applications.
Upload your resume as a PDF; the parser handles it better than Word.
Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
Enable email notifications to track application status in real time.
