Tavus
Engineering, Product, & Design
AI Researcher (Multimodal Audio/Video Generation)
Neural analysis suggests this role is optimal for Senior candidates.
“AI Researcher (Multimodal Audio/Video Generation) at Tavus. Skills: Audio-visual generation, Diffusion models, Long-video generation, Audio-visual modeling. Lead research efforts on audio-visual generation for avatars (Neural Avatars, Talking-Heads), with a focus on conversational settings. Design models that are coupled with conversation flow — capturing and generating verbal + non-verbal signals in sync”
What You'll Achieve.
Publish impactful work
Industry & Context.
What They're Looking For.
Must Have
- PhD or equivalent research experience
- 2–3+ years of hands-on experience applying generative models at scale
- Expertise in diffusion models
- Experience in multimodal generation spanning video, audio, and language
- Proven innovation in long-video generation and/or audio generation
- Excellent programming skills: fluent in PyTorch and GPU-optimized workflows
- Track record of publications in top-tier venues (CVPR, NeurIPS, BMVC, ICASSP, etc.)
- Experience leading research activities or mentoring teams
Nice to Have
- Skills in 3D graphics, Gaussian splatting, or large-scale training setups
- Broad exposure to generative AI models beyond your specialty
- Familiarity with software development best practices
What You'll Do.
- Lead research efforts on audio-visual generation for avatars (Neural Avatars, Talking-Heads), with a focus on conversational settings
- Design models that are coupled with conversation flow, capturing and generating verbal and non-verbal signals in sync
- Drive innovation in diffusion models, long-video generation, and audio-visual modeling
- Translate research into production by partnering with Applied ML and engineering
- Mentor researchers, set research directions, and publish impactful work
How You'll Work.
Team & Collaboration
Partnering with Applied ML and engineering
Full Job Description
ABOUT US

Tavus (https://www.tavus.io/) is a research lab pioneering human computing. We’re building AI Humans: a new interface that closes the gap between people and machines, free from the friction of today’s systems. Our real-time human simulation models let machines see, hear, respond, and even look real, enabling meaningful, face-to-face conversations. AI Humans combine the emotional intelligence of humans with the reach and reliability of machines, making them capable, trusted agents available 24/7, in every language, on our terms.

Imagine a therapist anyone can afford. A personal trainer that adapts to your schedule. A fleet of medical assistants that can give every patient the attention they need. With Tavus, individuals, enterprises, and developers can all build AI Humans to connect, understand, and act with empathy at scale.

We’re a Series A company backed by world-class investors including Sequoia Capital, Y Combinator, and Scale Venture Partners. Be part of shaping a future where humans and machines truly understand each other.

The Role

We’re hiring a Senior AI Researcher to lead research in audio-visual avatar generation. This role is for someone who thrives in ambiguity, has a track record of pushing generative models to new frontiers, and wants to define what human–AI interaction looks like in practice.

Your Mission 🚀

- Lead research efforts on audio-visual generation for avatars (Neural Avatars, Talking-Heads), with a focus on conversational settings.
- Design models that are coupled with conversation flow, capturing and generating verbal and non-verbal signals in sync.
- Drive innovation in diffusion models, long-video generation, and audio-visual modeling.
- Translate research into production by partnering with Applied ML and engineering.
- Mentor researchers, set research directions, and publish impactful work.

You’ll Bring:

- A PhD or equivalent research experience, plus 2–3+ years of hands-on experience applying generative models at scale.
- Experti
How to Apply on Ashby
- Ashby is a fast modern ATS — most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically — answer completely.
- Location field affects geo-based screening; use your actual metro area.