Tavus

Engineering, Product, & Design

AI Researcher (Multimodal Audio/Video Generation)

San Francisco, California, United States; London, United Kingdom; United States · Full Time · Remote Friendly
The Brief

“AI Researcher (Multimodal Audio/Video Generation) at Tavus. Skills: Audio-visual generation, Diffusion models, Long-video generation, Audio-visual modeling. Lead research efforts on audio-visual generation for avatars (Neural Avatars, Talking-Heads), with a focus on conversational settings. Design models that are coupled with conversation flow — capturing and generating verbal + non-verbal signals in sync”

What You'll Achieve.

Publish impactful work

Industry & Context.

Engineering, Product, & Design

What They're Looking For.

Must Have

  • PhD or equivalent research experience
  • 2–3+ years of hands-on experience applying generative models at scale
  • Expertise in diffusion models
  • Experience in multimodal generation spanning video, audio, and language
  • Proven innovation in long-video generation and/or audio generation
  • Excellent programming skills: fluent in PyTorch and GPU-optimized workflows
  • Track record of publications in top-tier venues (CVPR, NeurIPS, BMVC, ICASSP, etc.)
  • Experience leading research activities or mentoring teams

Nice to Have

  • Skills in 3D graphics, Gaussian splatting, or large-scale training setups
  • Broad exposure to generative AI models beyond your specialty
  • Familiarity with software development best practices

What You'll Do.

Lead research efforts on audio-visual generation for avatars (Neural Avatars, Talking-Heads), with a focus on conversational settings

Design models that are coupled with conversation flow, capturing and generating verbal and non-verbal signals in sync

Drive innovation in diffusion models, long-video generation, and audio-visual modeling

Translate research into production by partnering with Applied ML and engineering

Set research directions and publish impactful work

How You'll Work.

Team & Collaboration

Partnering with Applied ML and engineering

How to Apply on Ashby

  • Ashby is a fast modern ATS — most applications take under 3 minutes.
  • The resume parser is strong; verify parsed experience dates and job titles.
  • Custom screening questions are often scored algorithmically — answer completely.
  • Location field affects geo-based screening; use your actual metro area.
