Tavus
Engineering, Product, & Design
AI Researcher (Multimodal Audio/Video Generation)
“AI Researcher (Multimodal Audio/Video Generation) at Tavus. Skills: Audio-visual generation, Diffusion models, Long-video generation, Audio-visual modeling. Lead research efforts on audio-visual generation for avatars (Neural Avatars, Talking-Heads), with a focus on conversational settings. Design models that are coupled with conversation flow — capturing and generating verbal + non-verbal signals in sync”
What You'll Achieve.
Publish impactful work
What They're Looking For.
Must Have
- PhD or equivalent research experience
- 2–3+ years of hands-on experience applying generative models at scale
- Expertise in diffusion models
- Experience in multimodal generation spanning video, audio, and language
- Proven innovation in long-video generation and/or audio generation
- Excellent programming skills, including fluency in PyTorch and GPU-optimized workflows
- Track record of publications in top-tier venues (CVPR, NeurIPS, BMVC, ICASSP, etc.)
- Experience leading research activities or mentoring teams
Nice to Have
- Skills in 3D graphics, Gaussian splatting, or large-scale training setups
- Broad exposure to generative AI models beyond your specialty
- Familiarity with software development best practices
What You'll Do.
- Lead research efforts on audio-visual generation for avatars (Neural Avatars, Talking-Heads), with a focus on conversational settings
- Design models that are coupled with conversation flow, capturing and generating verbal and non-verbal signals in sync
- Drive innovation in diffusion models, long-video generation, and audio-visual modeling
- Translate research into production by partnering with Applied ML and engineering, set research directions, and publish impactful work
How You'll Work.
Team & Collaboration
Partnering with Applied ML and engineering
How to Apply on Ashby
- Ashby is a fast modern ATS — most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically — answer completely.
- Location field affects geo-based screening; use your actual metro area.