Pika
Research
Multimodal LLM Researcher (MLLM)
“Multimodal LLM Researcher (MLLM) at Pika. Skills: Multimodal LLM, LLM, VLM, Audio LM, real-time generation, agentic platforms, deep learning, generative models, diffusion models. Lead and contribute to research efforts focused on real-time, multimodal generation (including text, image, video, and audio synthesis) as well as orchestration of agentic platform infrastructure. Design and prototype novel algorithms and architectures for high-fidelity, real-time multimodal synthesis and interactive experiences.”
What You'll Achieve.
Drive forward our mission to make agentic real-time generative technology accessible, dynamic, and transformative for millions of creators; shape the future of real-time creative platforms; bring research advancements into production-ready technologies.
What They're Looking For.
Must Have
- 5+ years of relevant experience, including research during graduate studies, in large language models, vision-language models, audio language models, deep learning, or related fields
- Deep expertise in at least one area: language modeling (LLM), vision-language modeling (VLM), or audio language modeling (Audio LM)
- Experience with generative models, including autoregressive and diffusion models, and their real-time deployment
- Hands-on experience curating, constructing, or augmenting large, high-quality multimodal datasets
- Experience developing and deploying real-time systems and/or agentic orchestration infrastructure
- Programming and prototyping skills (Python, PyTorch, TensorFlow, etc.)
Nice to Have
Demonstrated impact as first author on major publications in top conferences or journals (e.g., NeurIPS, CVPR, ICML, ICCV, SIGGRAPH, Interspeech, etc.)
What You'll Do.
Lead and contribute to research efforts focused on real-time, multimodal generation (including text, image, video, and audio synthesis) as well as orchestration of agentic platform infrastructure
Design and prototype novel algorithms and architectures for high-fidelity, real-time multimodal synthesis and interactive experiences
Focus on real-time aspects of model inference and synthesis across modalities
Work on diffusion model distillation and/or develop diffusion-based world models for multimodal applications
Train and fine-tune autoregressive and diffusion models in LLM, VLM, or Audio LM contexts with a focus on real-time performance
Curate specific datasets and sensory-rich data
Collaborate with cross-functional teams to bring research advancements into production-ready technologies
Publish work in top-tier conferences and communicate research results internally and externally
Stay at the cutting edge of real-time multimodal generative AI and agentic orchestration
How You'll Work.
Team & Collaboration
Collaborate closely with engineering, product, and other cross-functional teams; communicate research results internally and externally
Communication Scope
Excellent communication and collaboration skills; communicate research results internally and externally
How to Apply on Ashby
- Ashby is a fast, modern ATS; most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically — answer completely.
- Location field affects geo-based screening; use your actual metro area.