Cantina Labs

social AI

Machine Learning Engineer

Singapore, Singapore FULL TIME

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Machine Learning Engineer at Cantina Labs. Skills: ML Engineer, large-scale data systems and pipelines, dataset curation, filtering, quality improvement, distributed data processing frameworks, orchestration tools, containerization, container orchestration, cloud-based data storage and compute, VLM-based captioning, quality/aesthetic scoring models, CLIP-based filtering, semantic data selection, video and media processing. build and scale systems for ingesting, processing, and delivering large-s”

What You'll Achieve.

improve model outcomes; cost-efficiency; speed; reliability; reproducibility; throughput; operational efficiency; training outcomes improvement

Industry & Context.

social AI

Problems you'll solve

problem-solving

What They're Looking For.

Must Have

Hands-on experience building or scaling large-scale data systems and pipelines for machine learning, including dataset curation, filtering, and quality improvement

Experience with distributed data processing frameworks such as PySpark or Ray

Experience with orchestration tools such as Airflow or equivalent

Familiarity with containerization and container orchestration, including Docker and Kubernetes

Experience working with cloud-based data storage and compute (AWS, GCS, and/or Azure), including tradeoffs around cost, throughput, storage layout, and access patterns

Experience with VLM-based captioning pipelines or quality/aesthetic scoring models for video or image data, including curation of image-text pair datasets for joint image-video training

Familiarity with CLIP-based or embedding-based filtering and semantic data selection techniques

Familiarity with video and media processing tools such as FFmpeg, PyAV, DALI, or OpenCV, and relevant libraries such as Decord, torchvision, PyTorchVideo, or torchaudio

Proficiency in Python

What You'll Do.

Build and scale systems for ingesting, processing, and delivering large-scale video and multimodal data for model training

Own the full pipeline, from raw content to curated, filtered, and training-ready datasets, with a focus on speed, reliability, reproducibility, and cost-efficiency

Partner closely with curation and modeling teams to operationalize evolving dataset recipes and iterate on approaches that improve model outcomes

Design and scale distributed data pipelines for preprocessing, dataset generation, and repeated dataset refreshes

Own workflow orchestration, job scheduling, monitoring, and failure recovery for large-scale data processing jobs

Implement and maintain containerized pipeline infrastructure using Kubernetes or equivalent orchestration systems

Optimize cloud-based data storage and movement across providers (AWS, GCS, or Azure) for cost, throughput, and operational efficiency

Define and implement best practices for dataset storage layout, versioning, caching, retention, and access patterns

Design and implement curation pipelines that determine which video and image content is selected, filtered, and retained for model training, including image-text pair datasets used in joint training regimes

Build and improve VLM-based captioning and metadata generation workflows at scale across both video and image data

Develop and apply quality and aesthetic scoring models, CLIP-based semantic filtering, and other signal-extraction approaches for data selection

Build tooling to support deduplication workflows at scale, including near-dedup and exact deduplication pipelines over large video corpora

Analyze dataset composition, identify quality issues, and iterate on curation logic to improve training outcomes

Define and evolve standards for what constitutes high-quality, training-ready video data across different training regimes

How You'll Work.

Team & Collaboration

Partner closely with curation and modeling teams to operationalize evolving dataset recipes and iterate on approaches that improve model outcomes

Communication Scope

communication

Full Job Description

About Cantina:

Cantina Labs is a social AI company, developing a suite of advanced real-time models that push the boundaries of expression, personality, and realism. We bring characters to life, transforming how people tell stories, connect, and create. We build and power ecosystems. Cantina, our flagship social AI platform, is just the beginning.

About the Role:

Cantina is expanding, and we're looking for an ML Engineer to join our growing Singapore team! In this role, you will build and scale systems for ingesting, processing, and delivering large-scale video and multimodal data for model training. You'll own the full pipeline, from raw content to curated, filtered, and training-ready datasets, with a focus on speed, reliability, reproducibility, and cost-efficiency. You'll partner closely with curation and modeling teams to operationalize evolving dataset recipes and iterate on approaches that improve model outcomes.

What You’ll Do:

- Design and scale distributed data pipelines for preprocessing, dataset generation, and repeated dataset refreshes
- Own workflow orchestration, job scheduling, monitoring, and failure recovery for large-scale data processing jobs
- Implement and maintain containerized pipeline infrastructure using Kubernetes or equivalent orchestration systems
- Optimize cloud-based data storage and movement across providers (AWS, GCS, or Azure) for cost, throughput, and operational efficiency
- Define and implement best practices for dataset storage layout, versioning, caching, retention, and access patterns
- Design and implement curation pipelines that determine which video and image content is selected, filtered, and retained for model training, including image-text pair datasets used in joint training regimes
- Build and improve VLM-based captioning and metadata generation workflows at scale across both video and image data
- Develop and apply quality and aesthetic scoring models, CLIP-based semantic filtering, and other signal-extraction app

SKILL SIGNAL 73 detected · ranked by frequency

VLM-based captioning ×6

distributed data processing frameworks ×5

orchestration tools ×5

containerization ×5

container orchestration ×5

cloud-based data storage and compute ×5

dataset curation ×3

quality improvement ×3

building or scaling large-scale data systems and pipelines for machine learning ×3

quality/aesthetic scoring models for video or image data ×3

curation of image-text pair datasets ×3

CLIP-based or embedding-based filtering ×3

semantic data selection techniques ×3

video and media processing tools ×3

Python programming ×3

ML Engineer ×2

large-scale data systems and pipelines ×2

filtering ×2

quality/aesthetic scoring models ×2

CLIP-based filtering ×2

semantic data selection ×2

video and media processing ×2

PySpark ×2

Ray ×2

Airflow ×2

Docker ×2

Kubernetes ×2

AWS ×2

GCS ×2

Azure ×2

FFmpeg ×2

PyAV ×2

BEHAVIOURAL

problem-solving

communication

documentation

Role Details

Type FULL TIME

Category research

AI-Extracted Insights

Domain Areas

social-ai

real-time-models

video-and-multimodal-data

dataset-curation

image-text-pair-datasets

joint-training-regimes

vlm-based-captioning

metadata-generation

How to Apply on Ashby

Ashby is a fast modern ATS — most applications take under 3 minutes.
The resume parser is strong; verify parsed experience dates and job titles.
Custom screening questions are often scored algorithmically — answer completely.
Location field affects geo-based screening; use your actual metro area.
