Protege

DataLab

Machine Learning Researcher - Audio

₹35–60L (~AI est.) · Remote · FULL TIME · Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Machine Learning Researcher - Audio at Protege. Skills: Audio Data Quality, ML Evaluation, Speech Datasets. Research audio data quality for machine learning. Investigate audio quality effects on model training”

What You'll Achieve.

Establish a trustworthy audio-quality baseline; Connect audio-quality issues to dataset improvements; Enable clearer prioritization over time

Industry & Context.

DataLab

Problems you'll solve

Root cause analysis

What They're Looking For.

Must Have

Master's degree + 4+ years industry experience, Experience designing and running data evaluations, Experience developing or critically evaluating metrics, Ability to connect low-level signal properties to ML behavior, Comfortable moving between research and production, Excellent written and verbal communication

Nice to Have

PhD or equivalent, Experience with ASR, TTS, speaker modeling, Experience developing evaluation frameworks for training data, Experience inventing, adapting, or validating audio quality metrics, Experience studying dataset quality and model performance, Publications or open-source contributions, Cross-functional collaboration experience, Experience collaborating with industry or academic labs

What You'll Do.

Research audio data quality for machine learning

Investigate audio quality effects on model training

Analyze and summarize Protege's audio catalog

Maintain quality scorecards and metrics

Develop methods to measure acoustic properties

Build workflows for segment-level quality evaluation

Surface localized degradation

Apply quality metrics to detect degradation

Design and run targeted evaluations

Connect audio quality issues to model behavior

Test how well audio quality metrics correlate with downstream model performance

Identify failure modes of metrics

Design better metric alternatives

Translate research into filtering rules

Build scalable tools and pipelines

Track results over time

Make quality signals accessible and communicate data asset value

How You'll Work.

Team & Collaboration

ML researchers; Data engineers; Data operations; External partners

Communication Scope

Technical docs; Empirical results

Full Job Description

Company Overview

We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI, and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

Role Overview

Data is the foundation of AI performance, and we believe model quality starts with data quality. For speech and audio models in particular, the bar for signal fidelity, consistency, and quality control is exceptionally high. We’re seeking a Machine Learning Researcher focused on audio data quality, ML data evaluation, and quality control to lead the evaluation and optimization of large-scale speech datasets used to train audio, speech, and multimodal models.

This role will be responsible not only for applying existing audio quality metrics, but also for researching how audio data quality should be evaluated for machine learning systems and developing new methods, benchmarks, and evaluation frameworks that better predict downstream model performance.

You will help define what “high-quality audio data” means in the context of modern ML training. That includes studying how different forms of acoustic degradation, dataset inconsistency, recording conditions, speaker variation, labeling quality, segmentation quality, and signal artifacts affect model behavior across ASR, TTS, speaker modeling, representation learning, and multimodal systems.

A core part of this role will be original research a


SKILL SIGNAL 35 detected · ranked by frequency

Audio Data Quality ×5

Speech Datasets ×5

ML Data Evaluation ×3

Quality Control ×3

Audio Quality Metrics ×3

Dataset Composition ×3

Acoustic Issues ×3

Signal Properties ×3

Dataset Characterization ×3

Speech Dataset Metrics ×3

Waveform Analysis ×3

Segmentation Evaluation ×3

Model Evaluation ×3

Filtering Rules ×3

Quality Gates ×3

Dataset Selection ×3

Scalable Tools ×3

Quality Signals ×3

ML Evaluation ×2

Python

Machine Learning

Audio Signal Processing

Speech Technology

ASR

TTS

Speaker Modeling

Representation Learning

Multimodal Systems

Data Evaluation

Audio Analysis

ML Model Evaluation

Data Quality

BEHAVIOURAL

Ownership

Bias for action

Role Details

Work Mode Remote

Type FULL TIME

Category datalab

Salary Band 200k+

AI-Extracted Insights

Domain Areas

speech-technology, audio-signal-processing, machine-learning, data-centric-ai, ml-evaluation, acoustic-degradation, dataset-inconsistency, recording-conditions

How to Apply on Ashby

Ashby is a fast modern ATS — most applications take under 3 minutes.
The resume parser is strong; verify parsed experience dates and job titles.
Custom screening questions are often scored algorithmically — answer completely.
Location field affects geo-based screening; use your actual metro area.
