Protege

Technology

Senior Software Engineer, Data Processing

₹25–45L (AI estimate) · Remote · Full Time · Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior Software Engineer, Data Processing at Protege. Skills: data processing, data pipelines, large-scale systems, AWS. Design and build ingestion systems.”

Industry & Context.

Technology

Problems you'll solve

Diagnose and resolve bottlenecks; troubleshooting

What They're Looking For.

Must Have

5+ years building production systems; 5+ years of data processing at scale; hands-on experience designing and running data pipelines; Python programming skills; experience with distributed data processing; proficiency with AWS

Nice to Have

Experience processing medical imaging, text, audio, and video data; experience with sensitive or regulated data environments (HIPAA, healthcare compliance, PHI handling); experience with streaming systems; experience with workflow orchestration (Airflow, Dagster); experience with GCP or Azure; prior startup experience; familiarity with ML, NLP, and LLM systems, embeddings, and fine-tuning

What You'll Do.

Design ingestion systems

Build ingestion systems

Operate ingestion systems

Process multimodal data

Build modality-specific processing

Process medical imaging

Extract audio metadata

Extract video metadata

Build normalization logic

Handle non-standard formats

Handle high-variance formats

Turn work into patterns

Create reusable patterns

Build internal tooling

Build platform capabilities

Build for high volume

Build for high throughput

Process distributed workloads

Process parallel workloads

Choose execution models

Perform batch processing

Perform distributed execution

Use modern compute patterns

Process unstructured data

Perform inference-heavy processing

Prevent performance degradation

Build validation checks

Handle sensitive data

Handle regulated data

Track data provenance

Track usage constraints

Ensure downstream compliance

Improve observability

Improve debuggability

Improve operational reliability

Partner with Data Lab

Support new modalities

Support partner requirements

Translate source realities

Standardize recurring patterns

Standardize reusable transforms

Standardize validators

Standardize internal tooling

Expand into complex environments

Get productive in codebase

Ship pipeline improvements

Understand data flows

Understand modality handling

Meet engineering teams

Own processing pipeline

Deliver AI-ready output

Develop depth in data types

Raise bar on data quality

Raise bar on observability

Raise bar on processing best practices

Own ingestion and processing layer

Lead design on new modalities

Lead design on scaling challenges

Identify leverage opportunities

Drive architectural improvements

How You'll Work.

Team & Collaboration

Partner with product; Partner with Data Lab; Partner with partner engineering

Full Job Description

Company Overview: We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI, and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

ABOUT THE ROLE

Protege is hiring a Senior Software Engineer to own the data processing layer at ingestion: the part of the platform that takes large-scale source data and turns it into clean, structured, enriched, validated, AI-ready datasets. This is a hands-on, backend- and data-heavy role with end-to-end ownership of the pipelines that move and process data at volume.

Protege connects organizations that hold high-value data with the AI builders who need it. The value of that exchange depends on what happens at ingestion: raw, varied, high-volume source data has to be processed reliably, securely, and at scale before it's useful to anyone. You'll work across imaging, audio, video, and other data modalities, crossing healthcare, media, and other disparate industries and data partners.

You’ll partner closely with product, Data Lab, and partner engineering teams to build robust ingestion and processing systems for structured and unstructured data at massive scale, from millions to billions of records, files, and other source objects.

This role is ideal for engineers who are energized by messy data at scale, want deep ownership of critical infrastructure, and like turning ambiguity into reliable systems.


SKILL SIGNAL 43 detected · ranked by frequency

AWS ×4

Data normalization ×4

Data validation ×4

Data processing ×3

Data pipeline design ×3

Data pipeline operation ×3

Large-scale data processing ×3

Distributed data processing ×3

Multimodal data processing ×3

Medical imaging processing ×3

Audio metadata extraction ×3

Video metadata extraction ×3

Quality validation ×3

Notes processing ×3

Data parsing ×3

High volume processing ×3

High throughput processing ×3

Distributed compute systems ×3

Batch processing ×3

Parallel compute systems ×3

Unstructured data compute ×3

Inference-heavy processing ×3

Performance bottleneck diagnosis ×3

Performance bottleneck resolution ×3

Data provenance tracking ×3

Metadata tracking ×3

Usage constraint tracking ×3

Security compliance ×3

Data de-identification ×3

Data pipelines ×2

Large-scale systems ×2

Python

BEHAVIOURAL

Curious, Tenacious

Role Details

Experience 5–10 yrs

Level Senior

Work Mode Remote

Type FULL TIME

Category engineering

Salary Band 200k+

AI-Extracted Insights

Domain Areas

ai-training-data, data-processing, multimodal-data, medical-imaging, audio-processing, video-processing, healthcare-data, media-data

How to Apply on Ashby

Ashby is a fast modern ATS — most applications take under 3 minutes.
The resume parser is strong; verify parsed experience dates and job titles.
Custom screening questions are often scored algorithmically — answer completely.
Location field affects geo-based screening; use your actual metro area.
