Protege
Technology
Senior Software Engineer, Data Processing
“Senior Software Engineer, Data Processing at Protege. Skills: Data processing, Data pipelines, Large-scale systems, AWS. Design ingestion systems. Build ingestion systems”
Industry & Context.
Diagnose bottlenecks; Resolve bottlenecks; Troubleshooting
What They're Looking For.
Must Have
5+ years building production systems
5+ years data processing at scale
Hands-on designing data pipelines
Hands-on running data pipelines
Python programming skills
Experience with distributed data processing
Proficiency with AWS
Nice to Have
Experience processing medical imaging
Experience processing text data
Experience processing audio data
Experience processing video data
Experience with sensitive data environments
Experience with regulated data environments
Experience with HIPAA
Experience with healthcare compliance
Experience with PHI handling
Experience with streaming systems
Experience with workflow orchestration
Experience with Airflow
Experience with Dagster
Experience with GCP
Experience with Azure
Prior startup experience
Familiarity with ML systems
Familiarity with NLP systems
Familiarity with LLM systems
Familiarity with embeddings
Familiarity with fine-tuning
What You'll Do.
Design ingestion systems
Build ingestion systems
Operate ingestion systems
Process multimodal data
Build modality-specific processing
Process medical imaging
Extract audio metadata
Extract video metadata
Build normalization logic
Handle non-standard formats
Handle high-variance formats
Turn work into patterns
Create reusable patterns
Build internal tooling
Build platform capabilities
Build for high volume
Build for high throughput
Process distributed workloads
Process parallel workloads
Choose execution models
Perform batch processing
Perform distributed execution
Use modern compute patterns
Process unstructured data
Perform inference-heavy processing
Prevent performance degradation
Build validation checks
Handle sensitive data
Handle regulated data
Track data provenance
Track usage constraints
Ensure downstream compliance
Improve observability
Improve debuggability
Improve operational reliability
Partner with Data Lab
Support new modalities
Support partner requirements
Translate source realities
Standardize recurring patterns
Standardize reusable transforms
Standardize validators
Standardize internal tooling
Expand into complex environments
Get productive in codebase
Ship pipeline improvements
Understand data flows
Understand modality handling
Meet engineering teams
Own processing pipeline
Deliver AI-ready output
Develop depth in data types
Raise bar on data quality
Raise bar on observability
Raise bar on processing best practices
Own ingestion and processing layer
Lead design on new modalities
Lead design on scaling challenges
Identify leverage opportunities
Drive architectural improvements
How You'll Work.
Team & Collaboration
Partner with product; Partner with Data Lab; Partner with partner engineering
Full Job Description
Company Overview: We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data. Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI, and in tech.
We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.
ABOUT THE ROLE
Protege is hiring a Senior Software Engineer to own the data processing layer at ingestion: the part of the platform that takes large-scale source data and turns it into clean, structured, enriched, validated, AI-ready datasets. This is a hands-on, backend- and data-heavy role with end-to-end ownership of the pipelines that move and process data at volume.
Protege connects organizations that hold high-value data with the AI builders who need it. The value of that exchange depends on what happens at ingestion: raw, varied, high-volume source data has to be processed reliably, securely, and at scale before it's useful to anyone.
You'll work across imaging, audio, video, and other data modalities, crossing healthcare, media, and other disparate industries and data partners. You’ll partner closely with product, Data Lab, and partner engineering teams to build robust ingestion and processing systems for structured and unstructured data at massive scale, from millions to billions of records, files, and other source objects.
This role is ideal for engineers who are energized by messy data at scale, want deep ownership of critical infrastructure, and like turning ambiguity into reliable systems.
How to Apply on Ashby
- Ashby is a fast modern ATS — most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically — answer completely.
- Location field affects geo-based screening; use your actual metro area.