Torc Robotics
Autonomous Vehicle Technology
Senior, ML Engineer - Auto Tagger
Senior, ML Engineer - Auto Tagger at Torc Robotics. Skills: ML Engineering, Data Engineering, Autonomous Data Curation, Distributed Systems, Cloud Platforms, Machine Learning, Dataset Curation, Scenario Mining, Event Tagging. Architect and optimize distributed data pipelines to process massive multi-sensor logs (camera, LiDAR, radar, kinematics), automatically extracting and cataloging safety-critical and long-tail driving events.
What You'll Achieve.
Accelerate development across autonomous perception, sensor fusion, and generative simulation testing; enable high-speed querying and retrieval for ML training, regression testing, and system validation; and operationalize a continuous data loop.
Industry & Context.
Translate complex data engineering challenges into clear strategies for cross-functional stakeholders.
What They're Looking For.
Must Have
6+ years in data engineering, ML systems, or autonomous data curation
Python and SQL proficiency
Extensive experience processing massive time-series or unstructured datasets
Hands-on machine learning and dataset curation experience
Demonstrated history of building targeted datasets that measurably improve downstream model performance
Hands-on experience using Databricks (or similar platforms) for large-scale analytics, interactive querying, and making massive vehicle datasets searchable
Expertise in distributed compute frameworks (Ray, Spark, Beam)
Expertise in cloud platforms (AWS, GCP, or Azure) for executing heavy data workloads
Experience parsing complex data formats
Experience applying scenario-description standards such as the Pegasus layers
Exceptional ability to translate complex data engineering challenges into clear strategies for cross-functional stakeholders
Proven track record of mentoring teams, driving system architecture, and defining engineering roadmaps
Nice to Have
Familiarity with foundation models
Familiarity with auto-labeling pipelines
Familiarity with zero-shot classification for scenario extraction
Experience with vLLM, SGLang, or similar frameworks for highly optimized, high-throughput model serving and inference
Experience with semantic extraction and attribute mapping to help build a robust semantic inference engine that moves beyond standard bounding-box object detection
Familiarity with parsing robotics formats (ROS bags, MCAP)
Familiarity with optimizing high-performance columnar storage formats (Parquet, Arrow)
Knowledge of how scenario data feeds into generative simulation workflows, neural rendering, or sensor fusion validation
Experience building semantic retrieval systems or vector databases for automotive data
What You'll Do.
Architect and optimize distributed data pipelines to process massive multi-sensor logs (camera, LiDAR, radar, kinematics), automatically extracting and cataloging safety-critical and long-tail driving events
Develop and tune both heuristic-based and ML-assisted algorithms (including exploring Vision-Language Models or semantic vector search) to automatically classify and describe complex environmental and behavioral scenarios
Extract and format scenario data using the Pegasus layer standard (alongside open-source frameworks) to ensure semantic consistency and rigorous metadata integrity
Manage the ingestion of tagged events into the observations database, enabling high-speed querying and retrieval for ML training, regression testing, and system validation
Operate with broad autonomy to drive consensus across organizational boundaries
Collaborate closely with downstream consumers in perception, simulation, and systems engineering to define what constitutes an 'interesting scenario' and operationalize a continuous data loop
Guide, mentor, and elevate less-experienced engineers
Lead design reviews, establish coding standards, and foster a culture of technical excellence and collaborative problem-solving
How You'll Work.
Team & Collaboration
Operate with broad autonomy to drive consensus across organizational boundaries; collaborate closely with downstream consumers in perception, simulation, and systems engineering to define what constitutes an 'interesting scenario' and operationalize a continuous data loop; guide, mentor, and elevate less-experienced engineers; and lead design reviews, establish coding standards, and foster a culture of technical excellence and collaborative problem-solving.
Communication Scope
Exceptional ability to translate complex data engineering challenges into clear strategies for cross-functional stakeholders
Process & Methodology
Defining engineering roadmaps
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles PDFs more reliably than Word documents.
- Answer all knockout questions carefully; a wrong answer triggers automatic rejection before a human sees your application.
- Enable email notifications to track application status in real time.