itD Tech
Data Engineer III
“Data Engineer III at itD Tech. Skills: Data Engineering, ML Engineering, AI Data Infrastructure, Image Generation Models. Design, build, and maintain AI-augmented data pipelines.”
Industry & Context.
Failure handling; Troubleshooting
What They're Looking For.
Must Have
5+ years of experience in Data Engineering, ML Engineering, or hybrid role
Software engineering fundamentals
Python
Data structures
Concurrency
Asynchronous programming
Advanced SQL expertise
Complex query development
Query optimization
Large-scale data processing
Pipeline orchestration frameworks
Integrating machine learning models into production data pipelines
Inference endpoint management
Model versioning
Batching
Failure recovery
Building and operating production-scale data pipelines
Invoking machine learning models at scale
AI-assisted coding tools
Written communication skills
Verbal communication skills
Collaborate effectively across technical and business teams
Bachelor’s degree or higher in Computer Science, Data Engineering, Machine Learning, or related STEM field
Nice to Have
Experience generating, storing, indexing, and querying vector embeddings
Familiarity with content understanding models
Image classification
Object detection
OCR
NSFW detection
Aesthetic scoring systems
Leveraging LLMs for data annotation, data cleaning, evaluation, or prompt engineering workflows
Knowledge of generative AI technologies
Diffusion models
Image generation systems
Evaluation metrics
FID
CLIP Score
Previous experience leading AI-focused technology companies
Experience supporting large-scale image generation or multimodal AI initiatives
What You'll Do.
Design AI-augmented data pipelines
Build AI-augmented data pipelines
Maintain AI-augmented data pipelines
Combine data transformations with ML model inference
Develop systems for remote model inference orchestration
Optimize systems for remote model inference orchestration
Build scalable embedding generation pipelines
Build scalable embedding storage pipelines
Build scalable embedding indexing pipelines
Build scalable embedding retrieval pipelines
Curate large-scale image datasets
Manage large-scale image datasets
Design LLM-assisted annotation workflows
Operate LLM-assisted annotation workflows
Automate data labeling
Measure annotation quality
Improve annotation quality
Develop reusable frameworks
Develop pipeline components
Partner with engineers
Partner with researchers
Partner with stakeholders
Support image generation model development
Support image generation model evaluation
How You'll Work.
Team & Collaboration
Engineers; Researchers; Cross-functional stakeholders; Technical teams; Business teams
Communication Scope
Written communication; Verbal communication
Full Job Description
Data Engineer III
itD is seeking a Senior AI Data Engineer III to build and scale AI-augmented data infrastructure that powers next-generation image generation models. This role sits at the intersection of Data Engineering and Machine Learning Systems, driving the development of large-scale data curation, annotation, and evaluation pipelines that improve model quality across visual quality, prompt adherence, identity preservation, naturalness, and visual text generation. The ideal candidate will bring deep expertise in AI-focused data engineering and a proven track record of building production-scale pipelines that integrate machine learning inference into data workflows.
Location: Hybrid Onsite – Menlo Park, CA (required onsite collaboration with engineers and researchers)
Pay Rate: $35 - $39 per hour, depending on experience.
Duration: 5+ months
We provide comprehensive medical benefits, a 401k plan, paid holidays, and more. Please note that we are only considering direct W2 candidates at this time, as we are unable to offer sponsorship.
Responsibilities
- Design, build, and maintain AI-augmented data pipelines that combine traditional data transformations with machine learning model inference at billion-row scale.
- Develop and optimize systems for remote model inference orchestration, including batching, asynchronous execution, retry logic, throughput management, and graceful failure handling.
- Build and maintain scalable embedding generation, storage, indexing, and retrieval pipelines to support AI model training and evaluation.
- Curate and manage large-scale image datasets using SQL and model-derived signals, ensuring data quality, governance, compliance, and operational efficiency.
- Design and operate LLM-assisted annotation workflows that automate data labeling while measuring and improving annotation quality.
- Develop reusable frameworks, tooling, and pipeline components that enable broader engineering teams to efficiently build AI-powered data workflows.
- Partner c
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.