CuspAI
AI
Data Engineer
Neural analysis suggests this role is optimal for mid-level candidates.
“Data Engineer at CuspAI. Skills: Data Pipeline Development, Data Quality & Standardisation, Collaboration & Integration, Python, databases, large-scale data processing, workflow orchestration tools, containerisation, CI/CD practices, DevOps practices. Focus: build the pipeline infrastructure and tooling for data ingestion, moving towards a self-serve setup for the scientific team members.”
What You'll Achieve.
unlock trillion-dollar materials breakthroughs in months, not millennia; create high-quality training data for our ML researchers; ensure data integrity across all pipelines; drive our research and development efforts forward; enable scientists to work on world-changing challenges; provide self-serve, reliable and scalable ingestion pipelines; design systems that scale with growing data volumes and user demands; advance materials science and solve sustainability and climate-related problems; create groundbreaking solutions for a more sustainable world
Industry & Context.
solve the breakthrough materials needed to power human progress; solving sustainability and climate-related problems
What They're Looking For.
Must Have
Python, databases, large-scale data processing, workflow orchestration tools (e.g. Airflow, Prefect, Dagster, Flyte or similar), containerisation (Docker, Kubernetes), CI/CD practices, handling large/complex datasets, DevOps practices
Nice to Have
data from scientific computing (simulations or experiments), machine learning data requirements, MLOps practices, pre-processing/processing as part of model training, crystallography, materials properties, computational chemistry concepts
What You'll Do.
Build the pipeline infrastructure and tooling for data ingestion, moving towards a self-serve setup for the scientific team members
Secure, collect, clean, standardise, and tag diverse chemical datasets to create high-quality training data for our ML researchers
Design and build robust data pipelines for materials science datasets, experimental results, and computational chemistry outputs
Develop processes to integrate diverse data sources including materials databases, literature, patent filings, and laboratory instruments
Create automated workflows for processing crystallographic data and materials properties
Build scalable systems to handle high-throughput computational chemistry calculations and experimental data
Implement automated quality checks for crystal structure data, chemical compositions, and experimental measurements
Create standardisation protocols for materials nomenclature and measurement conditions
Build monitoring systems to ensure data integrity across all pipelines
Understand data requirements for model training and inference
Ensure accurate representation of domain knowledge in data schemas
Integrate with laboratory automation systems and computational chemistry software
Support real-time data needs for AI-driven materials discovery experiments
How You'll Work.
Team & Collaboration
Work hand in hand with ML researchers; partner with materials scientists; true interdisciplinary teamwork; deeply collaborative environment bridging AI research, computational chemistry, and experimental science; work with world-class researchers and engineers who enjoy sharing knowledge and supporting each other
Full Job Description
ABOUT CUSPAI
CuspAI is the frontier AI company on a mission to solve the breakthrough materials needed to power human progress. While nature took billions of years to perfect molecules, we are harnessing AI to unlock trillion-dollar materials breakthroughs in months, not millennia. Our founding team is the most cited in the world, comprised of world-class researchers in AI, chemistry and engineering. We are working on some of the hardest and most important challenges including energy, clean water, the future of compute, and carbon capture, and this is just the start of what our 'search engine' for next-generation materials will unlock.
We invite you to be part of a diverse, innovative team at the intersection of AI and materials science, working to create impactful partnerships that drive innovation, scalability, and industry collaboration. This work matters. Your work matters. We’re on the cusp of the on-demand materials era. Join us.
THE ROLE
As we grow, we are seeking a Data Engineer to play a crucial part in driving our research and development efforts forward.
YOUR IMPACT
As a Data Engineer you will be part of the new team building the infrastructure that underpins and acts as the critical bridge between raw chemical data and our machine learning models. Your main focus will be to build the pipeline infrastructure and tooling for data ingestion, moving towards self-serve setup for the scientific team members. You'll also be responsible for securing, collecting, cleaning, standardising, and tagging diverse chemical datasets to create high-quality training data for our ML researchers while working closely with our chemistry team to ensure scientific accuracy.
WHAT YOU WILL DO
Data Pipeline Development
- Design and build robust data pipelines for materials science datasets, experimental results, and computational chemistry outputs.
- Develop processes to integrate diverse data sources including materials databases, literature, patent filings, and laboratory instru
How to Apply on Ashby
- Ashby is a fast modern ATS — most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically — answer completely.
- Location field affects geo-based screening; use your actual metro area.