CuspAI

AI

Data Engineer

London, United Kingdom · Full Time · Remote Friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for mid-level candidates.

The Brief

“Data Engineer at CuspAI. You will build the pipeline infrastructure and tooling for data ingestion, moving towards a self-serve setup for the scientific team members. Skills: data pipeline development, data quality and standardisation, collaboration and integration, Python, databases, large-scale data processing, workflow orchestration tools, containerisation, CI/CD practices, and DevOps practices.”

What You'll Achieve.

Unlock trillion-dollar materials breakthroughs in months, not millennia; create high-quality training data for our ML researchers; ensure data integrity across all pipelines; drive our research and development efforts forward; enable scientists to work on world-changing challenges; provide self-serve, reliable and scalable ingestion pipelines; design systems that scale with growing data volumes and user demands; advance materials science and solve sustainability and climate-related problems; create groundbreaking solutions for a more sustainable world.

Industry & Context.

AI
Problems you'll solve

Solve the breakthrough materials needed to power human progress; solve sustainability and climate-related problems

What They're Looking For.

Must Have

Python, databases, large-scale data processing, workflow orchestration tools (e.g. Airflow, Prefect, Dagster, Flyte or similar), containerisation (Docker, Kubernetes), CI/CD practices, handling large/complex datasets, DevOps practices

Nice to Have

data from scientific computing (simulations or experiments), machine learning data requirements, MLOps practices, pre-processing/processing as part of model training, crystallography, materials properties, computational chemistry concepts

What You'll Do.

Build the pipeline infrastructure and tooling for data ingestion, moving towards a self-serve setup for the scientific team members

Secure, collect, clean, standardise, and tag diverse chemical datasets to create high-quality training data for our ML researchers

Design and build robust data pipelines for materials science datasets, experimental results, and computational chemistry outputs

Develop processes to integrate diverse data sources including materials databases, literature, patent filings, and laboratory instruments

Create automated workflows for processing crystallographic data and materials properties

Build scalable systems to handle high-throughput computational chemistry calculations and experimental data

Implement automated quality checks for crystal structure data, chemical compositions, and experimental measurements

Create standardisation protocols for materials nomenclature and measurement conditions

Build monitoring systems to ensure data integrity across all pipelines

Understand data requirements for model training and inference

Ensure accurate representation of domain knowledge in data schemas

Integrate with laboratory automation systems and computational chemistry software

Support real-time data needs for AI-driven materials discovery experiments

How You'll Work.

Team & Collaboration

Work hand in hand with ML researchers; partner with materials scientists; true interdisciplinary teamwork; a deeply collaborative environment bridging AI research, computational chemistry, and experimental science; work with world-class researchers and engineers who enjoy sharing knowledge and supporting each other

Full Job Description

ABOUT CUSPAI

CuspAI is the frontier AI company on a mission to solve the breakthrough materials needed to power human progress. While nature took billions of years to perfect molecules, we are harnessing AI to unlock trillion-dollar materials breakthroughs in months, not millennia. Our founding team is the most cited in the world, comprised of world-class researchers in AI, chemistry and engineering. We are working on some of the hardest and most important challenges including energy, clean water, the future of compute, and carbon capture, and this is just the start of what our 'search engine' for next-generation materials will unlock.

We invite you to be part of a diverse, innovative team at the intersection of AI and materials science, working to create impactful partnerships that drive innovation, scalability, and industry collaboration. This work matters. Your work matters. We’re on the cusp of the on-demand materials era. Join us.

THE ROLE

As we grow, we are seeking a Data Engineer to play a crucial part in driving our research and development efforts forward.

YOUR IMPACT

As a Data Engineer you will be part of the new team building the infrastructure that underpins and acts as the critical bridge between raw chemical data and our machine learning models. Your main focus will be to build the pipeline infrastructure and tooling for data ingestion, moving towards self-serve setup for the scientific team members. You'll also be responsible for securing, collecting, cleaning, standardising, and tagging diverse chemical datasets to create high-quality training data for our ML researchers while working closely with our chemistry team to ensure scientific accuracy.

WHAT YOU WILL DO

Data Pipeline Development

- Design and build robust data pipelines for materials science datasets, experimental results, and computational chemistry outputs.
- Develop processes to integrate diverse data sources including materials databases, literature, patent filings, and laboratory instru

How to Apply on Ashby

  • Ashby is a fast modern ATS — most applications take under 3 minutes.
  • The resume parser is strong; verify parsed experience dates and job titles.
  • Custom screening questions are often scored algorithmically — answer completely.
  • Location field affects geo-based screening; use your actual metro area.
