Source Meridian
Healthcare
Data Engineer
Neural analysis suggests this role is optimal for entry-level candidates.
“Data Engineer at Source Meridian. Skills: data engineering, AWS data stack, Spark pipelines. Build and maintain Spark pipelines.”
Industry & Context.
Solution-oriented approach
What They're Looking For.
Must Have
1-2 years professional experience, Apache Spark experience, AWS data stack experience, Amazon S3 experience, Amazon Athena experience, Airflow experience, Excellent SQL skills, Solid data modeling fundamentals, Advanced English level
Nice to Have
dbt experience, Healthcare data familiarity, Tokenization experience, Identity resolution experience, Privacy-preserving data workflows experience, AWS security concepts knowledge, Spark on AWS experience, Spark-on-containers experience
What You'll Do.
Build Spark pipelines
Maintain Spark pipelines
Process Parquet datasets
Implement tokenization workflows
Convert transit tokens to real tokens
Process healthcare claims datasets
Ensure accurate identity mapping
Ensure data integrity
Orchestrate data pipelines
Develop ETL/ELT processes
Contribute to dbt models
How You'll Work.
Team & Collaboration
Cross-functional stakeholders
Communication Scope
Technical discussions; Clear documentation; Client-facing experience
Full Job Description
We’re looking for a Data Engineer to join Source Meridian.
About Source Meridian
Source Meridian is a software development company that works to solve the industry’s most challenging problems in healthcare practices. We are laser-focused on specific technologies in the healthcare and life science industries: healthcare technology, artificial intelligence, and healthcare interoperability.
About the Role
We're looking for a Data Engineer to help build and operate an AWS-native data platform processing healthcare claims data and tokenized identifiers. You'll design and implement Spark-based pipelines that transform, intersect, and enrich tokenized datasets stored primarily as Parquet on S3, queried via Athena and related AWS services. This environment intentionally avoids managed lakehouse platforms (e.g., no Databricks and no Snowflake), so you'll be doing "real" data engineering directly on AWS.
What You’ll Do
- Build and maintain Spark pipelines to process large-scale Parquet datasets on S3.
- Implement tokenization workflows, including transit token → real token conversion and dataset intersection/join logic.
- Process and deliver healthcare claims datasets for matched individuals, ensuring accurate identity mapping and data integrity.
- Orchestrate data pipelines using Airflow and/or AWS-native orchestration tools when appropriate.
- Develop reliable, testable, and observable ETL/ELT processes (retries, idempotency, monitoring, reprocessing).
- Optimize performance and cost across Spark jobs, S3 partitioning/layout, and Athena query patterns.
- Contribute to dbt models when applicable (transformations, documentation, data quality checks).
- Collaborate with cross-functional stakeholders in a healthcare environment, with a strong focus on privacy and secure data handling.
Required Qualifications
- 1-2 years of professional experience in Data Engineering.
- Strong experience with Apache Spark (PySpark or Scala), including joins, intersections, partitioning, and performance tuning.
- Strong
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.