MDCalc

Healthcare

QA Engineer, AI Products

€75–110k (~AI est.) · Bulgaria · FULL TIME · Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is
optimal for Mid+ candidates.

The Brief

“QA Engineer, AI Products at MDCalc. Skills: LLM testing, AI product quality, prompt engineering, evaluation frameworks. Design and execute test strategies for LLM-powered features.”

Industry & Context.

Healthcare

Problems you'll solve

Investigate failure modes; Triage failure modes; Distinguish issue types

What They're Looking For.

Must Have

5+ years software QA, 1+ year LLM/AI/ML testing, Understanding of QA principles, Experience with prompt engineering, Experience with RAG systems, Experience with LLM APIs, Experience designing automated qualitative evaluation, Proficiency with test automation tools, SQL skills, Familiarity with token usage, Familiarity with latency profiling, Familiarity with cost monitoring

Nice to Have

Playwright proficiency, Experience with Promptfoo, Experience with Braintrust, Experience with LangSmith, Experience with DeepEval, Experience with Ragas, Experience with OpenAI Evals

What You'll Do.

Design test strategies for LLM-powered features

Execute test strategies for LLM-powered features

Perform prompt regression testing

Perform output evaluation

Perform hallucination detection

Build automated evaluation pipelines

Maintain automated evaluation pipelines

Catch quality regressions in non-deterministic outputs

Perform black-box testing of AI features

Perform exploratory testing of AI features

Test clinical accuracy of AI features

Test safety of AI features

Test edge cases of AI features

Define quality metrics for AI outputs

Establish thresholds for release readiness

Investigate AI failure modes

Triage AI failure modes

Distinguish model issues

Distinguish prompt issues

Distinguish retrieval issues

Distinguish integration bugs

Participate in team discussions

Offer feedback on testability

Offer feedback on risks

Offer feedback on prompt design

Offer feedback on guardrails

Develop QA strategies for testing capacity

Develop QA strategies for automation

Develop QA strategies for evaluation coverage
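
For readers unfamiliar with the idea of "thresholds for release readiness" on non-deterministic outputs, here is a minimal sketch in Python. It assumes each metric is scored in [0, 1] over several repeated runs and judged on its mean. The `MetricThreshold` class, the `release_ready` function, and every number below are hypothetical illustrations, not MDCalc's actual metrics or release bars.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class MetricThreshold:
    name: str
    minimum: float  # hypothetical release bar, not taken from the posting


def release_ready(scores, thresholds):
    """Return (ok, failures) for a candidate release.

    scores maps a metric name to a list of scores in [0, 1], one per
    repeated run, so a non-deterministic feature is judged on its mean
    rather than on a single sample. A metric with no runs counts as 0.
    """
    failures = []
    for t in thresholds:
        runs = scores.get(t.name, [])
        avg = mean(runs) if runs else 0.0
        if avg < t.minimum:
            failures.append(f"{t.name}: mean {avg:.2f} < {t.minimum:.2f}")
    return (not failures, failures)


# Illustrative values only.
thresholds = [
    MetricThreshold("accuracy", 0.90),
    MetricThreshold("faithfulness", 0.95),
]
scores = {
    "accuracy": [0.92, 0.94, 0.90],
    "faithfulness": [0.96, 0.91, 0.93],
}

ok, failures = release_ready(scores, thresholds)
print(ok, failures)
```

In this made-up run the accuracy mean clears its bar while the faithfulness mean does not, so the gate reports the release as not ready and names the failing metric.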

How You'll Work.

Team & Collaboration

Collaborate cross-functionally; Collaborate with engineers; Collaborate with product managers; Collaborate with ML/AI engineers; Collaborate with clinical reviewers; Participate in team discussions

Communication Scope

Surface issues effectively; Surface blockers effectively; Surface risks effectively; Communicate ambiguous failures; Communicate probabilistic failures

Full Job Description

THE OPPORTUNITY

Since 2005, MDCalc has been an essential part of the clinician’s workflow to help achieve better patient outcomes. Actively used by more than 65% of physicians worldwide, MDCalc is the most broadly used medical reference – at the point-of-care – for clinical decision tools and content, and one of only four references used by >50% of US HCPs. These evidence-based tools and content are used by millions of medical professionals globally and support 50+ specialties and cover 200+ patient conditions.

To continue to further accelerate and steward this growth, we are expanding the AI product team with a QA Engineer. This role will be critical to MDCalc’s expanded success in continuing to support our millions of clinical users worldwide in taking care of hundreds of millions of patients.

THE ROLE

As a QA Engineer on the AI Products group at MDCalc, you will play a key role in ensuring the quality, reliability, and clinical trustworthiness of MDCalc's AI-powered features. You'll focus on the unique challenges of testing LLM-based systems, where outputs are non-deterministic, correctness is often a spectrum rather than a binary, and regressions can be subtle. You'll be part of a collaborative, fast-moving team that takes pride in delivering software that clinicians trust to care for millions of patients worldwide.

The responsibilities of this individual include the following, but are not limited to:

- Design and execute test strategies for LLM-powered features, including prompt regression testing, output evaluation, and hallucination detection
- Build and maintain automated evaluation pipelines (eval sets, golden datasets, LLM-as-judge frameworks) to catch quality regressions in non-deterministic outputs
- Perform black-box and exploratory testing of MDCalc's AI features across web and mobile, with particular attention to clinical accuracy, safety, and edge cases
- Define quality metrics for AI outputs (accuracy, faithfulness, relevance, safety, latency, cost)
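
As a rough illustration of the golden-dataset regression checks mentioned above, the Python sketch below scores each new output against a stored expected answer and flags cases that fall under a cutoff. It makes several assumptions: token-overlap F1 is a deliberately crude stand-in for semantic similarity or an LLM-as-judge score, and `token_f1`, `find_regressions`, `GOLDEN`, `fake_model`, and the 0.6 cutoff are all hypothetical names and values, not anything the posting specifies.

```python
def token_f1(expected: str, actual: str) -> float:
    """Token-overlap F1 between two strings (case-insensitive).

    A crude stand-in for semantic similarity or an LLM judge.
    """
    exp, act = expected.lower().split(), actual.lower().split()
    if not exp or not act:
        return 0.0
    common = sum(min(exp.count(t), act.count(t)) for t in set(exp))
    if common == 0:
        return 0.0
    precision, recall = common / len(act), common / len(exp)
    return 2 * precision * recall / (precision + recall)


def find_regressions(golden, generate, min_score=0.6):
    """Return (case id, score) for golden cases scoring below min_score."""
    regressions = []
    for case in golden:
        output = generate(case["prompt"])
        score = token_f1(case["expected"], output)
        if score < min_score:
            regressions.append((case["id"], round(score, 2)))
    return regressions


# Hypothetical golden set; a real one would be curated and reviewed.
GOLDEN = [
    {"id": "g1", "prompt": "What does BMI stand for?",
     "expected": "Body Mass Index"},
    {"id": "g2", "prompt": "What does GCS stand for?",
     "expected": "Glasgow Coma Scale"},
]


def fake_model(prompt):
    """Stand-in for an LLM call; a real pipeline would call an API here."""
    answers = {"What does BMI stand for?": "Body Mass Index"}
    return answers.get(prompt, "I am not sure")


print(find_regressions(GOLDEN, fake_model))
```

Here the fake model answers the first case exactly and misses the second, so only `g2` is reported. In practice the scoring function is the part most likely to be swapped for a rubric-based or judge-based evaluator.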


SKILL SIGNAL 57 detected · ranked by frequency

Evaluation frameworks ×6

LLM APIs ×4

Qualitative evaluation ×4

Rubric-based scoring ×4

Playwright ×4

Data validation ×4

Prompt engineering ×3

Prompt regression testing ×3

Output evaluation ×3

Hallucination detection ×3

Black-box testing ×3

Exploratory testing ×3

Clinical accuracy testing ×3

Safety testing ×3

Edge case testing ×3

Quality metrics definition ×3

Failure mode investigation ×3

Model issue triage ×3

Prompt issue triage ×3

Retrieval issue triage ×3

Integration bug triage ×3

Test case creation ×3

Test case documentation ×3

Deterministic systems testing ×3

Non-deterministic systems testing ×3

LLM tooling ×3

LLM concepts ×3

LLM evaluation ×3

LLM-as-judge ×3

Semantic similarity ×3

Golden dataset regression ×3

Test automation ×3

BEHAVIOURAL

Solutions-oriented attitude; Clear communicator; Concise communicator

Role Details

Seniority Senior

Work Mode Remote

Type FULL TIME

Category software

Salary Band 75k-100k

AI-Extracted Insights

Domain Areas

llm-based-systems; non-deterministic-outputs; clinical-trustworthiness; clinical-accuracy; clinical-safety; medical-reference-tools; point-of-care-tools

How to Apply on Ashby

Ashby is a fast modern ATS — most applications take under 3 minutes.
The resume parser is strong; verify parsed experience dates and job titles.
Custom screening questions are often scored algorithmically — answer completely.
Location field affects geo-based screening; use your actual metro area.
