MDCalc
Healthcare
QA Engineer, AI Products
“QA Engineer, AI Products at MDCalc. Skills: LLM testing, AI product quality, prompt engineering, evaluation frameworks. Design and execute test strategies for LLM-powered features.”
Industry & Context.
Investigate and triage failure modes; Distinguish issue types
What They're Looking For.
Must Have
5+ years of software QA, 1+ year of LLM/AI/ML testing, Understanding of QA principles, Experience with prompt engineering, RAG systems, and LLM APIs, Experience designing automated qualitative evaluation, Proficiency with test automation tools, SQL skills, Familiarity with token usage, latency profiling, and cost monitoring
Nice to Have
Playwright proficiency, Experience with Promptfoo, Braintrust, LangSmith, DeepEval, Ragas, or OpenAI Evals
What You'll Do.
Design and execute test strategies for LLM-powered features
Perform prompt regression testing, output evaluation, and hallucination detection
Build and maintain automated evaluation pipelines
Catch quality regressions in non-deterministic outputs
Perform black-box and exploratory testing of AI features
Test clinical accuracy, safety, and edge cases of AI features
Define quality metrics for AI outputs
Establish thresholds for release readiness
Investigate and triage AI failure modes
Distinguish model issues, prompt issues, retrieval issues, and integration bugs
Participate in team discussions
Offer feedback on testability, risks, prompt design, and guardrails
Develop QA strategies for testing capacity, automation, and evaluation coverage
How You'll Work.
Team & Collaboration
Collaborate cross-functionally with engineers, product managers, ML/AI engineers, and clinical reviewers; Participate in team discussions
Communication Scope
Surface issues, blockers, and risks effectively; Communicate ambiguous and probabilistic failures
Full Job Description
THE OPPORTUNITY
Since 2005, MDCalc has been an essential part of the clinician’s workflow to help achieve better patient outcomes. Actively used by more than 65% of physicians worldwide, MDCalc is the most broadly used medical reference – at the point-of-care – for clinical decision tools and content, and one of only four references used by >50% of US HCPs. These evidence-based tools and content are used by millions of medical professionals globally and support 50+ specialties and cover 200+ patient conditions.
To continue to further accelerate and steward this growth, we are expanding the AI product team with a QA Engineer. This role will be critical to MDCalc’s expanded success in continuing to support our millions of clinical users worldwide in taking care of hundreds of millions of patients.
THE ROLE
As a QA Engineer on the AI Products group at MDCalc, you will play a key role in ensuring the quality, reliability, and clinical trustworthiness of MDCalc's AI-powered features. You'll focus on the unique challenges of testing LLM-based systems, where outputs are non-deterministic, correctness is often a spectrum rather than a binary, and regressions can be subtle. You'll be part of a collaborative, fast-moving team that takes pride in delivering software that clinicians trust to care for millions of patients worldwide.
The responsibilities of this individual include the following, but are not limited to:
- Design and execute test strategies for LLM-powered features, including prompt regression testing, output evaluation, and hallucination detection
- Build and maintain automated evaluation pipelines (eval sets, golden datasets, LLM-as-judge frameworks) to catch quality regressions in non-deterministic outputs
- Perform black-box and exploratory testing of MDCalc's AI features across web and mobile, with particular attention to clinical accuracy, safety, and edge cases
- Define quality metrics for AI outputs (accuracy, faithfulness, relevance, safety, latency, cost)
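For readers unfamiliar with the terms above, a golden-dataset regression check with a release threshold might look roughly like the sketch below. It is illustrative only and does not describe MDCalc's actual pipeline: the names GoldenCase, case_passes, pass_rate, and release_ready are hypothetical, the 0.95 threshold is an arbitrary placeholder, and simple keyword matching stands in for a real grader such as an LLM-as-judge. Each prompt is sampled several times because outputs are non-deterministic.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GoldenCase:
    """One entry in a hypothetical golden dataset."""
    prompt: str
    must_include: List[str]  # terms a correct answer has to mention
    must_exclude: List[str]  # terms that would mark an unsafe or hallucinated answer


def case_passes(output: str, case: GoldenCase) -> bool:
    """Keyword check used here as a stand-in for a real grader (assumption)."""
    text = output.lower()
    has_required = all(term.lower() in text for term in case.must_include)
    has_forbidden = any(term.lower() in text for term in case.must_exclude)
    return has_required and not has_forbidden


def pass_rate(model: Callable[[str], str], cases: List[GoldenCase], samples: int = 3) -> float:
    """Call the model several times per case and return the fraction of passing samples."""
    total = 0
    passed = 0
    for case in cases:
        for _ in range(samples):
            total += 1
            if case_passes(model(case.prompt), case):
                passed += 1
    return passed / total if total else 0.0


def release_ready(rate: float, threshold: float = 0.95) -> bool:
    """Gate a release on the pass rate; the default threshold is a placeholder."""
    return rate >= threshold
```

In practice the model argument would wrap the feature under test, and the pass rate would be tracked per prompt version so that a drop after a prompt or model change shows up as a regression.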