Harper

Insurance

Senior Member of Technical Staff, AI Quality

$176–253k San Francisco, California, United States FULL TIME


The Brief

“Senior Member of Technical Staff, AI Quality at Harper. Skills: AI quality, LLM evaluation, regression testing. Builds capability and regression eval suites.”

What You'll Achieve.

Turn agent quality from a vibe into a number; know when the agent improves; know when it regresses before the customer does.

Industry & Context.

Insurance

Problems you'll solve

Debugging agent hallucinations

Eligibility Requirements

On-site, in person, long days

What They're Looking For.

Must Have

3–6 years building software

Hands-on production LLM/agent eval experience

Capability + regression suite design

LLM-as-judge graders

Golden dataset curation

Experience designing an LLM-as-judge rubric

Debugging hallucinations by reading transcripts

Familiarity with at least one major eval framework

Written communication

Writing code with AI daily

Nice to Have

Open-source eval frameworks

Red-team/adversarial evals

Voice evals

ML eval/observability background

What You'll Do.

Build capability eval suites

Build regression eval suites

Curate golden datasets

Design deterministic graders

Design LLM-as-judge graders

Ship pre-merge eval gates

Wire production trajectory monitoring

Turn ops findings into permanent tests

How You'll Work.

Team & Collaboration

Work alongside the engineer setting AI-quality direction

Communication Scope

Written communication

Full Job Description

SENIOR MEMBER OF TECHNICAL STAFF, AI QUALITY

Harper is an AI-native commercial insurance company in San Francisco. We're not bolting AI onto insurance; we're rebuilding the entire business as software, on a simple bet: turning expert human judgment into compute is one of the largest transitions left to make, and a trillion-dollar industry still run 90% by hand is the place to prove it. We've grown ~100x in the last year and we move at that speed: on-site, in person, long days, very high standards. Almost no one joins Harper for insurance; they join to build the company that replaces how it works.

THE ROLE

Turning judgment into compute only compounds if the company can tell whether the compute is getting better. Today that's mostly vibes: an engineer ships a prompt change, a tool change, or a new model and judges it by feel ("seems better," "the demo passed"). Vibes don't survive Series B, and they definitely don't survive an agent that's quoting real coverage for real businesses.

Your job is to turn agent quality from a vibe into a number. Harper's agents handle intake, sales, service, voice, and submission packaging; every one needs to be evaluated, regression-tested, and monitored in production. You'll work alongside the engineer setting AI-quality direction and own a specific agent surface end-to-end, so that when the agent improves we know, and when it regresses we know before the customer does. That's how we scale judgment without scaling headcount.

WHAT YOU'LL DO

- Build capability + regression eval suites for your assigned agents: intake, submissions, placements, renewals, CRM, or voice.
- Curate golden datasets from real failure modes: real transcripts, real underwriter back-and-forth, real call recordings. 20–50 sharp cases per agent, not thousands of synthetic ones.
- Design graders. Deterministic first (string match, state check, tool-call assertions); LLM-as-judge where deterministic fails; human calibration on samples.
- Ship pre-merge eval gates. E
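The grading approach described above (deterministic checks first, LLM-as-judge only where those cannot decide, and a gate before merge) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Harper's implementation: the `Case` schema, the `check_*` functions, the `llm_judge` stub, and the 0.02 tolerance are all hypothetical, and human calibration of the judge is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Case:
    """One golden-dataset case (hypothetical schema)."""
    case_id: str
    transcript: str          # the agent's final reply text
    final_state: dict        # e.g. the CRM record after the run
    tool_calls: list[str]    # tools the agent invoked, in order
    expected_phrase: Optional[str] = None
    expected_state: dict = field(default_factory=dict)
    expected_tools: list[str] = field(default_factory=list)


# Deterministic graders: each returns None when the case has no expectation of that kind.
def check_string(case: Case) -> Optional[bool]:
    if case.expected_phrase is None:
        return None
    return case.expected_phrase.lower() in case.transcript.lower()


def check_state(case: Case) -> Optional[bool]:
    if not case.expected_state:
        return None
    return all(case.final_state.get(k) == v for k, v in case.expected_state.items())


def check_tools(case: Case) -> Optional[bool]:
    if not case.expected_tools:
        return None
    return case.tool_calls == case.expected_tools


def llm_judge(case: Case) -> bool:
    # Hypothetical hook: a real judge would send the transcript plus a rubric to a model.
    raise NotImplementedError("LLM-as-judge hook is not implemented in this sketch")


def grade(case: Case, judge: Callable[[Case], bool] = llm_judge) -> bool:
    """Use deterministic checks when any apply; fall back to the judge otherwise."""
    results = [r for r in (check_string(case), check_state(case), check_tools(case)) if r is not None]
    if results:
        return all(results)
    return judge(case)


def pass_rate(cases: list[Case], judge: Callable[[Case], bool] = llm_judge) -> float:
    return sum(grade(c, judge) for c in cases) / len(cases)


def gate(baseline_rate: float, candidate_rate: float, tolerance: float = 0.02) -> bool:
    """Allow the merge unless the candidate's pass rate drops more than `tolerance` below baseline."""
    return candidate_rate >= baseline_rate - tolerance


# Usage with two hypothetical intake cases.
cases = [
    Case("intake-001", "Your general liability quote is ready.", {"status": "quoted"},
         ["lookup_business", "create_quote"],
         expected_phrase="general liability", expected_state={"status": "quoted"},
         expected_tools=["lookup_business", "create_quote"]),
    Case("intake-002", "I couldn't find that business.", {"status": "open"},
         ["lookup_business"], expected_state={"status": "needs_review"}),
]
rate = pass_rate(cases)  # first case passes, second fails its state check
print(rate, gate(baseline_rate=0.5, candidate_rate=rate))
```

Gating on an aggregate pass rate with a tolerance is one choice among several; with 20–50 cases per agent, a single case moves the rate by 2 to 5 percentage points, so a tolerance like 0.02 effectively means "no newly failing cases" on the larger sets. A stricter alternative is to block on any golden case that flips from pass to fail.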


How to Apply on Ashby

Ashby is a fast, modern ATS; most applications take under 3 minutes.
The resume parser is strong; verify parsed experience dates and job titles.
Custom screening questions are often scored algorithmically — answer completely.
Location field affects geo-based screening; use your actual metro area.
