ID.me

Technology

Staff Software Engineer - AI Agent Evaluations

$235–345k (AI estimate)

Mountain View, California, United States

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Staff candidates.

The Brief

“Staff Software Engineer - AI Agent Evaluations at ID.me. Skills: AI Agent Evaluations, LLM Behavior, Agentic Systems, RAG Pipelines. Define AI quality standards. Own AI agent evaluation framework.”

Industry & Context.

Technology

Problems you'll solve

Root cause analysis; Troubleshooting; Debugging

What They're Looking For.

Must Have

Bachelor's degree in Computer Science, 8+ years building software systems, Experience evaluating LLM features, Proficiency with AI development tools, Backend engineering fundamentals, Experience designing test infrastructure, Experience improving developer experience, Lead cross-team initiatives, Written and verbal communication

Nice to Have

Background in identity verification, Familiarity with model evaluation, Experience with observability tooling, Track record building developer tooling

What You'll Do.

Define AI quality standards

Own AI agent evaluation framework

Build eval infrastructure

Design evaluation pipelines

Maintain evaluation pipelines

Instrument agentic systems

Detect behavioral drift

Lead test suite design

Handle non-determinism

Construct golden datasets

Build LLM-as-judge pipelines

Perform property-based testing

Build internal tooling

Create feedback loops

Develop testing workflows

Accelerate agent development

Enable fast eval runs

Provide clear regression signals

Develop agent workflows

Develop observability strategy

Implement testing approaches

How You'll Work.

Team & Collaboration

Partner with product teams; Partner with platform teams; Cross-Team Collaboration; Partner with Security; Partner with AI/ML teams

Communication Scope

Written communication; Verbal communication; Engineering communication; Product communication; Leadership communication

Process & Methodology

Technical initiatives

Full Job Description

Company Overview

ID.me is the next-generation digital identity wallet that simplifies how individuals securely prove their identity online. Consumers can verify their identity with ID.me once and seamlessly log in across websites without having to create a new login and verify their identity again. Over 152 million users experience streamlined login and identity verification with ID.me at 20 federal agencies, 45 state government agencies, and 70+ healthcare organizations. More than 600 consumer brands use ID.me to verify communities and user segments to honor service and build more authentic relationships. ID.me’s technology meets the federal standards for consumer authentication set by the Commerce Department and is approved as a NIST 800-63-3 IAL2 / AAL2 credential service provider by the Kantara Initiative. ID.me is committed to “No Identity Left Behind” to enable all people to have a secure digital identity. To learn more, visit https://network.id.me/.

About the Role

This Staff Engineer role sits at the intersection of engineering, applied AI, testing and developer experience. You will define and lead the discipline of testing AI agents, evaluating LLM behavior, and ensuring the reliability of agentic systems operating in production. It requires deep engineering rigor, original thinking about what "correctness" means for non-deterministic systems, and the ability to build eval infrastructure and developer tooling that the entire engineering org depends on.

Expert in building and maintaining Retrieval-Augmented Generation (RAG) pipelines, with a deep focus on strategic data chunking and data quality enforcement. Experience in establishing pre-retrieval data quality gates to optimize vector search accuracy, minimize retrieval-induced noise, and significantly reduce LLM hallucination rates in production-deployed agent systems.

You will establish quality standards for how ID.me ships AI-powered features safely, mentor engineers across teams on AI testing best practi

SKILL SIGNAL 65 detected · ranked by frequency

Data chunking ×4

LLM-as-judge ×4

Property-based testing ×4

Agentic Systems ×3

RAG ×3

Data quality enforcement ×3

Vector search accuracy ×3

LLM hallucination reduction ×3

Agent testing ×3

LLM evaluation ×3

Agent behavior evaluation ×3

Tool use evaluation ×3

Multi-turn interaction evaluation ×3

Behavioral drift detection ×3

Failure mode analysis ×3

Non-determinism handling ×3

Red-teaming ×3

Golden dataset construction ×3

Developer experience improvement ×3

Inner loop acceleration ×3

AI feature development ×3

Responsible AI deployment ×3

Quality gate implementation ×3

AI system observability ×3

AI Agent Evaluations ×2

LLM Behavior ×2

Agent development ×2

Python

Java

LLM

BEHAVIOURAL

Mentorship

Leadership

Role Details

Experience 8–15 yrs

Level Staff

Work Mode Onsite

Category engineering

Salary Band 200k+

AI-Extracted Insights

Domain Areas

digital-identity, identity-verification, agentic-systems, llm-behavior, non-deterministic-systems, rag-pipelines, vector-search, llm-hallucination

How to Apply on Greenhouse

Create a Greenhouse profile before applying — it saves time across multiple applications.
Upload your resume as a PDF; the parser handles it better than Word.
Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
Enable email notifications to track application status in real time.
