ID.me
Technology
Staff Software Engineer - AI Agent Evaluations
“Staff Software Engineer - AI Agent Evaluations at ID.me. Skills: AI Agent Evaluations, LLM Behavior, Agentic Systems, RAG Pipelines. Define AI quality standards. Own AI agent evaluation framework.”
Industry & Context.
Root cause analysis; Troubleshooting; Debugging
What They're Looking For.
Must Have
Bachelor's degree in Computer Science, 8+ years building software systems, Experience evaluating LLM features, Proficiency with AI development tools, Backend engineering fundamentals, Experience designing test infrastructure, Experience improving developer experience, Lead cross-team initiatives, Written and verbal communication
Nice to Have
Background in identity verification, Familiarity with model evaluation, Experience with observability tooling, Track record building developer tooling
What You'll Do.
Define AI quality standards
Own AI agent evaluation framework
Build eval infrastructure
Design evaluation pipelines
Maintain evaluation pipelines
Instrument agentic systems
Detect behavioral drift
Lead test suite design
Handle non-determinism
Construct golden datasets
Build LLM-as-judge pipelines
Perform property-based testing
Build internal tooling
Create feedback loops
Develop testing workflows
Accelerate agent development
Enable fast eval runs
Provide clear regression signals
Develop agent workflows
Develop observability strategy
Implement testing approaches
How You'll Work.
Team & Collaboration
Partner with product teams; Partner with platform teams; Cross-team collaboration; Partner with Security; Partner with AI/ML teams
Communication Scope
Written communication; Verbal communication; Engineering communication; Product communication; Leadership communication
Process & Methodology
Technical initiatives
Full Job Description
Company Overview
ID.me is the next-generation digital identity wallet that simplifies how individuals securely prove their identity online. Consumers can verify their identity with ID.me once and seamlessly log in across websites without having to create a new login and verify their identity again. Over 152 million users experience streamlined login and identity verification with ID.me at 20 federal agencies, 45 state government agencies, and 70+ healthcare organizations. More than 600 consumer brands use ID.me to verify communities and user segments to honor service and build more authentic relationships. ID.me’s technology meets the federal standards for consumer authentication set by the Commerce Department and is approved as a NIST 800-63-3 IAL2 / AAL2 credential service provider by the Kantara Initiative. ID.me is committed to “No Identity Left Behind” to enable all people to have a secure digital identity. To learn more, visit https://network.id.me/.
About the Role
This Staff Engineer role sits at the intersection of engineering, applied AI, testing, and developer experience. You will define and lead the discipline of testing AI agents, evaluating LLM behavior, and ensuring the reliability of agentic systems operating in production. It requires deep engineering rigor, original thinking about what "correctness" means for non-deterministic systems, and the ability to build eval infrastructure and developer tooling that the entire engineering org depends on.
The ideal candidate is an expert in building and maintaining Retrieval-Augmented Generation (RAG) pipelines, with a deep focus on strategic data chunking and data quality enforcement, and has experience establishing pre-retrieval data quality gates to optimize vector search accuracy, minimize retrieval-induced noise, and significantly reduce LLM hallucination rates in production-deployed agent systems.
You will establish quality standards for how ID.me ships AI-powered features safely, mentor engineers across teams on AI testing best practi
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.