P-1 AI
AI
Member of Technical Staff - Evals
“Member of Technical Staff - Evals at P-1 AI. Skills: evals, software development, AI systems. Implement and operate the system for organizing, transforming, running, grading, and reporting on eval benchmarks. Design and execute the process by which we develop and QA our evals.”
What You'll Achieve.
Ensure that Archie is learning and retaining the skills needed to successfully perform its engineering work; benchmark it against industry skill expectations; continuously benchmark our evolving AI platform and the experiments we’re performing around it.
Industry & Context.
Quantitative intuition over physical product domains.
Plan to spend one week per quarter co-working with the rest of the company in our San Mateo office, with an occasional team travel workshop in between.
What They're Looking For.
Must Have
- Experience in constructing comprehensive test suites for software and/or AI systems, including coordinating the contributions of others
- Experience designing metrics to evaluate systems and visualize their performance, including differences across successive generations
- Proficiency in Python programming, complex modules and modern software development tools and practices (Git, CI/CD, etc.)
- Ability to thrive in a fast-paced, dynamic startup environment
Nice to Have
Experience in developing, managing, and running evals against LLM-based systems is a plus
What You'll Do.
Implement and operate the system for organizing, transforming, running, grading, and reporting on eval benchmarks
Design and execute the process by which we develop and QA our evals
Ensure that evals run effectively within our CI/CD system
Create methods for detecting and testing for common quality challenges of AI
Be a technical leader in the consistent implementation and organization of automated tests across other areas of our technology stacks
How You'll Work.
Team & Collaboration
Coordinating the contributions of others; incorporating contributions from our own engineering team, industrial partners, and subject-matter experts
Communication Scope
Good communication skills with a variety of stakeholders (AI researchers, domain experts, application developers)
How to Apply on Ashby
- Ashby is a fast modern ATS — most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically — answer completely.
- Location field affects geo-based screening; use your actual metro area.