Mindrift
AI
Freelance Agent Evaluation Engineer
“Freelance Agent Evaluation Engineer at Mindrift. Skills: Python, software development, test automation, AI evaluation. Create challenging tasks and define evaluation criteria.”
What You'll Achieve.
Improve AI systems; Evaluate AI coding agents; Verify that tests catch real problems; Ensure tests don't miss bad solutions; Ensure tests don't fail on correct solutions
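The three test-quality goals above can be made concrete with a small calibration check: a task's tests should accept every known-good solution and reject every known-bad one. The sketch below is a hypothetical illustration, not Mindrift's actual tooling. The toy task (return the sorted unique values of a list) and the names `run_test` and `calibrate` are assumptions made for the example.

```python
from typing import Callable, List

Solution = Callable[[List[int]], List[int]]

# Hypothetical task: return the sorted unique values of a list.
def reference_solution(xs: List[int]) -> List[int]:
    return sorted(set(xs))

# Hypothetical correct solution written differently from the reference.
def alternative_solution(xs: List[int]) -> List[int]:
    out: List[int] = []
    for x in sorted(xs):
        if not out or out[-1] != x:
            out.append(x)
    return out

# Hypothetical bad solution: sorts but forgets to drop duplicates.
def buggy_solution(xs: List[int]) -> List[int]:
    return sorted(xs)

def run_test(solution: Solution) -> bool:
    """Return True if the solution passes every test case of the toy task."""
    cases = [
        ([], []),
        ([3, 1, 2], [1, 2, 3]),
        ([2, 2, 1, 1], [1, 2]),  # duplicate case: this is what catches buggy_solution
    ]
    return all(solution(list(inp)) == expected for inp, expected in cases)

def calibrate(good: List[Solution], bad: List[Solution]) -> dict:
    """Check that the test accepts every good solution and rejects every bad one."""
    wrongly_rejected = [s.__name__ for s in good if not run_test(s)]  # test breaks on good code
    wrongly_accepted = [s.__name__ for s in bad if run_test(s)]  # test misses bad code
    return {
        "wrongly_rejected": wrongly_rejected,
        "wrongly_accepted": wrongly_accepted,
        "calibrated": not wrongly_rejected and not wrongly_accepted,
    }

if __name__ == "__main__":
    print(calibrate(good=[reference_solution, alternative_solution], bad=[buggy_solution]))
```

If the duplicate case were dropped from `cases`, `buggy_solution` would pass and the report would list it under `wrongly_accepted`, which is the "tests miss bad solutions" failure mode in miniature.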
Industry & Context.
Analyze why an agent failed or succeeded; Deeply understand where models fail
What They're Looking For.
Must Have
5+ years in software development, Python, FastAPI, pytest, async/await, subprocess, file operations, full-stack development, React-based interfaces, JavaScript/TypeScript, robust back-end systems, writing tests, Docker containers, CI/CD understanding, English proficiency (B2)
Nice to Have
infrastructure tools, Postgres, Kafka, Redis, GitHub Actions
What You'll Do.
Create challenging tasks
Define evaluation criteria
Build virtual companies
Assemble and calibrate tasks
Ensure task solvability
Design tasks in isolated environments
Iterate with an AI agent on tests
Review code written by agents
Analyze agent failures/successes
Design adversarial scenarios
Iterate based on feedback
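Running a task's tests against agent-written code usually happens in a separate process with a time limit, so that a hanging or crashing solution cannot stall the evaluation. Below is a minimal sketch of that idea using `asyncio` subprocesses and a throwaway directory. The helper `run_isolated` and the file names are hypothetical, and a temporary directory is only light isolation; a production setup would more plausibly use Docker containers, which the posting lists among the required skills.

```python
import asyncio
import sys
import tempfile
from pathlib import Path

async def run_isolated(files: dict[str, str], entrypoint: str, timeout: float = 10.0) -> tuple[int, str]:
    """Write files into a fresh temp dir, run `python <entrypoint>` there, return (exit_code, output).

    Hypothetical helper: the temp dir only separates files, it is not a security sandbox.
    """
    with tempfile.TemporaryDirectory() as workdir:
        for name, content in files.items():
            Path(workdir, name).write_text(content)
        proc = await asyncio.create_subprocess_exec(
            sys.executable, entrypoint,
            cwd=workdir,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,  # merge stderr so tracebacks are captured
        )
        try:
            out, _ = await asyncio.wait_for(proc.communicate(), timeout)
        except asyncio.TimeoutError:
            proc.kill()  # stop a hanging solution
            await proc.wait()
            return -1, "timed out"
        return proc.returncode, out.decode()

# Hypothetical test script shared by both candidate solutions.
TEST = """
from solution import dedupe_sorted
assert dedupe_sorted([2, 2, 1]) == [1, 2]
print("ok")
"""

async def main() -> None:
    good = {"solution.py": "def dedupe_sorted(xs):\n    return sorted(set(xs))\n", "test_task.py": TEST}
    bad = {"solution.py": "def dedupe_sorted(xs):\n    return sorted(xs)\n", "test_task.py": TEST}
    for label, files in (("good", good), ("bad", bad)):
        code, output = await run_isolated(files, "test_task.py")
        lines = output.strip().splitlines()
        print(label, code, lines[-1] if lines else "")

if __name__ == "__main__":
    asyncio.run(main())
```

The good candidate exits with code 0 and prints `ok`; the bad one exits non-zero with an `AssertionError` traceback in the captured output, so the same test both accepts the correct solution and rejects the flawed one.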
How You'll Work.
Team & Collaboration
Work with expert QA reviewers
Communication Scope
English proficiency