Mindrift
AI
Freelance Agent Evaluation Engineer
“Freelance Agent Evaluation Engineer at Mindrift. Skills: Python, software development, test automation, AI agent evaluation. Create challenging tasks for AI coding agents. Define evaluation criteria for AI coding agents”
What You'll Achieve.
Build a dataset to evaluate AI coding agents; Ensure tasks are solvable; Ensure evaluation is fair; Verify tests catch real problems; Verify tests don't miss bad solutions; Verify tests don't break on good ones
Industry & Context.
Reasoning about code across the stack; deeply understanding where models fail; designing scenarios that reveal the difference between a good and a bad solution; writing tests that accept all correct solutions and reject incorrect ones
CV in English; indicate your English proficiency level
What They're Looking For.
Must Have
5+ years in software development, primarily Python (FastAPI, pytest, async/await, subprocess, file operations); background in full-stack development, with experience building React-based interfaces (JavaScript/TypeScript) and robust back-end systems; experience writing tests (functional, integration), not just running them; Docker containers; familiarity with infrastructure tools (Postgres, Kafka, Redis); CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results); English proficiency - B2
What You'll Do.
Create challenging tasks for AI coding agents
Define evaluation criteria for AI coding agents
Build virtual companies (codebase and context) in simulated environments
Assemble and calibrate tasks from intermediate states of virtual companies
Design tasks in isolated environments (emulations of a developer's workstation)
Write tests for AI-generated code
Iterate with AI agents on tests
Review code written by AI agents
Analyze AI agent performance
Design edge cases and adversarial scenarios
Iterate based on feedback from expert QA reviewers
How You'll Work.
Team & Collaboration
Iterate with an AI agent on tests; Iterate based on feedback from expert QA reviewers
Communication Scope
English proficiency - B2