Mindrift

AI

Freelance Agent Evaluation Engineer

Belo Horizonte, Minas Gerais, Brazil · Part Time · Remote Friendly
The Brief

“Freelance Agent Evaluation Engineer at Mindrift. Skills: Python, software development, test automation, AI agent evaluation. Create challenging tasks for AI coding agents and define the criteria used to evaluate them.”

What You'll Achieve.

Build a dataset to evaluate AI coding agents; Ensure tasks are solvable; Ensure evaluation is fair; Verify tests catch real problems; Verify tests don't miss bad solutions; Verify tests don't break on good ones

Industry & Context.

AI

Problems you'll solve

Reason about code across the stack; Understand in depth where models fail; Design scenarios that reveal the difference between a good and a bad solution; Write tests that accept all correct solutions and reject incorrect ones

Eligibility Requirements

CV in English; Indicate your English proficiency level

What They're Looking For.

Must Have

5+ years in software development, primarily Python (FastAPI, pytest, async/await, subprocess, file operations); Background in full-stack development, with experience building React-based interfaces (JavaScript/TypeScript) and robust back-end systems; Experience writing tests (functional, integration), not just running them; Docker containers; Familiarity with infrastructure tools (Postgres, Kafka, Redis); CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results); English proficiency - B2

What You'll Do.

Create challenging tasks for AI coding agents

Define evaluation criteria for AI coding agents

Build virtual companies (codebase context) in simulated environments

Assemble and calibrate tasks from intermediate states of virtual companies

Design tasks in isolated environments (emulations of a developer's workstation)

Write tests for AI-generated code

Iterate with AI agents on tests

Review code written by AI agents

Analyze AI agent performance

Design edge cases and adversarial scenarios

Iterate based on feedback from expert QA reviewers

How You'll Work.

Team & Collaboration

Iterate with an AI agent on tests; Iterate based on feedback from expert QA reviewers

Communication Scope

English proficiency - B2
