Mindrift

Tech / AI / Software

Freelance Agent Evaluation Engineer

Pretoria, Gauteng, South Africa · Part Time · Remote Friendly
The Brief

“Freelance Agent Evaluation Engineer at Mindrift. Skills: Software development, Test automation, AI agent evaluation, Python development, Full-stack development, Writing tests. Create challenging tasks for AI coding agents. Define evaluation criteria for AI coding agents”

Industry & Context.

Tech / AI / Software
Problems you'll solve

Reasoning about code across the stack; Understanding where models fail; Designing tasks that challenge frontier models

What They're Looking For.

Must Have

Degree in Computer Science, Software Engineering, or related fields

5+ years in software development

Primarily Python (FastAPI, pytest, async/await, subprocess, file operations)

Background in full-stack development

Experience building React-based interfaces (JavaScript/TypeScript)

Experience building robust back-end systems

Experience writing tests (functional, integration)

Docker containers

Familiarity with infrastructure tools (Postgres, Kafka, Redis)

CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results)

English proficiency - B2

Nice to Have

Expert in every item is not required, but comfort reading and reasoning about code across the stack is expected.

What You'll Do.

Create challenging tasks for AI coding agents

Define evaluation criteria for AI coding agents

Build virtual companies with their own codebases

Assemble and calibrate tasks from intermediate states of virtual companies

Craft prompts for tasks

Design tasks set in isolated environments (emulations of a developer's workstation)

Write tests that accept all correct solutions and reject incorrect ones

Iterate with an AI agent on tests

Review code written by agents

Analyze why an agent failed or succeeded

Design edge cases and adversarial scenarios

Iterate based on feedback from expert QA reviewers

How You'll Work.

Team & Collaboration

Iterate based on feedback from expert QA reviewers

Communication Scope

English proficiency - B2
