Mindrift

AI

Freelance Agent Evaluation Engineer

Córdoba, Córdoba Province, Argentina. Part-time, remote-friendly.

The Brief

Freelance Agent Evaluation Engineer at Mindrift. Skills: Python, software development, test automation, AI evaluation. You will create challenging tasks and define evaluation criteria for AI coding agents.

What You'll Achieve.

Improve AI systems; evaluate AI coding agents; verify that tests catch real problems; ensure tests neither miss bad solutions nor break on good ones

Industry & Context.

AI

Problems you'll solve

Analyze why an agent failed or succeeded; understand in depth where models fail

What They're Looking For.

Must Have

5+ years in software development, Python, FastAPI, pytest, async/await, subprocess, file operations, full-stack development, React-based interfaces, JavaScript/TypeScript, robust back-end systems, writing tests, Docker containers, CI/CD understanding, English proficiency (B2)

Nice to Have

Infrastructure tools, Postgres, Kafka, Redis, GitHub Actions

What You'll Do.

Create challenging tasks

Define evaluation criteria

Build virtual companies

Assemble and calibrate tasks

Ensure task solvability

Design tasks in isolated environments

Iterate with an AI agent on tests

Review code written by agents

Analyze agent failures/successes

Design adversarial scenarios

Iterate based on feedback

How You'll Work.

Team & Collaboration

Work with expert QA reviewers

Communication Scope

English proficiency
