Gramian Consulting Group

Computer Software

AI Evaluation Engineer (Data Analysis & Multi-Agent Systems)

Remote · CONTRACT · Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Mid candidates.

The Brief

“AI Evaluation Engineer (Data Analysis & Multi-Agent Systems) at Gramian Consulting Group. Skills: AI Evaluation, Data Analysis, Multi-Agent Systems, Python, SQL, Docker, Statistics. Design and develop multi-agent benchmark tasks focused on complex data analysis workflows. Create or curate realistic datasets (CSV, JSON, logs, reports, financial or operational data)”

What You'll Achieve.

Produce clear, verifiable conclusions; validate precise analytical outputs

Industry & Context.

Computer Software

Problems you'll solve

Ability to design analytical problems with clear, verifiable answers

Eligibility Requirements

8 hours per day with an overlap of 4 hours with PST; take-home assessment (60 min)

What They're Looking For.

Must Have

5+ years of experience in data analysis or analytics-heavy roles

Proficiency in Python (pandas, NumPy) and SQL

Experience working with real-world, messy datasets (CSV, JSON, logs, reports)

Ability to design analytical problems with clear, verifiable answers

Solid understanding of statistics (distributions, correlations, outliers)

Familiarity with AI benchmarks or evaluation environments (e.g., SWE-bench or similar)

Hands-on experience with Docker (Dockerfiles, image builds, debugging)

Nice to Have

Experience in financial analysis, operations analytics, or risk analysis

Exposure to data pipelines or ETL workflows

Experience with data quality validation or anomaly detection systems

Familiarity with AI/ML data workflows or evaluation frameworks

What You'll Do.

Design and develop multi-agent benchmark tasks focused on complex data analysis workflows

Create or curate realistic datasets (CSV, JSON, logs, reports, financial or operational data)

Build tasks requiring cross-referencing across multiple data sources, anomaly detection and contradiction identification, and statistical analysis and interpretation

Define task decomposition strategies across specialized sub-agents (e.g., financial, technical, operational analysis)

Develop verification logic to validate precise analytical outputs (not generic summaries)

Implement evaluation pipelines using Python and SQL

Create reproducible environments using Docker

Analyze task performance and refine for clarity, difficulty, and scoring accuracy

Full Job Description

**About Us**

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

**Role overview**

We are looking for an **AI Evaluation Engineer specialized in data analysis** to design benchmark tasks that simulate real-world analytical workflows. You will create scenarios where AI systems must analyze **large, messy, multi-source datasets**, decompose tasks across multiple agents, and produce clear, verifiable conclusions.

**Commitments Required: 8 hours per day with an overlap of 4 hours with PST.**

**Employment type: Contractor assignment (no medical/paid leave)**

**Duration of contract: 4 weeks+**

**Location:** **Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam**

**Interview: take home assessment (60min)**

### **Responsibilities**

* Design and develop **multi-agent benchmark tasks** focused on complex data analysis workflows
* Create or curate **realistic datasets** (CSV, JSON, logs, reports, financial or operational data)
* Build tasks requiring:
  * Cross-referencing across multiple data sources
  * Anomaly detection and contradiction identification
  * Statistical analysis and interpretation
* Define **task decomposition strategies** across specialized sub-agents (e.g., financial, technical, operational analysis)
* Develop **verification logic** to validate precise analytical outputs (not generic summaries)
* Implement evaluation pipelines using **Python and SQL**
* Create reproducible environments using **Docker**
* Analyze task performance and refine for **clarity, difficulty, and scoring accuracy**

**Requirements**

* 5+ years of experience in **data analysis or analytics-heavy roles**
* Strong proficiency in **Python (pandas, NumPy)** and **SQL**
* Experience working with **real-world, messy d

SKILL SIGNAL 32 detected · ranked by frequency

Python ×7

SQL ×7

Docker ×7

Data Analysis ×6

Statistics ×5

pandas ×5

NumPy ×5

anomaly detection ×4

data quality validation ×4

AI Evaluation ×2

Multi-Agent Systems ×2

evaluation pipelines ×2

CSV

JSON

AI benchmarks

evaluation environments

analytics-heavy roles

multi-agent benchmark tasks

complex data analysis workflows

realistic datasets

cross-referencing across multiple data sources

contradiction identification

statistical analysis and interpretation

task decomposition strategies

financial analysis

technical analysis

operational analysis

verification logic

analytical outputs

task performance analysis

clarity, difficulty, and scoring accuracy

AI/ML data workflows

BEHAVIOURAL

clarity, difficulty, scoring accuracy

Role Details

Seniority mid

Experience 5+ yrs

Level Mid

Work Mode Remote

Type CONTRACT

Category computer-software

AI-Extracted Insights

Domain Areas

financial-analysis, operations-analytics, risk-analysis
