Gramian Consulting Group

IT professional services

AI Evaluation Engineer

Remote · Contract

The Brief

“AI Evaluation Engineer at Gramian Consulting Group. Skills: AI evaluation, benchmarking, debugging, automation. Design benchmark tasks. Create debugging scenarios.”

Industry & Context.

IT professional services

Problems you'll solve

Designing tasks that test how AI systems reason through debugging, operational failures, complex workflows, and multi-step problem-solving scenarios

What They're Looking For.

Must Have

3–10 years of experience in software engineering or related technical domains; debugging, analytical, and systems reasoning skills; experience with terminal, CLI, automation, or developer tooling workflows

Nice to Have

Exposure to AI systems, LLMs, benchmarking, or evaluation frameworks

What You'll Do.

Design benchmark tasks

Create debugging scenarios

Develop task specifications

Write solution approaches

Design reasoning challenges

Review benchmark quality

Refine validation logic

Collaborate with reviewers

How You'll Work.

Team & Collaboration

Collaborate with reviewers and researchers on AI evaluation workflows

Full Job Description

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

**Role Overview**

We are looking for highly analytical **engineers** and **technical domain experts** to contribute to advanced AI evaluation and benchmarking projects focused on realistic _terminal-based and infrastructure-heavy workflows_. In this role, you will design technically challenging tasks that evaluate how AI systems reason through debugging, operational failures, complex workflows, and multi-step problem-solving scenarios.

The ideal candidate has strong experience working with **production systems**, **debugging**, **automation**, or **large-scale engineering workflows**, and can design realistic technical challenges that simulate real-world engineering environments.

_**This role is particularly well suited for professionals with backgrounds in backend engineering, infrastructure, DevOps, data systems, MLOps, cybersecurity, or platform engineering.**_

**CONTRACT:** Contractor assignment (5 weeks)

**COMMITMENT:** Full-time (40h/week) or Part-time (20h/week) with minimum 4h PST overlap

**LOCATION:** Remote (Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Pakistan, Indonesia, Kenya, Nigeria, Turkey, Vietnam)

**PROCESS:** One technical assessment/interview (~45 min)

**Responsibilities:**

* Design realistic terminal-based benchmark tasks for AI evaluation systems
* Create technically deep debugging and investigation scenarios
* Develop task specifications involving infrastructure, workflows, pipelines, or operational failures
* Write clear solution approaches and deterministic evaluation criteria
* Identify realistic edge cases, failure modes, and system constraints
* Design multi-step reasoning challenges across complex technical environments
* Contribute exper
