Gramian Consulting Group
IT professional services
AI Evaluation Engineer
Neural analysis suggests this role is optimal for mid-level candidates.
“AI Evaluation Engineer at Gramian Consulting Group. Skills: AI evaluation, Benchmarking, Debugging, Automation. Design benchmark tasks. Create debugging scenarios”
Industry & Context.
Analytical skills; Systems reasoning skills; Multi-step problem-solving scenarios
What They're Looking For.
Must Have
3–10 years of experience in software engineering or related technical domains; debugging skills; analytical skills; systems reasoning skills; experience with terminal, CLI, automation, or developer tooling workflows
Nice to Have
Exposure to AI systems, LLMs, benchmarking, or evaluation frameworks
What You'll Do.
Design benchmark tasks
Create debugging scenarios
Develop task specifications
Write solution approaches
Design reasoning challenges
Review benchmark quality
Refine validation logic
Collaborate with reviewers
How You'll Work.
Team & Collaboration
Collaborate with reviewers and researchers on AI evaluation workflows
Full Job Description
Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

**Role Overview**

We are looking for highly analytical **engineers** and **technical domain experts** to contribute to advanced AI evaluation and benchmarking projects focused on realistic _terminal-based and infrastructure-heavy workflows_. In this role, you will design technically challenging tasks that evaluate how AI systems reason through debugging, operational failures, complex workflows, and multi-step problem-solving scenarios.

The ideal candidate has strong experience working with **production systems**, **debugging**, **automation**, or **large-scale engineering workflows**, and can design realistic technical challenges that simulate real-world engineering environments.

_**This role is particularly well suited for professionals with backgrounds in backend engineering, infrastructure, DevOps, data systems, MLOps, cybersecurity, or platform engineering.**_

**CONTRACT:** Contractor assignment (5 weeks)

**COMMITMENT:** Full-time (40h/week) or Part-time (20h/week) with minimum 4h PST overlap

**LOCATION:** Remote (Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Pakistan, Indonesia, Kenya, Nigeria, Turkey, Vietnam)

**PROCESS:** One technical assessment/interview (~45 min)

**Responsibilities:**

* Design realistic terminal-based benchmark tasks for AI evaluation systems
* Create technically deep debugging and investigation scenarios
* Develop task specifications involving infrastructure, workflows, pipelines, or operational failures
* Write clear solution approaches and deterministic evaluation criteria
* Identify realistic edge cases, failure modes, and system constraints
* Design multi-step reasoning challenges across complex technical environments
* Contribute exper