REAL DEV INC

real estate

Senior AI Engineer - AI Systems Evaluation Team

Tel Aviv-Yafo, Tel Aviv District, Israel FULL TIME

The Brief

“Senior AI Engineer - AI Systems Evaluation Team at REAL DEV INC. Skills: AI Systems Evaluation, Production Software Engineering, Applied AI, LLMs, Evaluation Architectures, Automated Pipelines, Scoring Systems, Observability, CI/CD. Own the systems that define, measure, and enforce AI quality at REAL. Translate ambiguous model behavior into measurable signals, automated tests, and release gates”

What You'll Achieve.

Define, measure, and enforce AI quality; deliver meaningful impact; operate with greater precision and confidence.

Industry & Context.

real estate

Problems you'll solve

Simplify complexity to deliver meaningful impact; precision matters in everything we build.

What They're Looking For.

Must Have

3-6 years building production software, internal platforms, ML/data infrastructure, experimentation systems, or AI tooling

Strong backend and systems engineering fundamentals with hands-on applied AI experience

Strong Python, production-level systems experience

Built testing frameworks or validation systems end-to-end

Hands-on with LLMs / RAG / agent workflows

Understands eval methods (benchmarking, A/B, LLM-as-judge, HITL)

Experience with observability / logging / experiment tracking

Systems thinking (coverage, reliability, reproducibility)

Comfort with non-deterministic systems

Nice to Have

Experience with eval, tracing, observability, or experimentation tooling (one or more of the following: LangSmith, Braintrust, Phoenix, MLflow, OpenTelemetry, PostHog, custom eval stacks)

Familiarity with dataset/versioning workflows, HITL systems, and production AI observability systems

CI/CD integration for model evaluation

Background in search, retrieval, or document systems

Built internal platforms or developer tools

Experience working in startups and business-driven environments

What You'll Do.

Own the systems that define, measure, and enforce AI quality at REAL

Translate ambiguous model behavior into measurable signals, automated tests, and release gates

Operate across evaluation design, tooling, and production integration

Design evaluation architectures (benchmarks, regression suites, coverage)

Build automated pipelines to run and score evals across models and prompts

Implement scoring systems (LLM-as-judge, rubrics, hybrid approaches)

Create and maintain golden datasets + edge-case suites

Develop internal tools for prompt testing, dataset generation, experiment tracking

Instrument systems for traces, outputs, and debugging

Detect regressions and enforce quality gates in CI/CD

Monitor model performance in production

Close the loop between eval insights and product improvements

How You'll Work.

Team & Collaboration

Collaborate closely with customers and teammates.

Full Job Description

**REAL** is building an AI Execution Platform for real estate organizations. Today, the data required to run real estate is scattered across fragmented systems, leading to missed insights and preventable financial leakage. **REAL** transforms this complexity into connected intelligence and automated execution, enabling enterprises to operate with greater precision and confidence.

**REAL Values**

* **Ownership**: We take responsibility and move decisively.
* **Clarity**: We simplify complexity to deliver meaningful impact.
* **Accuracy**: Precision matters in everything we build.
* **Velocity**: We work with urgency and intent.
* **Partnership**: We collaborate closely with customers and teammates.

**Role Overview**

* Own the systems that define, measure, and enforce AI quality at REAL.
* Translate ambiguous model behavior into measurable signals, automated tests, and release gates.
* Operate across evaluation design, tooling, and production integration.

**What You'll Do**

* Design evaluation architectures (benchmarks, regression suites, coverage)
* Build automated pipelines to run and score evals across models and prompts
* Implement scoring systems (LLM-as-judge, rubrics, hybrid approaches)
* Create and maintain golden datasets + edge-case suites
* Develop internal tools for prompt testing, dataset generation, experiment tracking
* Instrument systems for traces, outputs, and debugging
* Detect regressions and enforce quality gates in CI/CD
* Monitor model performance in production
* Close the loop between eval insights and product improvements

**Requirements**

**What We're Looking For**

* 3-6 years building production software, internal platforms, ML/data infrastructure, experimentation systems, or AI tooling
* Strong backend and systems engineering fundamentals with hands-on applied AI experience
* Strong Python, production-level systems experience
* Built testing frameworks or validation systems end-to-end
* Hands-on with LLMs / RAG / agent workflows
* U
