Arena Intelligence

Product Manager

Bay Area · FULL TIME · Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is
optimal for Senior candidates.

The Brief

“Product Manager at Arena Intelligence. Skills: Product management, AI systems, Evaluation methodologies, Technical product shipping, Systems thinking, Cross-functional leadership, Product judgment, Written communication. Own the roadmap and product strategy for Arena's evaluations and leaderboard platform. Partner closely with ML researchers to translate emerging evaluation methodologies — multimodal evals, agentic workflows, reasoning traces, and new benchmark categories — into production-quali”

What You'll Achieve.

Measure and advance the frontier of AI for real-world use; Build transparent, rigorous, and human-centered model evaluations; Understand real-world reliability, alignment, and impact; Shape the global conversation on model reliability and progress; Define how fast-moving AI research becomes trusted product infrastructure; Translate emerging evaluation methodologies into systems and experiences that scale to millions of users and influence how the broader ecosystem interprets AI performance; Achieve adoption, engagement, citations, frontier-lab participation, and evaluation throughput

Industry & Context.

Problems you'll solve

Systems thinking; Identify bottlenecks, coordination gaps, and scaling constraints; Make decisions with incomplete information; Create structure where little exists

What They're Looking For.

Must Have

5–8 years of product management experience in highly technical or ambiguous environments; Familiarity with modern AI systems, including LLMs, multimodal models, agents, reasoning systems, and evaluation methodologies; A track record of shipping technically complex products from concept to production; Experience translating research-heavy or technically ambiguous work into clear product direction and execution; Systems thinking: you can identify bottlenecks, coordination gaps, and scaling constraints across technical and organizational systems; Exceptional cross-functional leadership skills: you can align researchers, engineers, and designers without relying on formal authority; High agency and product judgment: you move quickly, make decisions with incomplete information, and create structure where little exists; Written communication: you can write specifications for researchers and product narratives for external technical audiences with equal clarity

Nice to Have

Technical background in computer science, machine learning, or related fields; Prior experience in evaluations, benchmarking systems, AI infrastructure, research tooling, or developer platforms; Experience building products for technical audiences such as researchers, ML engineers, or developers; Founder or early-stage startup experience

What You'll Do.

Own the roadmap and product strategy for Arena's evaluations and leaderboard platform

Partner closely with ML researchers to translate emerging evaluation methodologies (multimodal evals, agentic workflows, reasoning traces, and new benchmark categories) into production-quality product experiences

Define how evaluation research moves from prototype → implementation → launch → ecosystem adoption

Drive cross-functional execution across research, engineering, design, and marketing to close the gap between research artifacts and trusted user-facing infrastructure

Prioritize what gets evaluated next based on frontier model trends, developer demand, ecosystem gaps, and strategic opportunities

Build systems, workflows, and operational rigor around evaluation quality, release cadence, and leaderboard credibility

Own product metrics across adoption, engagement, citations, frontier-lab participation, and evaluation throughput

Engage directly with frontier labs, researchers, developers, and enterprise users to identify where current evaluation systems break down and where the ecosystem is headed next

Help shape how Arena balances evaluation rigor, usability, neutrality, and speed as the platform scales

How You'll Work.

Team & Collaboration

Partner closely with ML researchers; Drive cross-functional execution across research, engineering, design, and marketing; Align researchers, engineers, and designers without relying on formal authority

Communication Scope

Written communication; Product narratives for external technical audiences

Process & Methodology

Roadmap management, Product strategy, Product execution

Full Job Description

ABOUT ARENA INTELLIGENCE

Arena Intelligence is the open platform for evaluating how AI models perform in the real world. Created by researchers from UC Berkeley’s SkyLab, our mission is to measure and advance the frontier of AI for real-world use. Millions of people use Arena Intelligence each month to explore how frontier systems perform, and we use our community’s feedback to build transparent, rigorous, and human-centered model evaluations. Leading enterprises and AI labs rely on our evaluations to understand real-world reliability, alignment, and impact. Our leaderboards are the gold standard for AI performance, trusted by leaders across the AI community and shaping the global conversation on model reliability and progress.

We’re a team of researchers, engineers, academics, and builders from places like UC Berkeley, Google, Stanford, DeepMind, and Discord. We seek truth, move fast, and value craftsmanship, curiosity, and impact over hierarchy. We’re building a company where thoughtful, curious people from all backgrounds can do their best work. Everyone on our team is a deep expert in their field; our office radiates excellence, energy, and focus.

About the Role

Arena is hiring a Product Manager to lead our evaluations platform. Evaluations sit at the center of Arena. Our leaderboards and evaluation systems are increasingly used by frontier labs, developers, researchers, and enterprises as signals for model quality, capability, and trust.

The core challenge of this role is not traditional roadmap management. It is defining how fast-moving AI research becomes trusted product infrastructure. You will operate at the intersection of ML research, engineering, design, and product execution, translating emerging evaluation methodologies into systems and experiences that scale to millions of users and influence how the broader ecosystem interprets AI performance.

This is a high-ownership role in an environment where evaluation methodologies, model capabilities, and


SKILL SIGNAL 24 detected · ranked by frequency

Evaluation methodologies ×3

Systems thinking ×3

Cross-functional leadership ×3

Product judgment ×3

Translating emerging evaluation methodologies into production-quality product experiences ×3

Defining how evaluation research moves from prototype → implementation → launch → ecosystem adoption ×3

Prioritizing what gets evaluated next based on frontier model trends, developer demand, ecosystem gaps, and strategic opportunities ×3

Building systems, workflows, and operational rigor around evaluation quality, release cadence, and leaderboard credibility ×3

Engaging directly with frontier labs, researchers, developers, and enterprise users to identify where current evaluation systems break down and where the ecosystem is headed next ×3

Product management ×2

AI systems ×2

Technical product shipping ×2

Written communication ×2

LLMs

multimodal models

agents

reasoning systems

Product strategy

Product execution

Product metrics

Evaluation rigor

Usability

Neutrality

Speed

BEHAVIOURAL

Curiosity, Craftsmanship, Impact, Transparency, Trust, Community impact

Role Details

Experience 5–8 yrs

Level Senior

Type FULL TIME

Category product

AI-Extracted Insights

Domain Areas

modern-ai-systems, llms, multimodal-models, agents, reasoning-systems, evaluation-methodologies, ai-model-performance, ai-research

How to Apply on Ashby

Ashby is a fast, modern ATS; most applications take under 3 minutes.
The resume parser is strong; verify parsed experience dates and job titles.
Custom screening questions are often scored algorithmically — answer completely.
Location field affects geo-based screening; use your actual metro area.
