Company

AI startup

Staff Engineer - Agentic AI

$160–250k · San Francisco, California, United States · FULL TIME

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Staff Engineer - Agentic AI. Skills: Agentic AI, LLM application architectures, Evaluation and benchmarking. Lead development of core agent intelligence layer. Execute multi-step workflows”

What You'll Achieve.

Drive agent task success rate; Ensure commercial viability; Improve completion metrics

Industry & Context.

AI startup

Problems you'll solve

Failure handling; Error recovery; Troubleshooting

What They're Looking For.

Must Have

7+ years of software engineering, 2+ years building LLM-based agents, Deep experience with LLM application architectures, Experience evaluating and benchmarking agentic systems, Shipped AI systems with measurable outcomes, Python skills, Hands-on experience with LLM tooling, Experience leading a small technical team

Nice to Have

Desktop automation experience, COM experience, Experience with programmatic control of applications, Mechanical engineering background, CAD/CAE background, PLM background, Familiarity with enterprise deployment constraints, Published work in agentic AI, Open-source contributions in agentic AI, Experience building public benchmarks for AI agents

What You'll Do.

Lead development of the core agent intelligence layer that executes multi-step workflows

Serve as technical lead

Own full product loop

Define agent capabilities

Build agent implementations

Benchmark agent implementations

Drive agent task success rate

Define eval framework

Improve completion metrics

Set and enforce per-task token budgets

Ensure commercial viability

Build evaluation infrastructure

Ground evals in user stories

Lead user story mapping

Validate user stories

Conduct direct interviews

Collaborate with domain experts

Translate user stories into evals

Close loop between research and benchmarking

Own agent architecture decisions

Write production code

Raise engineering standards

Collaborate cross-functionally

Align agent behavior with usage

How You'll Work.

Team & Collaboration

Small senior team; Cross-functional work with integrations, product, and customers

Process & Methodology

Setting direction, Driving architecture decisions

Full Job Description

ABOUT THE ROLE

A well-funded, early-stage AI startup in the mechanical engineering software space is looking for a Staff Engineer - Agentic AI to own the core agent intelligence layer that turns engineers' intent into reliable, cost-efficient multi-step workflows across complex desktop engineering tools. This is a high-impact, senior technical leadership role reporting directly to the CTO, sitting at the intersection of applied agentic AI, user research, and product delivery. The company serves Fortune 100 hardware engineering customers and is backed by notable investors. You'll join a small, senior team and have a direct line to executive leadership. The role is on-site in San Francisco, CA.

WHAT YOU'LL DO

- Lead development of the core agent intelligence layer executing multi-step workflows across complex desktop engineering software (CAD, CAE, PLM).
- Report to the CTO and serve as technical lead for a small team of AI engineers, a user researcher, and domain expert contractors.
- Own the full product loop: define agent capabilities from user stories, build implementations, and benchmark against real workflows.
- Drive agent task success rate: define the eval framework, establish baselines, and systematically improve completion metrics.
- Set and enforce per-task token budgets; track cost per completed workflow to ensure commercial viability.
- Build rigorous, reproducible evaluation infrastructure grounded in validated user stories (SWE-bench-level rigor applied to engineering workflows).
- Lead user story mapping and validation through direct interviews and collaboration with domain experts.
- Translate validated user stories into testable evals, closing the loop between research and benchmarking.
- Own agent architecture decisions: tool-calling strategies, state management, error recovery, model routing, and context management.
- Act as a player-coach: write production code, review designs, unblock the team, and raise engineering standards.
- Collaborate cros

SKILL SIGNAL 30 detected · ranked by frequency

Agentic AI ×5

Cost efficiency ×4

Multi-step workflows ×3

Failure handling ×3

Cost constraints ×3

Model selection ×3

Context management ×3

Retrieval strategies ×3

Orchestration patterns ×3

Task completion ×3

Failure mode analysis ×3

SWE-bench ×3

GAIA ×3

τ-bench ×3

Tool-calling strategies ×3

State management ×3

Error recovery ×3

Model routing ×3

LLM application architectures ×2

Evaluation and benchmarking ×2

Logfire ×2

LangSmith ×2

LLM

Python

Function calling

Tool use APIs

User research

Product delivery

User story mapping

Benchmarking

BEHAVIOURAL

Leadership

Role Details

Experience: 7–10 yrs

Level: Senior

Work Mode: Onsite

Type: FULL TIME

Category: engineering

Salary Band: 150k-200k

AI-Extracted Insights

Domain Areas

mechanical-engineering-software, desktop-engineering-tools, cad, cae, plm, agentic-ai-systems, desktop-automation, enterprise-deployment-constraints

How to Apply on Ashby

Ashby is a fast modern ATS — most applications take under 3 minutes.
The resume parser is strong; verify parsed experience dates and job titles.
Custom screening questions are often scored algorithmically — answer completely.
Location field affects geo-based screening; use your actual metro area.
