Us

AI

Director, AI Operations

Masnou, Spain FULL TIME Remote Friendly

The Brief

“Director, AI Operations at Us. Skills: AI Operations, SRE, Observability, Automation. Lead and evolve operations model. Enforce operational readiness gates”

What You'll Achieve.

Increase first-line resolution; eliminate manual toil; free engineering time for higher-value work; enforce operational readiness gates; track automation coverage and toil budget compliance; materially reduce operational toil; accelerate patient impact

Industry & Context.

AI
Problems you'll solve

resolve root causes; translate findings into preventative engineering

What They're Looking For.

Must Have

  • BSc/MSc/PhD in Computer Science or a related analytical field.
  • Demonstrable recent hands-on experience building and leading large-scale SRE or platform operations functions.
  • Deep technical knowledge of platforms such as Datadog, New Relic, Grafana, or Splunk, covering dashboard development, alerting strategies, and telemetry pipeline architecture.
  • Solid understanding of OpenTelemetry, distributed tracing, and structured logging.
  • Consistent track record of designing and implementing automation that materially reduces operational toil.
  • Grasp of Azure and/or AWS, including container orchestration, serverless architectures, and managed services.
  • Proven ability to run post-mortem processes and translate findings into preventative engineering, alongside experience defining platform handover criteria.
  • Capability to provide precise technical direction on system instrumentation, alert triggers, and automation interventions while developing high-performing teams.

Nice to Have

  • Application of AI/ML to operational challenges.
  • Experience operating platforms serving AI/ML workloads such as LLM inference, model serving, and data pipelines.
  • Familiarity with regulated pharmaceutical environments or the AstraZeneca technology estate.
  • ITIL, SRE, or operational excellence certifications.

What You'll Do.

Lead and evolve operations model

Enforce operational readiness gates

Establish instrumentation and alerting standards

Guide architecture of AI-augmented tooling

Mandate that every incident yields a runbook, an automation, or a patch

Define acceptable toil thresholds

Build talent pipeline

Partner with product engineering

Communicate platform health

How You'll Work.

Team & Collaboration

Partner with product engineering to co-own post-mortems; Negotiate handovers with product engineering; Embed operational requirements into architecture decisions; Communicate platform health, risks, and investments clearly to senior leadership

Communication Scope

Communicate platform health, risks, and investments clearly to senior leadership using data-driven narratives

Full Job Description

## **Introduction to the Role**

Transform AI into a true force multiplier for enterprise operations! This role advises how machine learning and artificial intelligence platforms are run, automated, and improved across Azure and AWS to support critical scientific and business outcomes. The focus is on defining system architecture, monitoring, and automation, ensuring operations itself becomes a benchmark for AI adoption.

Managing a layered operations framework that spans L1 runbook operators, L2 site reliability engineers (SREs), and an L3 product engineering interface, this position establishes continuous improvement. The goal is to eliminate manual toil, increase first-line resolution, and free engineering time for higher-value work. Every incident must become a detailed procedure, an automated process, or a permanent fix. This is not standard support; it is a leadership role for an engineering-minded operator setting precise technical direction!

## **Accountabilities**

* **Operational Model Ownership**: Lead and evolve the three-tier operations model for the AI/ML platform estate. Enforce operational readiness gates and run monthly reviews using clear metrics: L1 resolution rate, repeat incident rate, automation coverage, and toil budget compliance.
* **Technical Direction**: Establish instrumentation and alerting standards for a centralised observability layer (Datadog, New Relic, Grafana, Splunk). Guide the architecture of AI-augmented operations tooling, including conversational runbooks, and direct L2 SRE patch contributions to resolve root causes.
* **Automation Strategy**: Mandate that every incident yields a runbook, an automation, or a patch. Define acceptable toil thresholds and prioritise automation investments by incident frequency, resolution time, and blast radius.
* **Team Leadership and Development**: Direct the L2 SRE team aligned to cloud domains. Build a robust talent pipeline from L1 to L2, fostering a culture where SREs operate as engineers de


How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
