GDIT

IT Infrastructure and Operations

Site Reliability Engineering Lead - Model Serving

$128–173k Washington, District of Columbia, United States FULL TIME
The Brief

“Site Reliability Engineering Lead - Model Serving at GDIT. Skills: Site Reliability Engineering, Model Serving, AI, machine learning, Kubernetes, DevSecOps. Owns production reliability strategy for artificial intelligence and machine learning model serving across Advana enclaves supporting Department of Defense missions, Joint Staff analysts, Combatant Command elements, and Senior Executive Service leadership. Defines service‑level objectives, alerting philosophy, operational runbooks, and rele”

What You'll Achieve.

Maintain operational stability, mission assurance posture, and cross‑domain readiness. Advance operational readiness, reduce mission risk, and reinforce deployment consistency across all enclaves. Ensure rapid triage, operational continuity, and sustained mission performance.

Industry & Context.

IT Infrastructure and Operations
Problems you'll solve

highly motivated critical thinking

Eligibility Requirements

Top Secret Clearance, SCI eligibility, US Citizenship Required, Onsite

What They're Looking For.

Must Have

BS, 8+ years of experience developing reliability strategy, AI and machine learning experience, CompTIA Security+, TS with SCI eligibility

What You'll Do.

Owns production reliability strategy for artificial intelligence and machine learning model serving across Advana enclaves supporting Department of Defense missions, Joint Staff analysts, Combatant Command elements, and Senior Executive Service leadership.

Defines service‑level objectives, alerting philosophy, operational runbooks, and release safety patterns governing production deployment of model artifacts across multiple security domains.

Establishes reliability governance across serving surfaces by developing operational standards and incident response patterns aligned with enterprise DevSecOps practices.

Implements reliability engineering methodologies using Kubernetes, GitLab Continuous Integration, and hardened deployment pipelines to maintain operational stability, mission assurance posture, and cross‑domain readiness.

Develops automated reliability checks integrated into deployment workflows to validate performance and operational suitability of production‑ready models.

Leads coordination with Platform One, Cloud One, multi‑national engineering teams, and cross‑service mission partners to align reliability strategy with evolving architectures, security requirements, and mission priorities.

Produces mission‑critical deliverables including service‑level objective documentation, alerting configurations, reliability scorecards, incident post‑action reports, and release safety assessments.

Strengthens program value by advancing operational readiness, reducing mission risk, and reinforcing deployment consistency across all enclaves.

Supports Tier‑4 incident response actions by maintaining authoritative reliability artifacts required for rapid triage, operational continuity, and sustained mission performance.

How You'll Work.

Team & Collaboration

Leads coordination with Platform One, Cloud One, multi‑national engineering teams, and cross‑service mission partners to align reliability strategy with evolving architectures, security requirements, and mission priorities.
