GDIT
IT Infrastructure and Operations
Site Reliability Engineering Lead - Model Serving
“Site Reliability Engineering Lead - Model Serving at GDIT. Skills: Site Reliability Engineering, Model Serving, AI, machine learning, Kubernetes, DevSecOps. Owns production reliability strategy for artificial intelligence and machine learning model serving across Advana enclaves supporting Department of Defense missions, Joint Staff analysts, Combatant Command elements, and Senior Executive Service leadership. Defines service‑level objectives, alerting philosophy, operational runbooks, and rele”
What You'll Achieve.
Maintain operational stability, mission assurance posture, and cross‑domain readiness.
Advance operational readiness, reduce mission risk, and reinforce deployment consistency across all enclaves.
Ensure rapid triage, operational continuity, and sustained mission performance.
Industry & Context.
Highly motivated, critical thinking
Top Secret Clearance, SCI eligibility, US Citizenship Required, Onsite
What They're Looking For.
Must Have
BS, 8+ years of experience developing reliability strategy, AI and machine learning experience, CompTIA Security+, TS with SCI eligibility
What You'll Do.
Owns production reliability strategy for artificial intelligence and machine learning model serving across Advana enclaves supporting Department of Defense missions, Joint Staff analysts, Combatant Command elements, and Senior Executive Service leadership.
Defines service‑level objectives, alerting philosophy, operational runbooks, and release safety patterns governing production deployment of model artifacts across multiple security domains.
Establishes reliability governance across serving surfaces by developing operational standards and incident response patterns aligned with enterprise DevSecOps practices.
Implements reliability engineering methodologies using Kubernetes, GitLab Continuous Integration, and hardened deployment pipelines to maintain operational stability, mission assurance posture, and cross‑domain readiness.
Develops automated reliability checks integrated into deployment workflows to validate performance and operational suitability of production‑ready models.
Leads coordination with Platform One, Cloud One, multi‑national engineering teams, and cross‑service mission partners to align reliability strategy with evolving architectures, security requirements, and mission priorities.
Produces mission‑critical deliverables including service‑level objective documentation, alerting configurations, reliability scorecards, incident post‑action reports, and release safety assessments.
Strengthens program value by advancing operational readiness, reducing mission risk, and reinforcing deployment consistency across all enclaves.
Supports Tier‑4 incident response actions by maintaining authoritative reliability artifacts required for rapid triage, operational continuity, and sustained mission performance.
How You'll Work.
Team & Collaboration
Leads coordination with Platform One, Cloud One, multi‑national engineering teams, and cross‑service mission partners to align reliability strategy with evolving architectures, security requirements, and mission priorities.
How to Apply on Workday
- Workday has a multi-step form — save your progress after every section.
- "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
- Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
- Job requisition numbers are useful when following up with HR by email.