Onebrief
AI-powered workflow software
Senior Site Reliability Engineer
“Senior Site Reliability Engineer at Onebrief. Skills: Site Reliability Engineering, Infrastructure as Code, Containers and orchestration, Observability, Incident Response. Own the reliability, scalability, and security of the production application and/or platform. Design, implement, and manage monitoring, logging, and alerting stack”
What You'll Achieve.
Ensure best-in-class service quality and issue resolution; Increase stability, performance, and security of deployments; Improve the overall experience of deploying and managing Onebrief on premise; Increase trust internally and externally; Make 'fast recovery' a reality
Industry & Context.
Dive into a kubectl shell to triage a complex production issue; Translate constraints and failure modes into clear, automated guardrails and scalable, resilient architecture; Identify and resolve issues before they impact users; Identify true root causes; Drive automated, long-term solutions to prevent recurrence; Proactively identify and eliminate operational toil
Regularly working on-site at customer locations in Colorado Springs, CO, Willingness to relocate if not currently within commuting distance (relocation assistance provided), Active Top Secret Clearance required, Ability to obtain SCI eligibility, Work in both on-premise DoD environments and AWS cloud environments
What They're Looking For.
Must Have
Active Top Secret Clearance, 5+ years in Platform, DevOps, or Site Reliability Engineering with an infrastructure and operations focus, Proficiency with at least one of Python, Go, or Bash, Networking fundamentals: core protocols and secure configurations
Nice to Have
Experience in DoD environments and compliance frameworks (RMF, STIGs, ICD 503), GitOps practices and toolchains, Security‑minded design for sensitive environments, Experience designing and implementing meaningful SLIs/SLOs (including error budgets) for complex, distributed systems, Familiarity with on‑prem virtualization (VMware, Proxmox, Nutanix, Hyper-V, etc.), Service mesh exposure (Istio, Linkerd), Relevant certifications (e.g., AWS DevOps Engineer, CKA/CKAD), Active Security+ or another DoD 8570.01-approved security credential, or the ability to obtain the valid credentials within 3 months of employment
What You'll Do.
Own the reliability, scalability, and security of the production application and/or platform
Design, implement, and manage the monitoring, logging, and alerting stack
Create actionable insights and automated alerting
Own alerting that feeds into SLIs and SLOs
Act as incident responder and potentially incident commander during critical incidents
Lead blameless post-mortems / After Action Reviews (AARs)
Partner with platform engineers to design resilient Kubernetes clusters and cloud/on-prem environments
Embed security and compliance controls directly into automation
Proactively identify and eliminate operational toil by building automation
Partner with other teams to share best practices for air-gapped environments
Support readiness for production
How You'll Work.
Team & Collaboration
Work closely with fellow SREs, security, and customer success; Partner with platform engineers; Collaborate with application and platform teams; Partner with other teams to share best practices
Communication Scope
Translate constraints and failure modes into clear, automated guardrails and scalable, resilient architecture; Share context openly
Full Job Description
ABOUT ONEBRIEF
Onebrief is collaboration and AI-powered workflow software designed specifically for military staffs. By transforming this work, Onebrief makes the staff as a whole superhuman - meaning faster, smarter, and more efficient. We take ownership, seek excellence, and play to win with the seriousness and camaraderie of an Olympic team. Onebrief operates as an all-remote company, though many of our employees work alongside our customers at military commands around the world.
Founded in 2019 by a group of experienced planners, today, Onebrief’s team spans veterans from all forces and global organizations, and technologists from leading-edge software companies. We’ve raised $320m+ from top-tier investors, including Battery Ventures, General Catalyst, Sapphire Ventures, Insight Partners, and Human Capital, and today, Onebrief is valued at $2.15B. With this continued growth, Onebrief is able to make an impact where it matters most.
SECURITY CLEARANCE, LOCATION, AND ONSITE NOTICE
This role requires regularly working on-site at customer locations in Colorado Springs, CO. If you are not currently within commuting distance, you must be willing to relocate (note that Onebrief will provide relocation assistance). Active Top Secret Clearance required with the ability to obtain SCI eligibility.
ABOUT THE ROLE
We are hiring a Site Reliability Engineer to join our Infrastructure & Security team. You’ll work closely with fellow SREs, security, and customer success. You will be the first line of support for our mission critical deployments, and responsible for ensuring best-in-class service quality and issue resolution. You will work in both on-premise DoD environments and AWS cloud environments. Your lessons from the field will shape how our team works, from policy to implementation. In addition to working at the customer, you will contribute directly to solutions that increase stability, performance, and security of our deployments, and improve the overall experience of
How to Apply on Ashby
- Ashby is a fast modern ATS — most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically — answer completely.
- Location field affects geo-based screening; use your actual metro area.