Ardent

National Security

Team Lead/Reliability Engineer

$155–210k (AI estimate)

Ashburn, Virginia, United States

Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Lead candidates.

The Brief

“Team Lead/Reliability Engineer at Ardent. Skills: Reliability Engineering, Production Monitoring, Incident Management, AWS. Manage team schedules. Assure every shift is manned.”

What You'll Achieve.

Ensure optimal service delivery; Improve service reliability; Improve production support processes; Ensure optimization and stability

Industry & Context.

National Security

Problems you'll solve

Root cause analysis; Technical troubleshooting; Problem resolution; Troubleshooting production issues; Diagnostic tools

Eligibility Requirements

On-call 24/7, Open to 2nd or 3rd shift, Government-issued background investigation

What They're Looking For.

Must Have

Experience in Production Monitoring & Support, Incident management, Root cause analysis, Problem resolution for cloud-based applications, Hands-on experience with AWS, Experience with cloud-based monitoring tools, Ability to build and implement monitoring solutions, Ability to automate manual processes, Ability to create alerts, Experience with system health monitoring, Troubleshooting production issues, Leadership skills, Effective communication skills, Ability to develop and maintain technical documentation, Ability to maintain knowledge base resources, Experience in triaging production incidents, Experience assessing severity, Experience escalating issues properly

Nice to Have

Active CBPI or Top Secret clearance

What You'll Do.

Manage team schedules

Assure every shift is manned

Assist in emergency situations

Provide proactive notification of issues

Provide early notification of issues

Communicate frequently during incidents

Communicate succinctly post incident

Identify corrective measures

Provide needed metrics

Build monitoring solutions

Build production support solutions

Provide customer visibility

Triage production incidents

Resolve production incidents

Participate in root cause analysis

Participate in postmortem discussions

Assess initial severity

Escalate issues properly

Participate in creation of documentation

Participate in maintenance of documentation

Troubleshoot production issues

Collaborate in developing technical solutions

Restore service to systems

Restore data to systems

Lead implementation of production support activities

Lead technical discussions

Lead design discussions

Help enterprises adopt new technologies

Help enterprises adopt new practices

Perform system health monitoring

Optimize system performance

Define monitoring processes

Establish monitoring processes

Define tooling for monitoring

Establish tooling for monitoring

Perform routine system health checks

Ensure optimization of applications

Ensure stability of applications

Provide training to new staff

Provide refresher training

How You'll Work.

Team & Collaboration

IT teams; Business teams; Infrastructure teams; External customers; Other members of IT; Other members of business; Support staff

Communication Scope

Incident reports; Status updates; Leadership communication; Stakeholder communication; Customer communication; Executive communication

Process & Methodology

Incident management, Problem resolution

Full Job Description

At Ardent, we hire people who want more than a job: they want to serve a mission that matters. Our teams support the federal government’s most critical national security and defense priorities, helping protect the nation, strengthen resilience, and advance the technologies and capabilities that keep America secure. For veterans, cleared professionals, and purpose-driven innovators, Ardent is a place to continue serving alongside a team that understands the importance of the mission and the people behind it.

We also know top talent has choices, which is why we back our mission with benefits and flexibility that stand out: competitive pay, comprehensive health coverage, flexible PTO, federal holidays off, tuition reimbursement, professional development support, wellness stipends, and a culture that values and rewards hard work, dedication, and adaptability. If you want to build something meaningful while enjoying the flexibility and support you need to do your best work, Ardent is where your next mission begins.

Ardent is seeking a Reliability Engineer to join our team. This is an onsite role in Ashburn, VA. Although the main shift is 7:00 a.m. to 3:00 p.m., candidates must be open to working 2nd or 3rd shift in a 24/7/365 environment.

Position Description: We are seeking a skilled Team Lead/Reliability Engineer to support our client's mission by enhancing Production Monitoring and ensuring optimal service delivery for their applications. This role involves proactive issue identification, incident resolution, and system health optimization within a 24x7x365 operational environment. The ideal candidate will lead monitoring solutions, manage other reliability engineers, and collaborate across IT and business teams to improve service reliability. Expertise in AWS environments, root cause analysis, and technical troubleshooting is essential, along with strong communication and leadership skills to drive continuous improvement.

Requirements: Experience in Pro


How to Apply on Greenhouse

  • Create a Greenhouse profile before applying — it saves time across multiple applications.
  • Upload your resume as a PDF; the parser handles it better than Word.
  • Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
  • Enable email notifications to track application status in real time.
