Ardent
National Security
Team Lead/Reliability Engineer
“Team Lead/Reliability Engineer at Ardent. Skills: Reliability Engineering, Production Monitoring, Incident Management, AWS. Manage team schedules. Ensure every shift is manned.”
What You'll Achieve.
Ensure optimal service delivery; Improve service reliability; Improve production support processes; Ensure optimization and stability
Industry & Context.
Root cause analysis; Technical troubleshooting; Problem resolution; Troubleshooting production issues; Diagnostic tools
On-call 24/7; open to 2nd or 3rd shift; government-issued background investigation
What They're Looking For.
Must Have
Experience in Production Monitoring & Support, Incident management, Root cause analysis, Problem resolution for cloud-based applications, Hands-on experience with AWS, Experience with cloud-based monitoring tools, Ability to build and implement monitoring solutions, Ability to automate manual processes, Ability to create alerts, Experience with system health monitoring, Troubleshooting production issues, Leadership skills, Effective communication skills, Ability to develop and maintain technical documentation, Ability to maintain knowledge base resources, Experience in triaging production incidents, Experience assessing severity, Experience escalating issues properly
Nice to Have
Active CBPI or Top Secret clearance
What You'll Do.
Manage team schedules
Ensure every shift is manned
Assist in emergency situations
Provide proactive notification of issues
Provide early notification of issues
Communicate frequently during incidents
Communicate succinctly post incident
Identify corrective measures
Provide needed metrics
Build monitoring solutions
Build production support solutions
Provide customer visibility
Triage production incidents
Resolve production incidents
Participate in root cause analysis
Participate in postmortem discussions
Assess initial severity
Escalate issues properly
Participate in creation of documentation
Participate in maintenance of documentation
Troubleshoot production issues
Collaborate in developing technical solutions
Restore service to systems
Restore data to systems
Lead implementation of production support activities
Lead technical discussions
Lead design discussions
Help enterprises adopt new technologies
Help enterprises adopt new practices
Perform system health monitoring
Optimize system performance
Define monitoring processes
Establish monitoring processes
Define tooling for monitoring
Establish tooling for monitoring
Perform routine system health checks
Ensure optimization of applications
Ensure stability of applications
Provide training to new staff
Provide refresher training
How You'll Work.
Team & Collaboration
IT teams; Business teams; Infrastructure teams; External customers; Other members of IT; Other members of business; Support staff
Communication Scope
Incident reports; Status updates; Leadership communication; Stakeholder communication; Customer communication; Executive communication
Process & Methodology
Incident management, Problem resolution
Full Job Description
At Ardent, we hire people who want more than a job: they want to serve a mission that matters. Our teams support the federal government’s most critical national security and defense priorities, helping protect the nation, strengthen resilience, and advance the technologies and capabilities that keep America secure. For veterans, cleared professionals, and purpose-driven innovators, Ardent is a place to continue serving alongside a team that understands the importance of the mission and the people behind it.
We also know top talent has choices, which is why we back our mission with benefits and flexibility that stand out: competitive pay, comprehensive health coverage, flexible PTO, federal holidays off, tuition reimbursement, professional development support, wellness stipends, and a culture that values and rewards hard work, dedication, and adaptability. If you want to build something meaningful while enjoying the kind of flexibility and support that you need to do your best work, Ardent is where your next mission begins.
Ardent is seeking a Reliability Engineer to join our team. This is an onsite role in Ashburn, VA. Although the main shift is 7:00 a.m. to 3:00 p.m., the candidate must be open to working 2nd or 3rd shift in a 24/7/365 environment.
Position Description: We are seeking a skilled Team Lead/Reliability Engineer to support our client's mission by enhancing Production Monitoring and ensuring optimal service delivery for their applications. This role involves proactive issue identification, incident resolution, and system health optimization within a 24x7x365 operational environment. The ideal candidate will lead monitoring solutions, manage other reliability engineers, and collaborate across IT and business teams to improve service reliability. Expertise in AWS environments, root cause analysis, and technical troubleshooting is essential, along with strong communication and leadership skills to drive continuous improvement.
Requirements: Experience in Pro