CGM
healthcare
Reliability & Incident Manager
Neural analysis suggests this role is optimal for Mid+ candidates.
“Reliability & Incident Manager at CGM. Skills: Reliability Engineering, Incident Management, DevOps, observability, telemetry, root cause analysis. Responsible for the end-to-end experience of customers in support and service, including analyzing friction points.”
What You'll Achieve.
Continuous reduction in incident recurrence by at least 1 percentage point per month
Measurable cycle times below 14 days from identification to prevention closure
Change Failure Rate reduced to below 5%
Overall telemetry coverage increased to at least 80%
Industry & Context.
root cause mindset; move from symptoms to systemic issues; analyze friction points; identify and eliminate root causes; root cause validation; systemic issues are removed rather than repeatedly mitigated
What They're Looking For.
Must Have
Several years of experience in Reliability Engineering, Incident Management, DevOps, or similar technical operations roles in complex software environments
Root cause mindset with the ability to consistently move from symptoms to systemic issues
Proven experience in establishing and operating structured incident, postmortem, or SRE frameworks
Solid understanding of observability, telemetry, correlation IDs, and modern monitoring architectures
Ability to operate effectively with Engineering and Product teams, influencing technical priorities through data and structured reasoning
Experience in defining and enforcing operational standards, processes, and governance models across teams or products
Familiarity with ITSM and delivery tools such as ServiceNow, Jira, or comparable platforms
Nice to Have
Experience with AI in eHealth
What You'll Do.
Take responsibility for the end-to-end experience of customers in support and service
Analyze friction points
Drive improvements based on data
Analyze recurring incidents
Identify and eliminate root causes
Lead and facilitate the RQIL Review Board
Ensure measurable cycle times below 14 days from identification to prevention closure
Ensure that every incident resolution is translated into durable prevention artifacts
Partner closely with Product and Engineering to define hardening initiatives
Support data-driven trade-offs between reliability and roadmap priorities
Identify and close telemetry gaps
Drive structured postmortems for critical incidents
Develop and maintain preventive playbooks for recurring risk patterns
How You'll Work.
Team & Collaboration
Partner closely and operate effectively with Product and Engineering teams
Process & Methodology
Incident prioritization, root cause validation, enforcement of corrective actions, defined ownership, defined actions
Full Job Description
As a leading provider of software solutions for healthcare, we operate in 19 countries and employ nearly 9,000 dedicated professionals. You will work in a dynamic and innovative environment, filled with exciting opportunities. With your commitment and passion, you’ll have the chance to make a lasting impact.

**CGM Leverages AI:** We are looking for people who are inspired by the power of AI in eHealth, eager to shape transformation, and curious at heart - ready to see how technology can make healthcare smarter, easier, and better. Together, we are shaping the future of healthcare. Become part of our mission and make a difference - **for a world where knowledge saves lives!**

In this role, you are responsible for the end-to-end experience of our customers in support and service, from the first contact to the final resolution. You don’t just look at individual touchpoints, but at the entire journey: intake, routing, processing, resolution, and feedback. The ambition behind this is clear: our customers should not experience internal complexity – they should experience service. That’s why you systematically analyze friction points, drive improvements based on data, and deliberately leverage tools, standards, and automation to measurably increase customer satisfaction.

**Your contribution:**

* Analyze recurring incidents, support drivers, and release defects with a strong focus on identifying and eliminating root causes rather than symptoms, directly contributing to a continuous reduction in incident recurrence by at least 1 percentage point per month
* Lead and facilitate the RQIL Review Board as the central operational mechanism for incident prioritization, root cause validation, and enforcement of corrective actions, ensuring measurable cycle times below 14 days from identification to prevention closure
* Ensure that every incident resolution is translated into durable prevention artifacts such as test coverage, release gates, telemetry improvements, runbooks, or engi
How to Apply on Workday
- Workday has a multi-step form — save your progress after every section.
- "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
- Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
- Job requisition numbers are useful when following up with HR by email.