CGM

Healthcare

Reliability & Incident Manager

Koblenz, Rheinland-Pfalz, Germany FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is
optimal for Mid+ candidates.

The Brief

“Reliability & Incident Manager at CGM. Skills: Reliability Engineering, Incident Management, DevOps, observability, telemetry, root cause analysis. Own the end-to-end experience of customers in support and service, and analyze friction points.”

What You'll Achieve.

Continuous reduction in incident recurrence by at least 1 percentage point per month; measurable cycle times below 14 days from identification to prevention closure; a Change Failure Rate below 5%; overall telemetry coverage of at least 80%.

Industry & Context.

Healthcare
Problems you'll solve

Apply a root cause mindset: move from symptoms to systemic issues; analyze friction points; identify, validate, and eliminate root causes so that systemic issues are removed rather than repeatedly mitigated.

What They're Looking For.

Must Have

  • Several years of experience in Reliability Engineering, Incident Management, DevOps, or similar technical operations roles in complex software environments
  • Root cause mindset with the ability to consistently move from symptoms to systemic issues
  • Proven experience in establishing and operating structured incident, postmortem, or SRE frameworks
  • Solid understanding of observability, telemetry, correlation IDs, and modern monitoring architectures
  • Ability to operate effectively with Engineering and Product teams, influencing technical priorities through data and structured reasoning
  • Experience in defining and enforcing operational standards, processes, and governance models across teams or products
  • Familiarity with ITSM and delivery tools such as ServiceNow, Jira, or comparable platforms

Nice to Have

Experience with AI in eHealth

What You'll Do.

Own the end-to-end experience of customers in support and service, from first contact to final resolution

Analyze friction points

Drive improvements based on data

Analyze recurring incidents

Identify and eliminate root causes

Lead and facilitate the RQIL Review Board

Ensure measurable cycle times below 14 days from identification to prevention closure

Ensure that every incident resolution is translated into durable prevention artifacts

Partner closely with Product and Engineering to define hardening initiatives

Support data-driven trade-offs between reliability and roadmap priorities

Identify and close telemetry gaps

Drive structured postmortems for critical incidents

Develop and maintain preventive playbooks for recurring risk patterns

How You'll Work.

Team & Collaboration

Partner closely and operate effectively with Product and Engineering teams

Process & Methodology

Incident prioritization, root cause validation, and enforcement of corrective actions, with defined ownership and defined actions

Full Job Description

As a leading provider of software solutions for healthcare, we operate in 19 countries and employ nearly 9,000 dedicated professionals. You will work in a dynamic and innovative environment, filled with exciting opportunities. With your commitment and passion, you’ll have the chance to make a lasting impact.

**CGM Leverages AI:** We are looking for people who are inspired by the power of AI in eHealth, eager to shape transformation, and curious at heart - ready to see how technology can make healthcare smarter, easier, and better. Together, we are shaping the future of healthcare. Become part of our mission and make a difference - **for a world where knowledge saves lives!**

In this role, you are responsible for the end-to-end experience of our customers in support and service, from the first contact to the final resolution. You don’t just look at individual touchpoints, but at the entire journey: intake, routing, processing, resolution, and feedback. The ambition behind this is clear: our customers should not experience internal complexity – they should experience service. That’s why you systematically analyze friction points, drive improvements based on data, and deliberately leverage tools, standards, and automation to measurably increase customer satisfaction.

**Your contribution:**

* Analyze recurring incidents, support drivers, and release defects with a strong focus on identifying and eliminating root causes rather than symptoms, directly contributing to a continuous reduction in incident recurrence by at least 1 percentage point per month
* Lead and facilitate the RQIL Review Board as the central operational mechanism for incident prioritization, root cause validation, and enforcement of corrective actions, ensuring measurable cycle times below 14 days from identification to prevention closure
* Ensure that every incident resolution is translated into durable prevention artifacts such as test coverage, release gates, telemetry improvements, runbooks, or engi

How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
