Company

Technology

Senior Site Reliability

€85–125k (AI estimate) · Dublin, Ireland · Full time · Remote friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior Site Reliability. Skills: Site Reliability Engineering, cloud infrastructure, incident management, automation. Own and improve production service reliability.”

What You'll Achieve.

Drive best-in-class service availability

Industry & Context.

Technology
Problems you'll solve

Diagnose complex production issues quickly, drawing on strong analytical and problem-solving skills

Eligibility Requirements

On-call rotations across multiple time zones

What They're Looking For.

Must Have

Bachelor’s degree in Computer Science, experience with cloud-native concepts, Google Cloud Platform (GCP) experience, Kubernetes (GKE) experience, Site Reliability Engineering experience, production incident management experience, monitoring and observability tools experience, reliability testing exposure, resilience engineering exposure, cost optimisation initiatives exposure, excellent analytical skills, excellent problem-solving skills, software development experience, automation experience using Python, automation experience using shell scripts, production cloud infrastructure experience at scale, multi-region production systems experience, high-availability production systems experience, scalability focus, resilience focus, minimising service disruption focus

Nice to Have

ServiceNow experience preferred, Splunk Observability experience preferred, OpenTelemetry experience preferred

What You'll Do.

Own production service reliability

Improve production service reliability

Own production service availability

Improve production service availability

Own production service performance

Improve production service performance

Participate in incident management

Use incident workflows

Improve incident workflows

Improve incident tooling

Design observability solutions

Implement observability solutions

Operate observability solutions

Reduce operational toil

Introduce engineering-led solutions

Drive engineering-led solutions

Introduce SRE best practices

Drive SRE best practices

Support on-call rotations

Monitor error budgets

Drive best-in-class service availability

Be accountable for service availability

How You'll Work.

Team & Collaboration

Connect with people; Connect with teams

Full Job Description

## Responsibilities

  • Own and improve the reliability, availability, and performance of production services in Google Cloud (GCP).
  • Participate in incident management, including detection, triage, mitigation, escalation, and recovery.
  • Use and improve incident workflows and tooling (e.g., ServiceNow) to ensure clear ownership and timely communication.
  • Design, implement, and operate observability solutions including monitoring, logging, tracing, synthetics, and dashboards (e.g., Splunk Observability, OpenTelemetry).
  • Reduce operational toil through automation and engineering-led solutions, proactively introducing and driving SRE best practices.
  • Support on-call rotations across multiple time zones, contributing to a sustainable 24/7 support model.
  • Define, monitor, and report SLIs, SLOs, and error budgets for critical services.
  • Drive and be accountable for best-in-class service availability through SRE principles, automation, and proactive reliability engineering.

## Essential skills and/or Certifications

  • Bachelor’s degree in Computer Science, Information Technology or related field
  • Strong experience with cloud-native concepts and technologies, with a strong preference for Google Cloud Platform (GCP) and Kubernetes (GKE).
  • Proven experience with Site Reliability Engineering and production incident management, ideally using platforms such as ServiceNow.
  • Experience with monitoring and observability tools, including metrics, logs, traces, and synthetics (e.g., Splunk Observability, OpenTelemetry).
  • Exposure to reliability testing, resilience engineering, or cost optimisation initiatives.
  • Excellent analytical and problem-solving skills, with the ability to diagnose complex production issues quickly.
  • Software development or automation experience using Python, shell scripts, or similar languages.
  • Hands-on experience operating production cloud infrastructure at scale.
  • Experience managing multi-region, high-availability production systems with a focus on scalability, resilience, and


How to Apply on Lever

  • Lever uses a streamlined one-page form — apply in under 5 minutes.
  • LinkedIn import works well; review parsed data before submitting.
  • The cover letter field is optional but visible to reviewers — use it to differentiate.
  • Referral codes from employees can significantly boost visibility of your application.
