Priceline

Tech / AI / Software

Site Reliability Engineer, Observability

$110–110k Toronto, Ontario, Canada FULL TIME Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for mid-level candidates.

The Brief

“Site Reliability Engineer, Observability at Priceline. Skills: Observability, SRE, DevOps, platform engineering, OpenTelemetry, Kubernetes, Splunk, New Relic, Grafana. Support and evolve end-to-end observability solutions. Administer and operate core observability platforms”

What You'll Achieve.

Improve detection, diagnosis, and overall system reliability; enable faster root cause analysis that directly impacts MTTR and MTTD; improve signal quality, incident triage, and operational efficiency

Industry & Context.

Tech / AI / Software

Problems you'll solve

diagnosis; faster root cause analysis; troubleshooting ingestion issues

Eligibility Requirements

Two days in-office

What They're Looking For.

Must Have

3+ years of experience in Observability, SRE, DevOps, or platform engineering roles supporting production systems

Understanding of APM and SRE fundamentals, including MELT (Metrics, Events, Logs, Traces), latency analysis, error rate monitoring, service dependency mapping, SLIs/SLOs, alert tuning, and root cause analysis

Hands-on experience administering at least one modern observability/APM platform (e.g., Splunk, New Relic, Grafana), with practical exposure to metrics, logs, distributed tracing, and platform configuration

Experience supporting full-stack observability coverage across infrastructure, application, and browser monitoring layers

Experience building dashboards and actionable alerts, including configuring alert workflows and integrations with incident management tools such as PagerDuty

Experience implementing or supporting OpenTelemetry-based instrumentation and improving telemetry quality across services

Familiarity with Kubernetes and cloud-native environments - an understanding of how applications are deployed, monitored, and scaled

Experience managing telemetry pipelines and agents (e.g., collectors, forwarders, sidecars), including onboarding services and troubleshooting ingestion issues

Working knowledge of scripting or automation (e.g., Shell, Python) and CI/CD concepts

Comfortable collaborating with engineering teams to improve monitoring standards, instrumentation quality, and overall production visibility

Demonstrated history of living the values important to Priceline: Customer, Innovation, Team, Accountability and Trust

High standard of ethics, honesty, transparency and compliance

Nice to Have

Experience or familiarity with infrastructure-as-code tools such as Terraform for managing platform configurations and integrations is a plus

Relevant certifications such as New Relic APM Practitioner, Reliability Engineer – Professional, Splunk Admin, or GCP Associate Cloud Engineer are a plus

What You'll Do.

Support and evolve end-to-end observability solutions

Administer and operate core observability platforms

Contribute to building and advancing a modern OpenTelemetry-based observability ecosystem

Improve and standardize instrumentation practices across services

Optimize telemetry pipelines for performance

Ensure observability platform reliability and performance meet defined SLAs and operational standards

How You'll Work.

Team & Collaboration

Partner with product and platform engineering teams to enhance production visibility; collaborate with engineering teams to improve monitoring standards, instrumentation quality, and overall production visibility

Full Job Description

This role is eligible for our hybrid work model: Two days in-office.

This job posting is for an existing, currently vacant position.

**Site Reliability Engineer, Observability**

Our Technology team is the backbone of our company: constantly creating, testing, learning and iterating to better meet the needs of our customers. If you thrive in a fast-paced, ideas-led environment, you’re in the right place.

**Why this job’s a big deal:**

As Priceline continues to scale globally, reliable production visibility is critical to delivering seamless customer experiences. We are investing in strengthening our observability foundations to improve detection, diagnosis, and overall system reliability. This role plays a key part in maturing our observability capabilities—standardizing instrumentation, improving telemetry quality, and enabling faster root cause analysis that directly impacts MTTR and MTTD.

**In this role you will get to:**

* Support and evolve end-to-end observability solutions for collecting, shipping, storing, and querying OpenTelemetry signals (metrics, logs, and traces) across infrastructure, containers, and Kubernetes environments.
* Administer and operate core observability platforms (Splunk, New Relic, ClickHouse, Grafana, Lightrun), including service onboarding, access management, configuration, upgrades, and ongoing platform health.
* Contribute to building and advancing a modern OpenTelemetry-based observability ecosystem that supports multiple telemetry types at scale.
* Improve and standardize instrumentation practices across services, driving consistent logging, metrics, and distributed tracing implementation.
* Partner with product and platform engineering teams to enhance production visibility and support SLO-driven reliability practices.
* Optimize telemetry pipelines for performance, data quality, scalability, and cost efficiency.
* Help define and support governance standards for observability, ensuring consistency, reliability, and scalability acro

SKILL SIGNAL 40 detected · ranked by frequency

Splunk ×4

New Relic ×4

Grafana ×4

OpenTelemetry ×3

Kubernetes ×3

collecting telemetry signals ×3

shipping telemetry signals ×3

storing telemetry signals ×3

querying telemetry signals ×3

metrics ×3

logs ×3

traces ×3

instrumentation ×3

telemetry quality ×3

root cause analysis ×3

dashboards ×3

actionable alerts ×3

alert workflows ×3

scripting ×3

automation ×3

Observability ×2

SRE ×2

DevOps ×2

platform engineering ×2

ClickHouse ×2

Lightrun ×2

Terraform ×2

Shell

Python

APM fundamentals

SRE fundamentals

SLO-driven reliability practices

BEHAVIOURAL

collaboration, customer focus, innovation, teamwork, accountability, trust, ethics, honesty, transparency, compliance

Role Details

Seniority mid

Experience 3–5 yrs

Level Mid

Work Mode Hybrid

Type FULL TIME

Education Bachelor’s degree in Computer Science or equivalent practica

Salary Band 100k-150k

AI-Extracted Insights

Domain Areas

MELT (Metrics, Events, Logs, Traces), SLIs/SLOs, cloud-native environments

Certifications

New Relic APM Practitioner, Reliability Engineer – Professional, Splunk Admin, GCP Associate Cloud Engineer

How to Apply on Workday

Workday has a multi-step form — save your progress after every section.
"Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
Job requisition numbers are useful when following up with HR by email.
