Company
Technology
Product Reliability Engineer
Automated analysis suggests this role is best suited to mid-level candidates.
“Product Reliability Engineer. Skills: reliability engineering, distributed systems, Kubernetes, observability. Partner with customers; investigate production issues.”
What You'll Achieve.
Prevent production regressions; Strengthen observability; Strengthen system resilience
Industry & Context.
Root cause analysis; Troubleshooting
Remote-first environments, Shifting priorities
What They're Looking For.
Must Have
4–7 years of experience, Software engineering fundamentals, Hands-on Kubernetes expertise, Experience with observability tools, Proficiency in Python, Go, or Rust, Analytical and communication skills, Experience working in cross-functional environments
Nice to Have
Kubernetes experience a plus, Observability experience a plus, Large-scale distributed systems experience a plus
What You'll Do.
Partner with customers
Investigate production issues
Resolve production issues
Lead root-cause analysis
Collaborate with engineering teams
Implement durable fixes
Build reliability tooling
Maintain reliability tooling
Own test automation frameworks
Improve test automation frameworks
Define performance baselines
Maintain performance baselines
Define regression testing frameworks
Maintain regression testing frameworks
Define reliability gates
Improve installation reliability
Improve upgrade reliability
Improve deployment reliability
Identify failure patterns
Build preventive solutions
Develop internal tools
Develop product enhancements
Establish feedback loop
Improve observability
Improve documentation
How You'll Work.
Team & Collaboration
Cross-functional environments; Engineering teams; Product teams; Customer-facing teams
Communication Scope
Root cause analysis; Actionable recommendations
Full Job Description
## Accountabilities
- Partner with customers and internal teams to investigate and resolve complex production issues across Kubernetes-based on-prem and hybrid deployments.
- Lead deep root-cause analysis for escalations, reproduce issues, and collaborate with engineering teams to implement durable fixes.
- Build and maintain reliability tooling such as diagnostics systems, health checks, support bundles, and environment validation utilities.
- Own and improve test automation frameworks, focusing on CI stability, reducing flaky tests, and strengthening integration and end-to-end coverage.
- Define and maintain performance baselines, regression testing frameworks, and reliability gates to prevent production regressions.
- Improve installation, upgrade, and deployment reliability by identifying recurring failure patterns and building preventive solutions.
- Develop production-grade internal tools and product enhancements using Python, Go, or Rust to strengthen observability and system resilience.
- Establish a closed feedback loop from customer issues to engineering improvements in testing, observability, documentation, and defaults.

## Requirements
- 4–7 years of experience in production engineering, SRE, platform engineering, or similar roles focused on reliability and distributed systems.
- Strong software engineering fundamentals, including debugging, testing, system design, and production-grade coding practices.
- Hands-on Kubernetes expertise, including troubleshooting workloads, networking, storage, RBAC, and multi-environment deployments.
- Strong experience with observability tools and techniques, including logs, metrics, and tracing for distributed system debugging.
- Proficiency in at least one programming language such as Python, Go, or Rust, with experience building internal tools or production systems.
- Strong analytical and communication skills, with the ability to break down complex incidents into clear root causes and actionable recommendations.
- Experience working in cross-functi
How to Apply on Lever
- Lever uses a streamlined one-page form — apply in under 5 minutes.
- LinkedIn import works well; review parsed data before submitting.
- The cover letter field is optional but visible to reviewers — use it to differentiate.
- Referral codes from employees can significantly boost visibility of your application.