Amazon UK Services Ltd.
Fulfillment Operations Management, Ops Engineering, fulfillment ops
Reliability Engineer, Global Reliability Intelligence Programs
Neural analysis suggests this role is optimal for Mid+ candidates.
“Reliability Engineer, Global Reliability Intelligence Programs at Amazon UK Services Ltd. Skills: Reliability Engineering, Root Cause Analysis, Failure Modes Analysis, Data Analysis. Lead Root Cause Analysis. Develop Failure Modes and Effects Analysis.”
What You'll Achieve.
Eliminate true causes of failures; Prevent failures from happening again; Drive measurable improvements in uptime; Drive measurable improvements in performance; Identify risks early; Build smarter systems; Build more reliable systems
Industry & Context.
Root cause analysis; Failure modes analysis; Data analysis; Troubleshooting; Diagnostics; Problem solving
Up to 50% travel
What They're Looking For.
Must Have
Bachelor's degree, Advanced Microsoft Excel, Data scripting languages, BI analytics tools, Large-scale data mining, Data for root cause analysis, Predictive and preventative maintenance, DevOps, Serverless, Software development and design, CI/CD, AI/ML, Storage, Networking, Databases, Infrastructure automation, Agile development, Software architecture/patterns, Modern cloud services, Written and verbal communication
Nice to Have
API design, Cloud architecture/deployment, Service-oriented architecture, Mobile development, Performance optimization, Database design, Data modeling, Data pipeline design, Industry tools and scripting languages, Full software development lifecycle, Architecture and design, Software development, Automation, Version control tools, Network troubleshooting tools, System architecture, Scalability, Reliability, Performance in database environment, Research methodologies, Machine learning algorithms, Business-critical patterns, New metrics development, Improve business tools and processes
What You'll Do.
Lead Root Cause Analysis
Develop Failure Modes and Effects Analysis
Maintain Failure Modes and Effects Analysis
Improve Failure Modes and Effects Analysis
Analyze equipment and operational data
Identify systemic issues
Identify performance gaps
Translate findings into improvements
Maintain BI dashboards
Build automated reports
Maintain automated reports
Build performance metrics
Maintain performance metrics
Lead cross-functional execution
Partner with operations
Partner with engineering
Partner with maintenance
Partner with external vendors
Drive development of RCA/FMEA tools
Drive enhancement of RCA/FMEA tools
Work with DevOps teams
Work with technical teams
Test RCA/FMEA software
Collect user feedback
Establish reliability best practices
Standardize reliability best practices
Support policy creation
Support organizational adoption
Refine tools and systems
Improve tools and systems
Analyze failure trends
Identify recurring issues
Identify systemic gaps
Identify opportunities to improve reliability
Identify opportunities to improve performance
Support FMEA initiatives
Help teams identify risks
Help teams implement mitigation
Review high-impact events
Review completed RCAs
Ensure actionable outcomes
Collaborate with engineers
Collaborate with operators
Collaborate with vendors
Align corrective actions
Drive execution of corrective actions
Strengthen organizational learning
Strengthen failure prevention
How You'll Work.
Team & Collaboration
Cross-functional teams; DevOps teams; Technical teams; Operations teams; Engineering teams; Maintenance teams; External vendors; Across regions
Communication Scope
Present complex technical information; Clear and concise communication
Process & Methodology
Agile development
Full Job Description
A Reliability Engineer focused on RCA and FMEA hunts down the true causes of failures and eliminates them before they happen again. They lead high-impact investigations, turn data into clear actions, and drive measurable improvements in uptime and performance. This role also gets ahead of problems by identifying risks early through FMEA and building smarter, more reliable systems. If you enjoy solving complex problems, influencing decisions, and delivering real results at scale, this is where you do it. This role may require up to 50% travel.
Key job responsibilities
• Lead Root Cause Analysis (RCA) for high-impact and recurring failures, driving deep-dive investigations to identify true root causes and ensure effective, lasting corrective actions
• Develop, maintain, and continuously improve Failure Modes and Effects Analysis (FMEA) to proactively identify risks, prioritize mitigation, and prevent future failures
• Analyze equipment and operational data to identify trends, systemic issues, and performance gaps, translating findings into actionable reliability improvements
• Build and maintain BI dashboards, automated reports, and performance metrics (e.g., uptime, MTBF, failure rates) to enable data-driven decision-making
• Lead cross-functional execution of reliability improvements by partnering with operations, engineering, maintenance, and external vendors across multiple sites and regions
• Drive development and enhancement of RCA/FMEA tools and software by working closely with DevOps and technical teams, including requirements gathering, testing, and user feedback
• Establish and standardize reliability best practices, while supporting policy creation, training, and organizational adoption of RCA and FMEA methodologies
A day in the life
In this role, you will partner closely with DevOps teams to refine and improve tools and systems that support RCA and FMEA at scale. You will analyze failure trends to identify recurring issues, systemic gaps, and opportunities