Amazon Data Services, Inc.

Data Centers

Sr. Infrastructure Reliability Engineer, Infrastructure Reliability & Quality

$60–185k · Herndon, Virginia, United States · FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Sr. Infrastructure Reliability Engineer, Infrastructure Reliability & Quality at Amazon Data Services, Inc. Skills: Infrastructure Reliability, Root Cause Analysis, Risk Assessment, System Reliability Modeling. Drive reliability risk identification and assessment.”

What You'll Achieve.

Improve datacenter availability

Industry & Context.

Data Centers

Problems you'll solve

Problem analysis; Problem solving; Root cause analysis; Troubleshooting

Eligibility Requirements

Travel within US and internationally

What They're Looking For.

Must Have

10+ years of reliability engineering; 5+ years of root cause analysis; Experience using a Physics-of-Failure based approach; Experience with analytical and empirical approaches; Experience with lifecycle environmental and operational stress driven risk analysis; Experience evaluating product design quality/reliability risks; Experience assessing electronics manufacturing process related quality/reliability issues; Knowledge of statistical techniques and models; Experience with process capability for electronic component production; Experience establishing critical-to-quality and reliability metrics; Experience developing a datacenter system-level reliability model; Experience with reliability block diagrams; Experience with statistical modeling; Experience with data analytics; Experience monitoring product performance in the field; Experience driving corrective and preventive actions; Experience driving vendor auditing; Experience driving a quarterly review process; Proven track record in product reliability leadership; Proven track record in business negotiations; Proven track record in program management; Skill set in problem analysis and solving; Skill set in vendor management; Ability to travel within the US and internationally

Nice to Have

Professional Engineering or Architectural License; Knowledge of building codes and regulations; Experience carrying design concepts through exploration, development, and into deployment or mass production; Experience reading, interpreting, and creating construction drawings, specifications, and submittal documents; Bachelor's degree in Electrical or Mechanical Engineering, Engineering Technology, or Reliability Engineering, or 10+ years of experience managing, analyzing, and communicating results to senior leadership; Master's or Ph.D. in Reliability Engineering, Physics, Electrical, Mechanical, or Materials Engineering, or a related field; Experience applying proactive, effective, and cost-effective reliability approaches throughout the product design, manufacture, and deployment stages; Proven experience working with external design and manufacturing supply chain partners; Familiarity with the reliability performance of major data center infrastructure equipment; Ability to manage multiple qualification activities and development schedules

What You'll Do.

Drive reliability risk identification

Drive reliability risk assessment

Drive reliability risk mitigation

Perform root cause analysis of critical equipment failures

Drive continuous improvements to improve datacenter availability

Work closely with internal and outside partners

Drive key aspects of product specification

Drive risk identification plan and execution

Develop and implement analytical approaches

Develop and implement empirical approaches

Carry out lifecycle environmental stress driven risk analysis

Carry out lifecycle operational stress driven risk analysis

Identify overstress and fatigue-related product weaknesses

Evaluate product design quality/reliability risks

Assess electronics manufacture process related quality/reliability issues

Drive critical component identification

Establish critical to quality and reliability metrics

Develop datacenter system level reliability model

Perform related reliability quantification

Perform related risk analysis

Monitor product performance in the field

Drive root cause analysis of critical failures

Drive associated corrective actions

Drive associated preventive actions

Drive effective vendor auditing

Drive quarterly review process

Drive continuous improvements of datacenter availability

How You'll Work.

Team & Collaboration

Work with suppliers; Open collaborative environment; Work with internal partners; Work with outside partners; Collaborate with people across AWS

Communication Scope

Communicate results to senior leadership

Process & Methodology

Program management

Full Job Description

As an Infrastructure Reliability Engineer, you will proactively drive reliability risk identification, assessment, and mitigation for datacenter infrastructure equipment (for example: LV Generator, MV Transformers, LV SWGR, Breakers, UPS, HV Transformers). You will also be responsible for root cause analysis of critical equipment failures and will drive continuous improvements in datacenter availability for AWS customers. You will work closely with both internal and outside partners, including suppliers, to drive key aspects of product specification, risk identification planning, and execution. To succeed in an open, collaborative environment, you must be ownership-minded, independent, and action- and results-oriented.

The candidate should have experience using a Physics-of-Failure based approach to develop and implement both analytical and empirical approaches for identifying and assessing product quality/reliability risks during the product design, manufacture, and deployment stages. The individual should be able to drive AWS application-specific requirements when carrying out both lifecycle environmental and operational stress driven risk analysis, covering thermal, electrical, chemical, and mechanical stresses, so as to identify overstress and fatigue-related product weaknesses. The candidate should be capable not only of evaluating product design quality/reliability risks but also of assessing quality/reliability issues related to electronics manufacturing processes. Knowledge of statistical techniques and models is required to analyze test and field data.

At the component level, the individual will drive critical component identification and the associated vendor selection and qualification requirements. The candidate will be expected to use knowledge of process capability for electronic component production, as well as system-level performance requirements, to establish critical-to-quality and reliability metrics. At the system level, t
