Amazon Data Services Ireland Limited
Technology
Systems Development Engineer, AWS Incident Response (AIR)
Systems Development Engineer, AWS Incident Response (AIR) at Amazon Data Services Ireland Limited. Skills: Incident Response, Systems Development, Automation Tooling. Drive incident resolution. Lead incident calls.
What You'll Achieve.
Make events shorter; Make events less frequent; Minimize customer impact; Prevent recurrence
Industry & Context.
Root cause analysis; Identify root causes; Assess situations; Troubleshooting
On-call rotation, Weekend on-call, Holiday on-call
What They're Looking For.
Must Have
Systems engineering fundamentals (networking, storage, operating systems), Designing or architecting systems, Programming with a modern language, 5+ years of systems engineering experience
Nice to Have
Automating, deploying, and supporting infrastructure, Automation or monitoring frameworks, Analytical skills, Attention to detail, Effective communication skills, Managing and troubleshooting networks, Leading high-severity incident calls, Driving resolution across teams, Authoring event deep-dive documents, Driving action items to closure
What You'll Do.
Drive incident resolution
Coordinate resolver teams
Design automation tools
Build automation tools
Enhance automation tools
Author event deep-dive documents
Identify recurring platform issues
Eliminate operational problems
Expand incident response capabilities
How You'll Work.
Team & Collaboration
Service teams; Global teams; Cross-functional teams
Communication Scope
Event documentation; Action item creation
Process & Methodology
Action item tracking
Full Job Description
AWS Incident Response (AIR) ensures the high availability of Amazon Web Services by making customer-impacting events shorter and less frequent through incident detection, management, and automated mitigation. Our systems monitor AWS infrastructure in real time, automatically detect impairments, and orchestrate responses to minimize customer impact across regions and services.
As a Systems Development Engineer on the AIR team, you will lead the response to critical customer-impacting events: triaging impact, identifying root causes, coordinating mitigation actions with service teams, and driving resolution in real time. Not every event is solved by automation; you will use your technical judgment to assess situations, engage the right teams, and direct mitigation strategies when manual intervention is required. Insights from these events directly inform the automation and tooling you build, creating a continuous improvement loop in which each event makes the next one shorter or prevents it entirely. This role offers a unique combination of systems development and real-time operational leadership, with direct impact on the availability of AWS services used by millions of customers.
Key job responsibilities
• Drive the resolution of large-scale customer-impacting incidents as part of an on-call rotation (including weekends and holidays), leading incident calls and coordinating resolver teams across AWS service organizations
• Design, build, and enhance incident detection, triage, and mitigation automation tools
• Author COEs and event deep-dive documents to identify improvement opportunities; create and lead action items that improve processes, tooling, and automation
• Identify recurring platform issues and own projects that eliminate entire classes of operational problems
• Collaborate with teams globally to expand incident response capabilities across AWS regions and services
A day in the life
A Systems Development Engineer on the AWS Incident Response (AIR) team has