Company
Technology
Cloud Reliability & Recovery Engineer
Neural analysis suggests this role is optimal for Senior candidates.
“Cloud Reliability & Recovery Engineer. Skills: Cloud reliability, Disaster recovery, AWS architecture, Automation. Design and implement resilient cloud architectures.”
Industry & Context.
On-call rotations
What They're Looking For.
Must Have
5+ years of experience in cloud infrastructure
3+ years of hands-on AWS production experience
Proven experience designing and implementing multi-region DR architectures
Expertise in AWS services including EC2, RDS, S3, DynamoDB, and Aurora
Hands-on experience with Kubernetes-based deployments
Scripting skills in Python, Bash, or PowerShell
Experience with Infrastructure as Code tools
Solid understanding of networking concepts
Knowledge of CI/CD pipelines and automation frameworks
Nice to Have
Strong AWS expertise
Deep cloud reliability experience
Proven ability to design and operate large-scale disaster recovery systems
Experience working to defined RTO/RPO targets
Familiarity with related resilience tools
Cloud-native architecture and scripting skills
Experience with Terraform or AWS CloudFormation
Knowledge of VPC, DNS failover, VPN, and Direct Connect
What You'll Do.
Design, implement, and maintain resilient cloud architectures
Focus on disaster recovery, business continuity, and system availability
Design multi-region and multi-AZ AWS architectures aligned with RTO/RPO targets
Build and maintain failover and failback mechanisms
Develop automated DR runbooks
Implement backup and recovery strategies
Automate backup policies, replication workflows, and recovery validation
Perform chaos engineering and resilience testing
Manage Infrastructure as Code
Develop CI/CD automation
Build observability dashboards and incident response workflows
Participate in on-call rotations
Conduct post-incident reviews
Maintain DR documentation, compliance artifacts, and audit-ready recovery evidence
How You'll Work.
Team & Collaboration
Global collaboration; Highly skilled engineering teams; Security teams
Communication Scope
Clear technical reports; Executive reports
Full Job Description
## Accountabilities
Design, implement, and maintain highly resilient cloud architectures with a strong focus on disaster recovery, business continuity, and system availability.
Responsibilities include:
- Designing multi-region and multi-AZ AWS architectures aligned with defined RTO/RPO targets
- Building and maintaining failover and failback mechanisms using Route 53, Global Accelerator, and CloudFront
- Developing automated disaster recovery runbooks using AWS Systems Manager, Step Functions, and related services
- Implementing backup and recovery strategies across AWS services including EC2, RDS, S3, DynamoDB, and Aurora
- Automating backup policies, replication workflows, and recovery validation processes
- Performing chaos engineering and resilience testing using AWS Fault Injection Simulator
- Managing Infrastructure as Code using Terraform and/or CloudFormation for DR environments
- Developing CI/CD-driven automation for failover, deployment, and recovery workflows
- Building observability dashboards, alerts, and incident response workflows using CloudWatch and third-party tools
- Participating in on-call rotations, incident response, and post-incident reviews
- Maintaining DR documentation, compliance artifacts, and audit-ready recovery evidence
Requirements:
The ideal candidate brings strong AWS expertise, deep cloud reliability experience, and a proven ability to design and operate large-scale disaster recovery systems.
- 5+ years of experience in cloud infrastructure, SRE, or disaster recovery engineering roles
- 3+ years of hands-on AWS production experience at scale
- Proven experience designing and implementing multi-region DR architectures with defined RTO/RPO
- Strong expertise in AWS services including EC2, RDS, S3, DynamoDB, Aurora, and related resilience tools
- Hands-on experience with Kubernetes-based deployments and cloud-native architecture
- Strong scripting skills in Python, Bash, or PowerShell for automation and orchestration
- Experience with Infrastructure as Code tools suc
How to Apply on Lever
- Lever uses a streamlined one-page form — apply in under 5 minutes.
- LinkedIn import works well; review parsed data before submitting.
- The cover letter field is optional but visible to reviewers — use it to differentiate.
- Referral codes from employees can significantly boost visibility of your application.