Company

Technology

Cloud Reliability & Recovery Engineer

₹25–45L ~AI est. · India · FULL TIME · Remote Friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Cloud Reliability & Recovery Engineer. Skills: Cloud reliability, Disaster recovery, AWS architecture, Automation. Design and implement resilient cloud architectures.”

Industry & Context.

Technology
Eligibility Requirements

On-call rotations

What They're Looking For.

Must Have

  • 5+ years of experience in cloud infrastructure
  • 3+ years of hands-on AWS production experience
  • Proven experience designing and implementing multi-region DR architectures
  • Expertise in AWS services including EC2, RDS, S3, DynamoDB, Aurora
  • Hands-on experience with Kubernetes-based deployments
  • Scripting skills in Python, Bash, or PowerShell
  • Experience with Infrastructure as Code tools
  • Solid understanding of networking concepts
  • Knowledge of CI/CD pipelines and automation frameworks

Nice to Have

  • AWS expertise
  • Deep cloud reliability experience
  • Proven ability to design and operate large-scale disaster recovery systems
  • Experience working to defined RTO/RPO targets
  • Familiarity with related AWS resilience tools
  • Cloud-native architecture and scripting skills
  • Experience with Terraform or AWS CloudFormation
  • Knowledge of VPC, DNS failover, VPN, and Direct Connect

What You'll Do.

Design resilient cloud architectures

Implement resilient cloud architectures

Maintain resilient cloud architectures

Focus on disaster recovery

Focus on business continuity

Focus on system availability

Design multi-region AWS architectures

Design multi-AZ AWS architectures

Align with RTO/RPO targets

Build failover mechanisms

Maintain failover mechanisms

Build failback mechanisms

Maintain failback mechanisms

Develop automated DR runbooks

Implement backup strategies

Implement recovery strategies

Automate backup policies

Automate replication workflows

Automate recovery validation

Perform chaos engineering

Perform resilience testing

Manage Infrastructure as Code

Develop CI/CD automation

Build observability dashboards

Build incident response workflows

Participate in on-call rotations

Conduct post-incident reviews

Maintain DR documentation

Maintain compliance artifacts

Maintain audit-ready recovery evidence

How You'll Work.

Team & Collaboration

Global collaboration; Highly skilled engineering teams; Security teams

Communication Scope

Clear technical reports; Executive reports

Full Job Description

## Accountabilities

Design, implement, and maintain highly resilient cloud architectures with a strong focus on disaster recovery, business continuity, and system availability.

Responsibilities include:

  • Designing multi-region and multi-AZ AWS architectures aligned with defined RTO/RPO targets
  • Building and maintaining failover and failback mechanisms using Route 53, Global Accelerator, and CloudFront
  • Developing automated disaster recovery runbooks using AWS Systems Manager, Step Functions, and related services
  • Implementing backup and recovery strategies across AWS services including EC2, RDS, S3, DynamoDB, and Aurora
  • Automating backup policies, replication workflows, and recovery validation processes
  • Performing chaos engineering and resilience testing using AWS Fault Injection Simulator
  • Managing Infrastructure as Code using Terraform and/or CloudFormation for DR environments
  • Developing CI/CD-driven automation for failover, deployment, and recovery workflows
  • Building observability dashboards, alerts, and incident response workflows using CloudWatch and third-party tools
  • Participating in on-call rotations, incident response, and post-incident reviews
  • Maintaining DR documentation, compliance artifacts, and audit-ready recovery evidence

Requirements:

The ideal candidate brings strong AWS expertise, deep cloud reliability experience, and a proven ability to design and operate large-scale disaster recovery systems.

  • 5+ years of experience in cloud infrastructure, SRE, or disaster recovery engineering roles
  • 3+ years of hands-on AWS production experience at scale
  • Proven experience designing and implementing multi-region DR architectures with defined RTO/RPO
  • Strong expertise in AWS services including EC2, RDS, S3, DynamoDB, Aurora, and related resilience tools
  • Hands-on experience with Kubernetes-based deployments and cloud-native architecture
  • Strong scripting skills in Python, Bash, or PowerShell for automation and orchestration
  • Experience with Infrastructure as Code tools suc


How to Apply on Lever

  • Lever uses a streamlined one-page form — apply in under 5 minutes.
  • LinkedIn import works well; review parsed data before submitting.
  • The cover letter field is optional but visible to reviewers — use it to differentiate.
  • Referral codes from employees can significantly boost visibility of your application.
