Mastercard

Lead Site Reliability Engineer

₹35–55L (AI estimate) · Pune, India · Full time · Remote friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Lead candidates.

The Brief

“Lead Site Reliability Engineer at Mastercard. Skills: Site Reliability Engineering, Cloud Engineering, Kubernetes, Hybrid Cloud. Define and implement SLIs, SLOs, and error budgets.”

What You'll Achieve.

Improved cross-cloud resiliency; Improved DR posture; Reduced hybrid networking incidents; Improved SLO compliance; Reduced MTTR; Increased automation coverage; Reduced change failure rate

Industry & Context.

Problems you'll solve

Troubleshooting; Root cause analysis

Eligibility Requirements

Rotational on-call shifts, plus weekend and off-hours support

What They're Looking For.

Must Have

7–10+ years in SRE/DevOps/Cloud Engineering; Deep hands-on experience with AWS, Azure, hybrid networking, and Kubernetes (cloud and on-prem); Knowledge of Linux internals, TCP/IP, DNS, load balancing, TLS/PKI and certificate lifecycle, and distributed systems architecture; Scripting/programming skills (Python preferred); Experience designing cross-cloud DR and failover models; Experience with infrastructure as code and GitOps

Nice to Have

AWS Solutions Architect certification, Azure Architect certification, Azure DevOps Engineer certification, Certified Kubernetes Administrator (CKA)

What You'll Do.

Architect high-availability designs

Eliminate single points of failure

Conduct resilience validation, chaos testing, and failure scenario modeling

Engineer and operate workloads across AWS and Azure

Design cross-cloud networking

Implement workload portability and cloud-agnostic deployment strategies

Optimize cost, performance, and reliability across providers

Design cloud-native autoscaling, load balancing, and traffic routing strategies

Integrate on-prem infrastructure with cloud

Integrate Active Directory / IAM federation, hybrid DNS architecture, and secure certificate lifecycle management

Troubleshoot hybrid connectivity issues

Manage hybrid Kubernetes deployments

Manage private registry integrations

Support legacy-to-cloud modernization

Architect and operate Kubernetes clusters

Optimize cluster autoscaling, resource allocation, and performance

Implement cluster security hardening and RBAC governance

Troubleshoot CNI, ingress controller, service mesh, and pod networking issues

Implement GitOps-driven deployments

Build unified observability

Implement centralized logging

Design distributed tracing

Engineer proactive alerting

Engineer resilient CI/CD pipelines

Implement infrastructure as code

Automate certificate rotation, auto-scaling policies, patch orchestration, and drift detection

Improve deployment reliability

Lead technical investigation of DNS and TLS/PKI failures, network latency, memory leaks, kernel-level issues, thread contention, and CPU throttling

Perform packet-level debugging

Analyze distributed system failures

How You'll Work.

Team & Collaboration

Cross-cloud failover patterns; Follow-the-sun model

Process & Methodology

GitOps

Full Job Description

Our Purpose

Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.

Title and Summary

Lead Site Reliability Engineer

Role Overview

We are seeking a highly technical Lead Site Reliability Engineer (SRE) to architect, engineer, and operate highly reliable, scalable, and secure platforms across multi-cloud (AWS, Azure) and hybrid (on-prem + cloud) environments. This is a deeply hands-on engineering role requiring expertise in distributed systems, Kubernetes, hybrid networking, automation, CI/CD, observability, and production incident leadership. The Lead SRE will serve as the technical authority for reliability across interconnected cloud and datacenter ecosystems.

Core Responsibilities

1. Reliability Engineering Across Hybrid & Multi-Cloud

• Define and implement SLIs, SLOs, and error budgets across cloud-native and on-prem workloads.
• Architect high-availability designs spanning:
  - AWS and Azure regions
  - On-prem datacenters
  - Cross-cloud failover patterns
• Design DR strategies (RTO/RPO-driven) across hybrid environments.
• Eliminate single points of failure across network, compute, storage, and DNS layers.
• Conduct resilience validation, chaos testing, and failure scenario modeling.

2. Multi-Cloud Architecture & Engineering

• Engineer and operate workloads across:
  - Amazon Web Services
  - Microsoft Azure
• Design cross-cloud networking (VPN, ExpressRoute, Direct Connect, Transit Gateway).
• Implement workload portability and cloud-agnostic deployment strategies.
• Optimize cost, performance, and reliability acros


SKILL SIGNAL 99 detected · ranked by frequency

CI/CD ×4

Kubernetes ×3

SLIs/SLOs ×3

Error budgets ×3

High-availability designs ×3

Cross-cloud failover ×3

RTO/RPO ×3

Resilience validation ×3

Chaos testing ×3

Failure modeling ×3

Cloud-native autoscaling ×3

Traffic routing ×3

Active Directory federation ×3

IAM federation ×3

Hybrid DNS ×3

Certificate lifecycle ×3

Hybrid connectivity ×3

BGP routing ×3

Firewall policies ×3

NAT ×3

MTU mismatches ×3

Kubernetes deployments ×3

Private registry ×3

Cluster performance ×3

Unified observability ×3

Centralized logging ×3

Proactive alerting ×3

Signal quality ×3

Infrastructure as code ×3

Auto-scaling policies ×3

Deployment reliability ×3

DNS resolution ×3

BEHAVIOURAL

Leadership

Role Details

Seniority: Mid

Experience: 8–15 yrs

Level: Lead

Work Mode: Remote

Type: Full time

Salary Band: 200k+

AI-Extracted Insights

Domain Areas

distributed-systems-architecture, hybrid-cloud-environments, cloud-native-workloads, on-prem-workloads, multi-cloud-environments, hybrid-networking, kubernetes, container-platform

Certifications

AWS Solutions Architect (Associate/Professional), Azure Architect / DevOps Engineer, Certified Kubernetes Administrator (CKA)

How to Apply on Workday

Workday has a multi-step form — save your progress after every section.
"Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
Job requisition numbers are useful when following up with HR by email.
