Mastercard

Financial Services

Lead Engineer, Site Reliability Engineering

Singapore · FULL TIME · Remote Friendly

The Brief

“Lead Engineer, Site Reliability Engineering at Mastercard. Skills: Site Reliability Engineering, infrastructure troubleshooting, observability, monitoring, automation, Infrastructure as Code, incident management, root cause analysis. Lead continuous assessments of the application infrastructure supporting critical Mastercard applications, focusing on health, performance, monitoring and alerting, and capacity analysis. Collaborate with Product and Development teams to forecast growth requirements”

What You'll Achieve.

Ensure the availability, scalability, and resilience of our network; reduce Mean Time to Detect (MTTD) and Mean Time to Mitigate (MTTM); certify environment readiness before customer traffic is routed to it; strengthen multi-disciplinary SRE team capabilities

Industry & Context.

Financial Services

Problems you'll solve

Excellent infrastructure troubleshooting and analytical problem-solving skills; proven ability to triage and investigate complex issues; demonstrated ability to troubleshoot complex production issues, perform root cause analysis, and drive long-term corrective actions; effective incident management skills with a structured, analytical approach to problem solving

Eligibility Requirements

periodic on-call responsibilities

What They're Looking For.

Must Have

5–10 years of experience in an SRE or SRE-related operations role

3+ years supporting e-commerce, financial services, or large-scale SaaS platforms

Excellent infrastructure troubleshooting and analytical problem-solving skills

Hands-on experience with observability and monitoring tools such as Splunk, Dynatrace, or equivalent, with a proven ability to triage and investigate complex issues

Familiarity with network telemetry tools such as SolarWinds and NetScout

Proficiency in packet-level debugging, including capturing traffic with tools like tcpdump and analyzing packets using Wireshark

Broad understanding of end-to-end infrastructure supporting payment platforms, spanning platform services, networking, databases, and storage

Experience with automation and Infrastructure as Code tools such as Chef, Ansible, and Terraform, as well as structured data formats (JSON/YAML)

Excellent communication skills with the ability to coordinate cross-functional troubleshooting efforts and lead RCA processes to closure

Demonstrated ability to troubleshoot complex production issues, perform root cause analysis, and drive long-term corrective actions

Experience partnering with development teams to shape architecture, define SLIs/SLOs, and embed reliability into services from design through operation

Understanding of monitoring and observability ecosystems, including Prometheus, Grafana, ELK/EFK, Splunk, and OpenTelemetry

Effective incident management skills with a structured, analytical approach to problem solving

Nice to Have

Kubernetes a plus

What You'll Do.

Lead continuous assessments of the application infrastructure supporting critical Mastercard applications, focusing on health, performance, monitoring and alerting, and capacity analysis

Collaborate with Product and Development teams to forecast growth requirements and ensure scalability and resiliency

Champion observability as a core principle for infrastructure services by assessing environments and technologies to uncover gaps in monitoring and alerting

Design and implement strategies to close these gaps, ensuring all infrastructure telemetry is integrated into a unified, single-pane-of-glass view

Build custom dashboards to investigate and perform root cause analysis on complex issues

Lead regular incident reviews with internal support teams to ensure root causes are identified

Develop and implement strategies to remediate or mitigate risks when patterns of failure or compatibility issues between software and infrastructure emerge

Leverage automation and AI technologies to enhance proactive issue detection and enable self-healing capabilities, reducing Mean Time to Detect (MTTD) and Mean Time to Mitigate (MTTM)

Develop testing and validation plans for new environment builds, disaster recovery exercises, and post-maintenance activities to certify environment readiness before customer traffic is routed to it

Champion continuous learning and knowledge sharing across networking and other infrastructure disciplines to strengthen multi-disciplinary SRE team capabilities

Lead training initiatives for team members and Product and Development on networking aspects of the platforms

Evaluate vendor hardware and software upgrade roadmaps, and conduct proof-of-concept (POC) testing to identify potential risks and opportunities for improvement in upcoming releases

How You'll Work.

Team & Collaboration

Collaborate with Product and Development teams; coordinate cross-functional troubleshooting efforts; partner with development teams to shape architecture, define SLIs/SLOs, and embed reliability into services from design through operation; share knowledge across networking and other infrastructure disciplines; lead training initiatives for team members and Product and Development

Communication Scope

Excellent communication skills; ability to coordinate cross-functional troubleshooting efforts and lead RCA processes to closure

Process & Methodology

Lead continuous assessments, Lead regular incident reviews, Lead training initiatives, Develop testing and validation plans, conduct proof-of-concept (POC) testing

Full Job Description

**Our Purpose**

_Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential._

**Title and Summary**

### Lead Engineer, Site Reliability Engineering

Our Purpose: Mastercard powers economies and empowers people across more than 200 countries and territories worldwide. We are committed to building an inclusive, digital economy that benefits everyone, everywhere, by making transactions safe, simple, smart, and accessible. Through secure data, trusted networks, strong partnerships, and relentless innovation, we help individuals, financial institutions, governments, and businesses unlock their greatest potential.

About the Role: Mastercard’s Program-aligned Site Reliability Engineering (SRE) teams are dedicated to delivering a seamless experience for our customers. We achieve this by maintaining every aspect of our Programs infrastructure and technology ecosystem to the highest standards, ensuring compliance with rigorous security requirements.

Within Mastercard, SRE focuses on the reliability and performance of core infrastructure, networks, and foundational services that power our applications. Our mission is to ensure these components operate with excellence, enabling applications to deliver an outstanding customer experience.

In this role, you will join our Payments Network SRE team and take ownership of continuously assessing and elevating the end-to-end service quality of our platform. You will leverage data to drive root cause analysis and deliver strategic insights to key stakeholders on r

SKILL SIGNAL 39 detected · ranked by frequency

infrastructure troubleshooting ×5

automation ×5

Infrastructure as Code ×5

Site Reliability Engineering ×3

observability ×3

incident management ×3

root cause analysis ×3

packet level debugging ×3

structured data formats ×3

testing ×3

validation ×3

monitoring ×2

Splunk ×2

Dynatrace ×2

SolarWinds ×2

NetScout ×2

tcpdump ×2

Wireshark ×2

Chef ×2

Ansible ×2

Terraform ×2

Prometheus ×2

Grafana ×2

ELK/EFK ×2

OpenTelemetry ×2

JSON

YAML

Payments Network

e commerce

financial services

large scale SaaS platforms

capacity forecasting

BEHAVIOURAL

collaboration, knowledge sharing, continuous learning, analytical problem solving, communication

Role Details

Seniority mid

Experience 5–10 yrs

Level Lead

Type FULL TIME

AI-Extracted Insights

Domain Areas

payment-platforms, core-payment-systems, national-infrastructure

How to Apply on Workday

Workday has a multi-step form — save your progress after every section.
"Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
Job requisition numbers are useful when following up with HR by email.
