Rakuten Asia Pte. Ltd.

internet services

Senior Site Reliability (DevOps) Engineer

Singapore, Singapore FULL TIME Remote Friendly

The Brief

“Senior Site Reliability (DevOps) Engineer at Rakuten Asia Pte. Ltd. Skills: Site Reliability Engineering, DevOps, Cloud Platforms (GCP, AWS, Azure), Kubernetes, Terraform, Observability (Prometheus, Grafana, Datadog, ELK), CI/CD, Automation, Python/Go/Java, Incident Management, People Management. Define and drive SRE strategy, including SLO/SLI frameworks, error budgets, and reliability targets aligned with business objectives and customer expectations. Establish and improve incident management”

What You'll Achieve.

Minimize MTTR; prevent recurring issues; reduce toil; improve deployment reliability; enable self-service capabilities; support scalability, fault tolerance, and cost optimization goals; report on reliability metrics, incident trends, and operational health to leadership, translating technical insights into business impact assessments.

Industry & Context.

internet services

Eligibility Requirements

on-call rotations

What They're Looking For.

Must Have

  • 8+ years of experience in software engineering, DevOps, or site reliability engineering, with at least 3 years in a people management role
  • Proven track record of building and leading high-performing SRE or platform engineering teams in a distributed, multi-timezone environment
  • Deep expertise in cloud platforms (GCP preferred, AWS/Azure acceptable), including compute, networking, storage, and managed services
  • Knowledge of containerization and orchestration technologies (Kubernetes, Docker) and Infrastructure as Code (Terraform, Ansible)
  • Hands-on experience with observability tools and practices (Prometheus, Grafana, Datadog, ELK Stack, or similar), including defining meaningful SLOs/SLIs
  • Experience with CI/CD pipelines, deployment strategies (blue-green, canary), and release engineering best practices
  • Programming/scripting skills in languages such as Python, Go, or Java for automation and tooling development
  • Excellent communication skills with the ability to collaborate effectively across engineering, product, and business stakeholders
  • Incident management experience with demonstrated ability to lead high-pressure situations calmly and effectively

Nice to Have

  • Experience with big data technologies (Hadoop, Spark, Kafka) and data pipeline reliability
  • Familiarity with marketing technology platforms, email delivery systems, or customer data platforms
  • Knowledge of database administration and optimization (PostgreSQL, MySQL, Redis, Couchbase)
  • Experience with chaos engineering practices and tools (Chaos Monkey, Litmus, Gremlin)
  • Certifications such as Google Cloud Professional Cloud Architect, AWS Solutions Architect, or Kubernetes Administrator (CKA)
  • Japanese language proficiency is a plus for collaboration with Japan-based teams

What You'll Do.

  • Define and drive SRE strategy, including SLO/SLI frameworks, error budgets, and reliability targets aligned with business objectives and customer expectations
  • Establish and improve incident management processes, including on-call rotations, escalation procedures, and blameless post-mortem practices to minimize MTTR and prevent recurring issues
  • Collaborate with development teams to embed reliability practices into the software development lifecycle, advocating for design reviews and production readiness reviews
  • Design and implement comprehensive observability solutions (monitoring, alerting) to provide actionable insights into system health and performance
  • Drive automation initiatives to reduce toil, improve deployment reliability, and enable self-service capabilities for engineering teams
  • Partner with Architecture and Platform teams to ensure infrastructure decisions support scalability, fault tolerance, and cost optimization goals
  • Manage capacity planning and performance optimization for critical marketing platforms handling high-volume campaign executions and real-time personalization
  • Report on reliability metrics, incident trends, and operational health to leadership, translating technical insights into business impact assessments

How You'll Work.

Team & Collaboration

Collaborate with development teams to embed reliability practices into the software development lifecycle; partner with Architecture and Platform teams to ensure infrastructure decisions support scalability, fault tolerance, and cost optimization goals; collaborate effectively across engineering, product, and business stakeholders; collaborate with Japan-based teams.

Communication Scope

Excellent communication skills with the ability to collaborate effectively across engineering, product, and business stakeholders

Full Job Description

**Job Description:** Situated in the heart of Singapore's Central Business District, Rakuten Asia Pte. Ltd. is Rakuten's Asia Regional headquarters. Established in August 2012 as part of Rakuten's global expansion strategy, Rakuten Asia comprises various businesses that provide essential value-added services to Rakuten's global ecosystem. Through advertisement product development, product strategy, and data management, among others, Rakuten Asia is strengthening Rakuten Group's core competencies to take the lead in an increasingly digitalized world.

Rakuten Group, Inc. is a global leader in internet services that empower individuals, communities, businesses, and society. Founded in Tokyo in 1997 as an online marketplace, Rakuten has expanded to offer services in e-commerce, fintech, digital content, and communications to approximately 1.7 billion members around the world. The Rakuten Group has nearly 32,000 employees and operations in 30 countries and regions. For more information visit

The Marketing Cloud Platform Department (MCPD) drives Rakuten's marketing product strategy, executes product development, and ensures successful implementation. We empower Rakuten's internal marketing teams by creating engaging, respectful, and cost-efficient marketing platforms that prioritize our customers. Leveraging the Rakuten Ecosystem, we offer comprehensive marketing solutions, including campaign management, multichannel communication, and personalization. As a team of over 150 experts across Japan, India, and Singapore, we pride ourselves on being a technology-driven organization that shares knowledge within the Rakuten Tech community.

As a **Senior Site Reliability (DevOps) Engineer** in MCPD, you will drive operational excellence by implementing best practices in observability, incident management, and automation. This role bridges engineering and operations, requiring both strong technical expertise and people management skills to build and maintain highly available syst

How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
