Bybit

FinTech

SRE Leader

$360–600k (AI estimate), Kuala Lumpur, Malaysia
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Lead candidates.

The Brief

“SRE Leader at Bybit. Skills: Site Reliability Engineering, Infrastructure automation, Cloud operations, Cost optimization. Build the reliability engineering system and establish the SLO/SLA system.”

What You'll Achieve.

Alarm accuracy > 95%; Hourly deployments; Reduce toil to < 30%

Industry & Context.

FinTech
Problems you'll solve

Problem-solvers; Engineering methods

Eligibility Requirements

On-call system

What They're Looking For.

Must Have

More than 10 years of experience, More than 5 years of team leadership, Deep understanding of SRE methodology, Large-scale cost management experience, Systematic FinOps experience, Capacity modeling capability, Automated operation and maintenance practice, Successful toil reduction cases, Proficiency in IaC tools, Experience in writing systems

Nice to Have

SRE management experience at crypto exchanges, SRE management experience in traditional securities, SRE management experience at payment companies, Kubernetes large-scale cluster experience, High availability architecture experience, Experience building internal cost platforms, Experience building FinOps tools, Practical chaos engineering experience, Infrastructure preparation for compliance audits

What You'll Do.

Construct reliability engineering system

Establish SLO/SLA system

Define reliability indicators

Drive change based on Error Budget

Construct MTTD/MTTR measurement system

Optimize on-call system

Establish Runbook automated execution

Measure on-call quality

Deploy financial cloud isolation

Design network isolation architecture

Manage security groups

Implement Zero Trust Network architecture

Build compliance station infrastructure

Standardize compliance station templates

Automate inter-site isolation verification

Abstract cloud operation and maintenance

Design cross-regional disaster recovery

Guarantee data sovereignty

Guarantee wallet/transaction core chain

Operate hot and cold wallet isolation

Achieve zero-downtime changes for transaction systems

Perform multi-active/disaster recovery switching

Push team to SRE transformation

Establish SRE competency model

Establish knowledge retention mechanisms

Eliminate single-point personnel risk

Cultivate senior SREs

How You'll Work.

Team & Collaboration

Global team collaboration; Cross-functional teams

Process & Methodology

Capacity Planning, Incident Management

Full Job Description

About Us

Established in 2018, Bybit is one of the world’s leading cryptocurrency exchanges and digital financial platforms, serving over 80 million users across more than 200 countries and regions. Powered by world-class technology and a user-first mindset, Bybit delivers a seamless ecosystem across trading, payments, wealth management, custody, institutional services, and Web3, connecting users to the future of digital finance.

Our core values define how we build. We listen, care and improve to create products and experiences that put users first. Backed by a global team of ambitious builders, problem-solvers, and innovators, we foster a high-performance and fast-moving environment where talent is empowered to drive real impact at the global scale.

Supported by 24/7 multilingual customer service and a strong commitment to innovation, we are shaping the future of finance through technology, collaboration, and bold execution. Today, Bybit is recognized as one of the most trusted and transparent platforms in the digital asset industry, continuing to expand its global presence while building the infrastructure for the next generation of financial services.

Core responsibilities

Construction of reliability engineering system

  • Establish a company-wide SLO/SLA system: Define quantifiable reliability indicators (availability, latency, error rate) for each Line of Business, and drive change rhythm and investment decisions based on Error Budget
  • Construct MTTD/MTTR measurement system, set grading goals and continuously optimize: P-1 target MTTD 60%
  • On-call system optimization:
      • Alarm accuracy > 95% (eliminating alarm fatigue)
      • Establish Runbook automated execution capability
      • On-call quality measurement and continuous improvement

Financial cloud isolation and multi-compliance station deployment (key)

  • Financial-grade network isolation architecture design and operation and maintenance: Design and implementation of network isolation strategies for multiple accounts, multiple VPCs,


How to Apply on Greenhouse

  • Create a Greenhouse profile before applying — it saves time across multiple applications.
  • Upload your resume as a PDF; the parser handles it better than Word.
  • Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
  • Enable email notifications to track application status in real time.
