Arkose Labs

Technology

Site Reliability Engineer (Incident Manager)

A$145–195k (AI estimate) | Fortitude Valley, Queensland, Australia; Australia (remote friendly)

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Site Reliability Engineer (Incident Manager) at Arkose Labs. Skills: Site Reliability Engineering, Incident Management, Production Operations. Monitor live production environment. Identify potential issues proactively”

What You'll Achieve.

Reduce toil; Improve resilience; Improve MTTR

Industry & Context.

Technology

Problems you'll solve

Troubleshooting; Diagnosis; Root cause analysis

Eligibility Requirements

Primary on-call, Structured overlap with India/US teams

What They're Looking For.

Must Have

Bachelor's degree in Computer Science or equivalent experience, 3-5 years of experience in production operations, Solid knowledge of Linux/Unix systems, Solid knowledge of networking concepts, Solid knowledge of web technologies, Proficiency in Python scripting, Proficiency in Bash scripting, Demonstrated ability to lead incident response, Experience owning post-mortems, Ability to write clear incident reports for customers, Ability to brief non-technical stakeholders in real time, Willingness to serve as primary on-call

Nice to Have

Experience with cloud platforms, Experience with containerization, Familiarity with change management processes, Familiarity with incident management tooling, Experience in fraud prevention domain, Prior experience mentoring junior engineers

What You'll Do.

Monitor live production environment

Identify potential issues proactively

Respond to P1/P2 alerts

Take ownership from detection to resolution

Serve as incident commander

Manage war-room communications

Coordinate cross-functional responders

Manage customer-facing P1 communications

Provide stakeholder updates

Prepare post-incident reports

Own action items to closure

Share learnings with team

Identify runbook gaps

Own release management

Submit change tickets

Coordinate change approval

Contribute to SLO/SLA definition

Report against targets

Develop automation scripts

Maintain automation scripts

Develop monitoring dashboards

Maintain monitoring dashboards

Contribute to platform engineering

Mentor Associate Livesite Engineers

Share institutional context

Act as primary on-call

Act as escalation point

How You'll Work.

Team & Collaboration

Cross-functional responders; Associate Livesite Engineers

Communication Scope

Incident reports; Stakeholder briefings

Process & Methodology

Change management

Full Job Description

Arkose Labs is on a mission to create an online environment where all consumers are protected from spam and abuse. As a Fast Company 2025 Best Workplace for Innovators, we provide a proactive fraud deterrence platform, Arkose Titan, designed to neutralize modern attacks powered by Agentic AI and LLMs. By combining proprietary intelligence with dynamic friction, we undermine attacker ROI to protect global giants like Microsoft, Meta, and Roblox. Headquartered in San Mateo, CA, we maintain a global presence across APAC, Central and South America, and EMEA.

About the Role

As a Livesite Engineer, you'll own the reliability and operational health of our live production environment. You'll take incidents from detection to resolution, lead post-mortems, manage release changes for your services, and drive platform improvements that reduce toil and improve resilience. You're the primary on-call for your domain and a go-to escalation point for more junior engineers on the team.

The role is based in Brisbane and can be fully remote or hybrid. You'll work primarily within AEST business hours, with some structured overlap with our India and US-based teams.

What You'll Be Doing

Monitor the live production environment to proactively identify potential issues or anomalies before they become incidents.

Respond to P1/P2 alerts and outages: take ownership from detection through resolution, not just escalation.

Serve as incident commander for the company: manage war-room communications, drive diagnosis, and coordinate cross-functional responders.

Manage customer-facing P1 communications: provide clear, timely stakeholder updates and prepare post-incident reports.

Lead post-mortems and RCAs for significant incidents; own action items through to closure and share learnings with the team.

Own and maintain runbooks for your team's services; proactively identify gaps and close them before the next incident.

Own release management for your services: SCOPE change ticket submissions, approv

How to Apply on Greenhouse

Create a Greenhouse profile before applying — it saves time across multiple applications.
Upload your resume as a PDF; the parser handles it better than Word.
Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
Enable email notifications to track application status in real time.
