Orkes

fintech, e-commerce, logistics, healthcare

Site Reliability Engineer

$180–250k · Sydney, New South Wales, Australia · Full Time · Remote Friendly

The Brief

“Site Reliability Engineer at Orkes. Skills: Site Reliability Engineering, Kubernetes, Cloud Platforms (AWS, GCP, Azure), Distributed Systems, Observability, Infrastructure Automation. Own reliability, availability, and performance of production systems running in cloud environments. Define and monitor SLIs/SLOs and help manage error budgets across the platform”

What You'll Achieve.

Own reliability, availability, and performance of production systems; define and monitor SLIs/SLOs; help manage error budgets; improve observability; automate operational workflows and reduce manual toil; improve system resiliency and scalability; assist with capacity planning, infrastructure optimization, and performance tuning

Industry & Context.

fintech, e-commerce, logistics, healthcare

Problems you'll solve

Solving tough distributed systems challenges

Eligibility Requirements

25% Travel

What They're Looking For.

Must Have

  • 5+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or related infrastructure roles
  • Experience with cloud platforms such as AWS, GCP, or Azure
  • Hands-on experience with Kubernetes and containerized environments
  • Understanding of distributed systems and microservices architecture
  • Experience with observability tools such as Prometheus, Grafana, Datadog, ELK, or OpenTelemetry
  • Proficiency with infrastructure automation and scripting (Terraform, Python, Bash, etc.)
  • Experience managing CI/CD pipelines and deployment automation
  • Troubleshooting and incident management skills
  • Ability to work cross-functionally and communicate effectively during high-pressure situations

Nice to Have

  • Experience supporting large-scale SaaS or cloud-native platforms
  • Familiarity with workflow orchestration technologies such as Conductor, Temporal, or Camunda
  • Experience with Kafka, messaging systems, or event-driven architectures
  • Knowledge of security best practices and cloud infrastructure hardening
  • Open-source contributions or systems engineering background

What You'll Do.

Own reliability, availability, and performance of production systems running in cloud environments

Define and monitor SLIs/SLOs and help manage error budgets across the platform

Lead incident response efforts including detection

Improve observability through logging

Automate operational workflows and reduce manual toil wherever possible

Assist with capacity planning, infrastructure optimization, and performance tuning

Build internal tooling and operational best practices

Support Kubernetes-based infrastructure and distributed systems at scale

Act as an escalation point for complex production and platform issues

How You'll Work.

Team & Collaboration

Partner closely with engineering teams to improve system resiliency and scalability; Ability to work cross-functionally and communicate effectively during high-pressure situations; Work alongside a deeply technical and collaborative engineering team

Communication Scope

Communicate effectively during high-pressure situations

How to Apply on Greenhouse

  • Create a Greenhouse profile before applying — it saves time across multiple applications.
  • Upload your resume as a PDF; the parser handles it better than Word.
  • Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
  • Enable email notifications to track application status in real time.
