Orkes
fintech, e-commerce, logistics, healthcare
Site Reliability Engineer
“Site Reliability Engineer at Orkes. Skills: Site Reliability Engineering, Kubernetes, Cloud Platforms (AWS, GCP, Azure), Distributed Systems, Observability, Infrastructure Automation. Own reliability, availability, and performance of production systems running in cloud environments. Define and monitor SLIs/SLOs and help manage error budgets across the platform”
What You'll Achieve.
Own reliability, availability, and performance of production systems; Define and monitor SLIs/SLOs; Help manage error budgets; Improve observability; Automate operational workflows and reduce manual toil; Improve system resiliency and scalability; Assist with capacity planning, infrastructure optimization, and performance tuning
Industry & Context.
solving tough distributed systems challenges
25% Travel
What They're Looking For.
Must Have
5+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or related infrastructure roles; Experience with cloud platforms such as AWS, GCP, or Azure; Hands-on experience with Kubernetes and containerized environments; Understanding of distributed systems and microservices architecture; Experience with observability tools such as Prometheus, Grafana, Datadog, ELK, or OpenTelemetry; Proficiency with infrastructure automation and scripting (Terraform, Python, Bash, etc.); Experience managing CI/CD pipelines and deployment automation; Troubleshooting and incident management skills; Ability to work cross-functionally and communicate effectively during high-pressure situations
Nice to Have
Experience supporting large-scale SaaS or cloud-native platforms; Familiarity with workflow orchestration technologies such as Conductor, Temporal, or Camunda; Experience with Kafka, messaging systems, or event-driven architectures; Knowledge of security best practices and cloud infrastructure hardening; Open-source contributions or systems engineering background
What You'll Do.
Own reliability, availability, and performance of production systems running in cloud environments
Define and monitor SLIs/SLOs and help manage error budgets across the platform
Lead incident response efforts including detection
Improve observability through logging
Automate operational workflows and reduce manual toil wherever possible
Assist with capacity planning, infrastructure optimization, and performance tuning
Build internal tooling
and operational best practices
Support Kubernetes-based infrastructure and distributed systems at scale
Act as an escalation point for complex production and platform issues
How You'll Work.
Team & Collaboration
Partner closely with engineering teams to improve system resiliency and scalability; Ability to work cross-functionally and communicate effectively during high-pressure situations; Work alongside a deeply technical and collaborative engineering team
Communication Scope
communicate effectively during high-pressure situations
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.