Emburse

SaaS

Site Reliability Engineer III (SRE III)

Toronto, Ontario, Canada · Full Time · Remote Friendly
Market Sentiment: High Demand

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Site Reliability Engineer III (SRE III) at Emburse. Skills: Site Reliability Engineering, Cloud Infrastructure, Automation, Kubernetes. Identify, evaluate, and implement preventative measures. Ensure services are designed for availability.”

What You'll Achieve.

Ensure systems are highly available; Ensure systems are scalable; Ensure systems are performant; Drive operational excellence; Reduce customer impact; Improve site latency; Improve performance; Improve uptime; Support operational efficiency; Enable developer productivity; Remove cross functional dependencies; Ensure reliability requirements are met

Industry & Context.

SaaS

Problems you'll solve

Analytical thinker; Root-cause problem solving

What They're Looking For.

Must Have

Minimum 6 years of experience in an engineering or operations role with a focus on reliability, scalability, and automation; Proficiency in Linux-based distributed environments; Deep experience with cloud platforms (AWS or Azure); Infrastructure-as-Code (Terraform); Excellent scripting skills (Python, Bash, PowerShell); Object-oriented programming experience; Demonstrated ability to develop and maintain internal tools and automation solutions; Excellent written and verbal communication skills in English; Project management and organizational abilities; Experience collaborating with offshore or globally distributed teams; Expertise in containerization and orchestration technologies (Docker, Kubernetes); Experience with Kubernetes scaling tooling (Karpenter, KEDA); Understanding of DevOps principles; Understanding of modern CI/CD pipelines; Experience with observability stacks (Prometheus, Grafana, OpenTelemetry); Familiarity with self-healing systems; Familiarity with site reliability best practices; Background in SaaS environments or large-scale distributed applications; Analytical thinker with a focus on root-cause problem solving; Self-starter with an ownership mentality and accountability; Mentor and collaborator who uplifts teams and promotes learning culture; Committed to operational excellence and continuous improvement

Nice to Have

Certified Kubernetes Administrator (CKA), AWS Certification

What You'll Do.

Identify, evaluate, and implement preventative measures

Ensure services are designed for availability

Monitor, troubleshoot, and provide visibility

Design, develop, and automate cloud infrastructure

Apply Infrastructure-as-Code principles

Write and maintain scripts, tools, and automation frameworks

Partner with leadership on solutions

Collaborate with Platform Engineering teams

Align operational goals with roadmaps

Define non-functional requirements

Lead cross-functional troubleshooting

Serve as a technical mentor

Lead root cause analysis

Support offshore and distributed teams

Participate in design and architecture reviews

How You'll Work.

Team & Collaboration

Collaborate with Platform Engineering teams; Partner with engineering leadership; Collaborate with offshore or globally distributed teams; Uplift teams and promote learning culture

Communication Scope

Excellent written and verbal communication skills in English

Process & Methodology

Project management, Organizational abilities, Backlog grooming, Planning processes

Full Job Description

## Description

Who We Are: At Emburse, you’ll not just imagine the future – you’ll build it. As a leader in travel and expense solutions, we are creating a future where technology drives business value and inspires extraordinary results. Our AI-powered platform helps organizations modernize financial operations, increase visibility, and optimize spend across the enterprise.

The Site Reliability Engineer III (SRE III) plays a critical role in ensuring Emburse’s systems are highly available, scalable, and performant. This role blends deep technical expertise with strong collaboration and leadership skills to drive operational excellence across distributed systems. The ideal candidate is passionate about automation, cloud infrastructure, observability, and continuous improvement, while mentoring junior engineers and driving reliability culture across the organization.

## What you will do

Service Reliability & Performance

  • Proactively identify, evaluate, and implement preventative measures to reduce customer impact.
  • Ensure all services are designed and operated with 24/7 availability, scalability, and resilience in mind.
  • Monitor, troubleshoot, and provide visibility to improve site latency, performance, and uptime.

Engineering Excellence & Automation

  • Design, develop, and automate reliable cloud infrastructure and platform services.
  • Apply Infrastructure-as-Code (IaC) principles to manage large-scale distributed systems.
  • Write and maintain scripts, tools, and automation frameworks to support operational efficiency.
  • Partner with engineering leadership to develop solutions enabling developer productivity and remove cross functional dependencies.

Collaboration & Process Development

  • Collaborate with Platform Engineering teams on project definitions, requirements, backlog grooming, and planning processes.
  • Align operational goals with product and engineering roadmaps to ensure reliability requirements are met early in the lifecycle.
  • Define non-functional requirements (NFRs) and i

How to Apply on Lever

  • Lever uses a streamlined one-page form — apply in under 5 minutes.
  • LinkedIn import works well; review parsed data before submitting.
  • The cover letter field is optional but visible to reviewers — use it to differentiate.
  • Referral codes from employees can significantly boost visibility of your application.
