Stuut

Financial Services

Lead Site Reliability Engineer

$200–275k · San Francisco, California, United States · Full Time

The Brief

“Lead Site Reliability Engineer at Stuut. Skills: Site Reliability Engineering, Cloud Infrastructure, Distributed Systems. Set reliability strategy. Define long-term vision”

Industry & Context.

Financial Services

Problems you'll solve

Root cause analysis; Deep debugging

What They're Looking For.

Must Have

7+ years of SRE/infra/backend experience

Designed/operated highly available systems

Fluent in Python/TypeScript

Deep experience with AWS, Kubernetes (EKS), Docker, and cloud-native architectures

Implemented/evolved observability stacks

Know how to create high-signal alerting

Understand SLOs/SLIs/error budgets

Supported systems built with FastAPI, Vue.js, and PostgreSQL (RDS)

Supported event-driven architectures

Improved reliability with CI/CD, IaC, and modern deployment workflows

Nice to Have

Kubernetes experience a plus

What You'll Do.

Set reliability strategy

Define long-term vision

Define availability targets

Define operational standards

Architect and maintain resilient, scalable cloud infrastructure

Ensure systems are secure, fault-tolerant, and cost-effective

Design and evolve monitoring, alerting, and logging systems

Own incident management practices

Lead major incident response

Drive blameless postmortems

Identify reliability risks

Lead redundancy, failover, capacity planning, and graceful degradation efforts

Ensure deployments are safe and observable

Improve rollout strategies

Reduce operational risk

Partner with engineering teams

Influence system design and scalability/reliability tradeoffs

Automate operational tasks

Build tooling to reduce toil

Accelerate safe execution

Guide teams through debugging

Ensure fixes address root causes

Promote reliability-first thinking, operational hygiene, and shared ownership

Coach engineers on reliability principles, incident handling, infrastructure design, and operational best practices

How You'll Work.

Team & Collaboration

Product teams; Engineering teams; Backend teams; Frontend teams; Infrastructure teams

Full Job Description

Stuut is transforming accounts receivable for B2B companies, making collections smarter and faster for companies that have historically relied on manual processes that are labor intensive and costly. Our platform is gaining traction with finance teams across industrials, chemicals, and manufacturing sectors from Fortune 10 brands to scaling midmarkets. We're backed by top-tier investors including a16z, Khosla, Activant, 1984 Ventures, and Page One.

The Role

We’re hiring a Lead Site Reliability Engineer to drive the strategy, architecture, and execution of reliability, scalability, and operational excellence across our platform. You’ll build and scale the systems that keep Stuut highly available, performant, and resilient as we grow customers, traffic, and complexity. From defining SLOs and reliability standards to hardening infrastructure, improving observability, and guiding teams through incident response and postmortems, you’ll own the engineering rigor that allows us to ship quickly without sacrificing stability. You’ll turn strong reliability engineering into real customer trust, creating the guardrails that let product and engineering move fast with confidence.

This is a hands-on technical leadership role for an engineer who excels at designing reliable distributed systems, influencing engineering practices, and leading high-impact reliability initiatives across teams.

What You’ll Do

- Set the Reliability Strategy: define the long-term vision for site reliability, including SLOs/SLIs, error budgets, availability targets, and operational standards.
- Build & Scale Reliable Infrastructure: architect and maintain resilient, scalable cloud infrastructure across AWS and Kubernetes, ensuring systems are secure, fault-tolerant, and cost-effective.
- Own Observability & Monitoring: design and evolve monitoring, alerting, and logging systems that provide clear, actionable signals across services and environments.
- Lead Incident Response & Postmortems: own incident mana
