Stuut
Financial Services
Lead Site Reliability Engineer
“Lead Site Reliability Engineer at Stuut. Skills: Site Reliability Engineering, Cloud Infrastructure, Distributed Systems. Set reliability strategy. Define long-term vision”
Industry & Context.
Root cause analysis; Deep debugging
What They're Looking For.
Must Have
7+ years SRE/infra/backend experience
Designed/operated highly available systems
Fluent in Python/TypeScript
Deep experience with AWS
Deep experience with Kubernetes (EKS)
Deep experience with Docker
Deep experience with cloud-native architectures
Implemented/evolved observability stacks
Know how to create high-signal alerting
Understand SLOs/SLIs/error budgets
Supported systems with FastAPI
Supported systems with Vue.js
Supported systems with PostgreSQL (RDS)
Supported event-driven architectures
Improved reliability with CI/CD
Improved reliability with IaC
Improved reliability with modern deployment workflows
Nice to Have
Kubernetes experience a plus
What You'll Do.
Set reliability strategy
Define long-term vision
Define availability targets
Define operational standards
Architect resilient cloud infrastructure
Maintain resilient cloud infrastructure
Architect scalable cloud infrastructure
Maintain scalable cloud infrastructure
Ensure systems are secure
Ensure systems are fault-tolerant
Ensure systems are cost-effective
Design monitoring systems
Evolve monitoring systems
Design alerting systems
Evolve alerting systems
Design logging systems
Evolve logging systems
Own incident management practices
Lead major incident response
Drive blameless postmortems
Identify reliability risks
Lead redundancy efforts
Lead failover efforts
Lead capacity planning efforts
Lead graceful degradation efforts
Ensure deployments are safe
Ensure deployments are observable
Improve rollout strategies
Reduce operational risk
Partner with engineering teams
Influence system design
Influence scalability tradeoffs
Influence reliability tradeoffs
Automate operational tasks
Build tooling to reduce toil
Accelerate safe execution
Guide teams through debugging
Ensure fixes address root causes
Promote reliability-first thinking
Promote operational hygiene
Promote shared ownership
Coach engineers on reliability principles
Coach engineers on incident handling
Coach engineers on infrastructure design
Coach engineers on operational best practices
How You'll Work.
Team & Collaboration
Product teams; Engineering teams; Backend teams; Frontend teams; Infrastructure teams
Full Job Description
Stuut is transforming accounts receivable for B2B companies, making collections smarter and faster for companies that have historically relied on manual processes that are labor intensive and costly. Our platform is gaining traction with finance teams across industrials, chemicals, and manufacturing sectors from Fortune 10 brands to scaling midmarkets. We're backed by top-tier investors including a16z, Khosla, Activant, 1984 Ventures and Page One.
The Role
We’re hiring a Lead Site Reliability Engineer to drive the strategy, architecture, and execution of reliability, scalability, and operational excellence across our platform. You’ll build and scale the systems that keep Stuut highly available, performant, and resilient as we grow customers, traffic, and complexity. From defining SLOs and reliability standards to hardening infrastructure, improving observability, and guiding teams through incident response and postmortems, you’ll own the engineering rigor that allows us to ship quickly without sacrificing stability. You’ll turn strong reliability engineering into real customer trust, creating the guardrails that let product and engineering move fast with confidence. This is a hands-on technical leadership role for an engineer who excels at designing reliable distributed systems, influencing engineering practices, and leading high-impact reliability initiatives across teams.
What You’ll Do
- Set the Reliability Strategy: define the long-term vision for site reliability, including SLOs/SLIs, error budgets, availability targets, and operational standards.
- Build & Scale Reliable Infrastructure: architect and maintain resilient, scalable cloud infrastructure across AWS and Kubernetes, ensuring systems are secure, fault-tolerant, and cost-effective.
- Own Observability & Monitoring: design and evolve monitoring, alerting, and logging systems that provide clear, actionable signals across services and environments.
- Lead Incident Response & Postmortems: own incident mana
How to Apply on Ashby
- Ashby is a fast modern ATS — most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically — answer completely.
- Location field affects geo-based screening; use your actual metro area.