Company

Technology

Senior Software Engineer, Site Reliability Engineering

United States; Canada · Full Time · Remote Friendly

The Brief

“Senior Software Engineer, Site Reliability Engineering. Skills: Site Reliability Engineering, AWS, Python, distributed systems, microservices. Design, build, and maintain scalable and highly available infrastructure and systems that support large-scale distributed applications. Define and influence architectural direction for platform services, ensuring resilience, performance, and scalability across systems. Develop tools and automation for deployment, monitoring, configuration management, and infrastructure operations.”

What You'll Achieve.

  • Ensure resilience, performance, and scalability across systems.
  • Ensure minimal downtime and rapid recovery.
  • Enhance system visibility and reliability.
  • Proactively address scaling challenges.
  • Improve developer experience and reduce operational toil.

Industry & Context.

Technology
Problems you'll solve

Strong problem-solving skills with the ability to break down complex system challenges and evaluate technical trade-offs.

What They're Looking For.

Must Have

  • 5+ years of experience in Site Reliability Engineering, infrastructure engineering, or distributed systems roles.
  • Strong expertise in AWS and Linux-based environments.
  • Proficiency in programming languages such as Python, Go, JavaScript, or similar for automation and system development.
  • Deep understanding of distributed systems and networking protocols including DNS, HTTP/S, TLS, and TCP/IP.
  • Hands-on experience operating, monitoring, and debugging large-scale microservices architectures in production environments.
  • Strong problem-solving skills with the ability to break down complex system challenges and evaluate technical trade-offs.
  • Excellent communication skills with the ability to collaborate across engineering and non-engineering stakeholders.
  • Strong focus on system reliability, scalability, and reducing operational overhead.

What You'll Do.

Design, build, and maintain scalable and highly available infrastructure and systems that support large-scale distributed applications.

Define and influence architectural direction for platform services, ensuring resilience, performance, and scalability across systems.

Develop tools and automation for deployment, monitoring, configuration management, and infrastructure operations.

Troubleshoot and resolve complex production issues across distributed systems, ensuring minimal downtime and rapid recovery.

Improve observability, monitoring, and alerting systems to enhance system visibility and reliability.

Participate in capacity planning, performance tuning, and forecasting to proactively address scaling challenges.

Collaborate with engineering teams to improve developer experience and reduce operational toil through automation and platform improvements.

Participate in on-call rotations and provide incident response support for critical systems.

How You'll Work.

Team & Collaboration

Collaborate with engineering teams to improve developer experience and reduce operational toil through automation and platform improvements.

Communication Scope

Excellent communication skills with the ability to collaborate across engineering and non-engineering stakeholders.

Full Job Description

## Accountabilities

  • Design, build, and maintain scalable and highly available infrastructure and systems that support large-scale distributed applications.
  • Define and influence architectural direction for platform services, ensuring resilience, performance, and scalability across systems.
  • Develop tools and automation for deployment, monitoring, configuration management, and infrastructure operations.
  • Troubleshoot and resolve complex production issues across distributed systems, ensuring minimal downtime and rapid recovery.
  • Improve observability, monitoring, and alerting systems to enhance system visibility and reliability.
  • Participate in capacity planning, performance tuning, and forecasting to proactively address scaling challenges.
  • Collaborate with engineering teams to improve developer experience and reduce operational toil through automation and platform improvements.
  • Participate in on-call rotations and provide incident response support for critical systems.

## Requirements

  • 5+ years of experience in Site Reliability Engineering, infrastructure engineering, or distributed systems roles.
  • Strong expertise in AWS and Linux-based environments.
  • Proficiency in programming languages such as Python, Go, JavaScript, or similar for automation and system development.
  • Deep understanding of distributed systems and networking protocols including DNS, HTTP/S, TLS, and TCP/IP.
  • Hands-on experience operating, monitoring, and debugging large-scale microservices architectures in production environments.
  • Strong problem-solving skills with the ability to break down complex system challenges and evaluate technical trade-offs.
  • Excellent communication skills with the ability to collaborate across engineering and non-engineering stakeholders.
  • Strong focus on system reliability, scalability, and reducing operational overhead.

## Benefits

  • Competitive base salary range aligned with experience and location
  • Equity participation in a high-growth technology organization
  • Comprehensive medical, denta

How to Apply on Lever

  • Lever uses a streamlined one-page form — apply in under 5 minutes.
  • LinkedIn import works well; review parsed data before submitting.
  • The cover letter field is optional but visible to reviewers — use it to differentiate.
  • Referral codes from employees can significantly boost visibility of your application.
