Veeam Software
Data and AI Trust
Staff Site Reliability Engineer
“Staff Site Reliability Engineer at Veeam Software. Skills: Site Reliability Engineering, Distributed Systems, Public Cloud (Azure preferred), Observability, Infrastructure Automation, Container Orchestration (Kubernetes). Serve as a hands-on technical leader within the SRE team, guiding senior engineers.”
What You'll Achieve.
Ensure the systems we operate are built to be reliable, scalable, and observable from the ground up; scale SRE principles globally; ensure production readiness.
What They're Looking For.
Must Have
8+ years of experience in a Software Engineering or SRE role; Technical leadership; Demonstrated experience mentoring and guiding senior engineers; Deep expertise in building distributed systems on public cloud (Azure preferred); Programming skills (e.g., JS, Go, TypeScript, Java, or C#); Hands-on experience with observability tooling (e.g., Prometheus, Grafana, OpenTelemetry); Mastery of infrastructure automation tools (Terraform, Pulumi) and container orchestration (Kubernetes)
Nice to Have
Experience leading SRE initiatives across multiple product teams; Background in chaos engineering, incident learning, or performance and load testing; Familiarity with global compliance standards (ISO, SOC 2, GDPR, FedRAMP, CMMC)
What You'll Do.
Serve as a hands-on technical leader within the SRE team
Guide senior engineers
Influence product development teams
Ensure the systems we operate are built to be reliable, scalable, and observable from the ground up
Drive strategic initiatives
Mentor others in the practice of SRE
Help define architectural best practices across our platform
Enforce high standards
Scale SRE principles globally
Drive adherence across engineering teams
Partner with development and product teams to proactively design for failure, build resilient architecture, and operationalize reliability from the start
Drive company-wide adoption of observability best practices and tooling
Ensure metrics, logs, and traces provide deep, actionable insights across systems
Lead complex incident responses, postmortems, and systemic reliability improvements
Promote and enforce a blameless culture of learning and continuous improvement
Lead initiatives in infrastructure as code, deployment automation, and resilience testing
Influence the development and adoption of chaos engineering practices and release validation frameworks
Partner with platform and security teams to ensure production readiness
How You'll Work.
Team & Collaboration
Collaborate with Staff peers across teams to align strategy and champion shared reliability standards and goals; Partner with development and product teams; Work closely with your peer Staff Engineers to plan, align, and deliver against reliability goals; Represent the SRE team in technical leadership forums and product planning discussions
Communication Scope
Ability to communicate clearly across geographies and disciplines
Full Job Description
Veeam is the Data and AI Trust Company, specializing in helping organizations ensure their data and AI are fully understood, secured, and resilient to enable the acceleration of safe AI at scale. As the market leader in both data resilience and data security posture management, Veeam is built for the convergence of identity, data, security, and AI risk. Headquartered in Seattle with offices in more than 30 countries, Veeam protects over 550,000 customers worldwide, who trust Veeam to keep their businesses running. Join us as we go fearlessly forward together, growing, learning, and making a real impact for some of the world’s biggest brands.
About the Role
We are looking for a Staff Site Reliability Engineer. You will serve as a hands-on technical leader within the SRE team, guiding senior engineers, influencing product development teams, and ensuring the systems we operate are built to be reliable, scalable, and observable from the ground up. You will drive strategic initiatives, mentor others in the practice of SRE, and help define architectural best practices across our platform. This role is pivotal in aligning teams, enforcing high standards, and scaling SRE principles globally within Veeam.
What You’ll Do
Reliability Engineering:
Drive adherence across engineering teams
Collaborate with Staff peers across teams to align strategy and champion shared reliability standards and goals
Partner with development and product teams to proactively design for failure, build resilient architecture, and operationalize reliability from the start
Observability & Operational Excellence:
Drive company-wide adoption of observability best practices and tooling
Ensure metrics, logs, and traces provide deep, actionable insights across systems
Lead complex incident responses, postmortems, and systemic reliability improvements
Promote and enforce a blameless culture of learning and continuous improvement
Engineering at Scale:
Lead initiatives in infrastructure as code, deployment auto
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.