Tricentis
SaaS
Senior Director, Cloud and Site Reliability Engineering
“Senior Director, Cloud and Site Reliability Engineering at Tricentis. Skills: Cloud infrastructure strategy, Site Reliability Engineering (SRE), AWS, Azure, GCP, Kubernetes, Terraform, Incident management, Observability, Automation. Define and execute the cloud infrastructure roadmap. Establish cloud architecture standards and best practices”
What You'll Achieve.
Ensure the highest levels of availability, reliability, and performance; Support Tricentis' SaaS platform growth, reliability, and scalability goals; Align cloud spending with business outcomes; Advance platform capabilities; Align cloud and infrastructure initiatives with product roadmap and business goals; Reflect customer expectations and business commitments; Scale the team to enhance the performance and reliability of our SaaS products; Improve MTTR; Reduce toil and improve system resilience; Ensure systems are observable, scalable, and fault tolerant; Increase consistency and reduce operational risk; Report regularly to senior leadership on platform health
Industry & Context.
Solve Problems Together: We win or lose as one team
On-call strategy; Global Sanctions Compliance: Candidates must not be listed on any government restricted party lists (including the OFAC SDN List and U.S. Commerce Department restricted lists) and must certify that their employment would not violate any sanctions or export control regulations.
What They're Looking For.
Must Have
10+ years of experience in cloud infrastructure, DevOps, or Site Reliability Engineering, with at least 5 years in senior engineering leadership roles; Proven track record leading Cloud or SRE organizations at scale within SaaS or enterprise software companies; Deep expertise in major cloud platforms (AWS, Azure, and/or GCP) including compute, networking, storage, security, and managed services; Strong background in SRE principles, including SLO/SLI/error budget frameworks, observability, chaos engineering, and incident management; Hands-on experience with Kubernetes, Terraform, CI/CD tooling, and modern infrastructure-as-code practices; Experience with compliance frameworks (SOC 2, ISO 27001, FedRAMP, GDPR) and operating in regulated environments; Excellent communication and influencing skills, with the ability to translate complex technical concepts into clear business impact
Nice to Have
AI and Agentic capabilities, multi-cloud, hybrid-cloud, and cloud-native strategies, Kubernetes, Terraform, Pulumi, SOC 2, ISO 27001, ISO 42001, GDPR, FedRAMP
What You'll Do.
Define and execute the cloud infrastructure roadmap
Establish cloud architecture standards and best practices
Drive infrastructure cost optimization and efficiency
Lead the adoption of modern cloud technologies and emerging capabilities (AI and Agentic)
Build and mature the SRE function, defining SLOs, SLIs, and error budgets
Enhance operational effectiveness through the deployment and use of agentic capabilities
Own the incident management and on-call strategy
Champion a culture of reliability, embedding SRE principles
Drive automation across infrastructure provisioning and self-healing systems
Partner with Security to ensure cloud environments meet compliance
Influence infrastructure design earlier in the agentic development process
Oversee infrastructure delivery and operational readiness for all product releases
Drive continuous improvement in CI/CD pipelines
Establish and enforce infrastructure-as-code practices
Define and track key reliability and availability metrics
How You'll Work.
Team & Collaboration
Collaborate with peer Engineering and Product leaders; Partner with Finance and Engineering leadership; Work with Engineering teams to influence infrastructure design; Drive continuous improvement in CI/CD pipelines, deployment processes, and DevOps tooling in partnership with product engineering teams
Communication Scope
Excellent communication and influencing skills; Ability to translate complex technical concepts into clear business impact
Process & Methodology
Define and execute the cloud infrastructure roadmap; Oversee infrastructure delivery and operational readiness for all product releases
Full Job Description
We are looking for an experienced and strategic leader to build and scale our Cloud and Site Reliability Engineering (SRE) organization. You will define and drive the cloud infrastructure strategy and operational excellence that underpins Tricentis' SaaS platform, ensuring the highest levels of availability, reliability, and performance. You will lead a team of talented Cloud Engineers and SREs, fostering a culture of excellence, automation-first thinking, and continuous improvement.

**What you will do:**

**Cloud Strategy & Infrastructure Leadership**

* **Define and execute the cloud infrastructure roadmap** to support Tricentis' SaaS platform growth, reliability, and scalability goals across AWS, Azure, and GCP.
* **Establish cloud architecture standards and best practices**, including multi-cloud, hybrid-cloud, and cloud-native strategies.
* **Drive infrastructure cost optimization and efficiency**, partnering with Finance and Engineering leadership to align cloud spending with business outcomes.
* **Lead the adoption of modern cloud technologies and emerging capabilities** (AI and Agentic) to advance platform capabilities.
* **Collaborate with peer Engineering and Product leaders** to align cloud and infrastructure initiatives with product roadmap and business goals.

**Site Reliability Engineering & Operational Excellence**

* **Build and mature the SRE function**, defining SLOs, SLIs, and error budgets that reflect customer expectations and business commitments.
* Enhance operational effectiveness through the deployment and use of agentic capabilities to scale the team and enhance the performance and reliability of our SaaS products.
* **Own the incident management and on-call strategy** to establish effective processes for detection, response, remediation, and post-incident review, improving MTTR.
* **Champion a culture of reliability**, embedding SRE principles across the broader Engineering organization to reduce toil and improve system resilience. Drive automation
How to Apply on Workday
- Workday has a multi-step form — save your progress after every section.
- "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
- Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
- Job requisition numbers are useful when following up with HR by email.