8x8
Unified Communications
Site Reliability Engineer, UC Operations
Neural analysis suggests this role is optimal for Mid candidates.
“Site Reliability Engineer, UC Operations at 8x8. Skills: Site Reliability Engineering, Platform Operations, Infrastructure Engineering, Linux Systems Administration, Cloud Provider Experience, Scripting (Python/Bash), Incident Response, SRE Concepts, Automation. Production Operations & Incident Response. Own platform reliability across global UC infrastructure”
Industry & Context.
Triage and resolve complex issues — service restarts, hung processes, infrastructure failures; structured thinking; diagnostics
On-Call & Coverage: shared on-call rotation, approximately 1 week per month. Escalation is always an option, and you are expected to drive the response and know when to pull others in, not to hero it alone.
What They're Looking For.
Must Have
- 3+ years in a site reliability, platform operations, or infrastructure engineering role
- Solid Linux systems administration: multi-service distributed systems, log reading, systemctl, network diagnostics, no GUI required
- Hands-on experience with at least one major cloud provider (OCI, AWS, GCP, or Azure): compute, storage, IAM, networking fundamentals
- On-call experience: calm under pressure, fast triage, clear communication during an incident
- Scripting in Python or Bash: enough to automate a task, parse logs, or hit an API independently
- Incident response discipline: structured thinking, stakeholder communication, post-mortems that actually say something
- Familiarity with SRE concepts: SLIs, SLOs, error budgets, toil measurement
- AI-forward mindset: you use AI tools as a core part of how you work, not as a novelty
Nice to Have
- Experience with Oracle Cloud Infrastructure (OCI): compute, networking, Log Analytics, Object Storage
- Familiarity with VoIP and SIP infrastructure: registration, trunking, call ...; this is a UC platform and that knowledge matters
- Knowledge of observability tooling: Prometheus, Grafana, PagerDuty, OCI Log Analytics
- Experience with Ansible for configuration management and deployment automation
- Exposure to infrastructure migrations at scale in multi-tenant SaaS environments
What You'll Do.
Production Operations & Incident Response
Own platform reliability across global UC infrastructure, driving incident response rather than just resolving in isolation
Triage and resolve complex issues (service restarts, hung processes, infrastructure failures) and act as an escalation for the NOC when frontline teams hit their limit
Execute the unglamorous but essential work (scheduled maintenance, certificate renewals, log rotation), the stuff that prevents failure before it happens
Lead blameless post-mortems that produce real follow-through, not action items that disappear into a backlog
Identify recurring manual work and build automation to eliminate it
Participate in 2-week sprint cycles to deliver automation and infrastructure initiatives from a structured backlog
Address security issues as they arise — CVEs
Define and track SLIs and SLAs to drive honest, data-driven conversations about where reliability investment is needed
Build and maintain dashboards (Grafana
Leverage AI-powered tooling to accelerate diagnostics and reduce cognitive load at scale
Shared on-call rotation, approximately 1 week per month
How You'll Work.
Team & Collaboration
Work directly with Support, Sales, Sales Engineering, NOC, Professional Services, and Engineering teams across 8x8; translate production events into clear, business-readable communication; feed operational insight back into engineering, turning recurring failures and patterns into actionable bug reports and platform improvements
Communication Scope
Clear communication during an incident; translating production events into clear, business-readable communication; stakeholder communication
Process & Methodology
2-week sprint cycles, structured backlog
Full Job Description
8x8 connects our customers and teams globally, empowering CX leaders with performance and insights to make smarter decisions, delight customers, and drive lasting business impact.
About 8x8 UC Operations
The UC Operations team manages the production infrastructure behind 8x8's Unified Communications platform: voice, fax, messaging, and collaboration services used by enterprise customers globally. The team oversees dozens of applications running across more than two thousand service instances worldwide, spanning VoIP infrastructure, messaging brokers, storage systems, and cloud workloads across Oracle Cloud Infrastructure and physical datacenters.
UC Ops sits at the operational center of 8x8, taking escalations from the NOC, coordinating with Engineering, and working alongside Support, Sales, and Professional Services. The work is complex, the systems are live, and the stakes are real. We are actively moving from reactive operations to a proactive, automation-first SRE model, and we are looking for engineers who want to help build that, not just maintain the status quo.
What You'll Do
Production Operations & Incident Response
- Own platform reliability across global UC infrastructure, driving incident response rather than just resolving in isolation.
- Triage and resolve complex issues (service restarts, hung processes, infrastructure failures) and act as an escalation for the NOC when frontline teams hit their limit.
- Execute the unglamorous but essential work (scheduled maintenance, certificate renewals, log rotation), the stuff that prevents failure before it happens.
- Lead blameless post-mortems that produce real follow-through, not action items that disappear into a backlog.
Cross-Team Collaboration
- Work directly with Support, Sales, Sales Engineering, NOC, Professional Services, and Engineering teams across 8x8; this team sits at the operational center of the company.
- Translate production events into clear, business-readable commu
How to Apply on Workday
- Workday has a multi-step form — save your progress after every section.
- "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
- Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
- Job requisition numbers are useful when following up with HR by email.