8x8

Unified Communications

Site Reliability Engineer, UC Operations

Manila, Philippines Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Mid candidates.

The Brief

“Site Reliability Engineer, UC Operations at 8x8. Skills: Site Reliability Engineering, Platform Operations, Infrastructure Engineering, Linux Systems Administration, Cloud Provider Experience, Scripting (Python/Bash), Incident Response, SRE Concepts, Automation. Production Operations & Incident Response. Own platform reliability across global UC infrastructure”

Industry & Context.

Unified Communications

Problems you'll solve

Triage and resolve complex issues — service restarts, hung processes, infrastructure failures; structured thinking; diagnostics

Eligibility Requirements

On-Call & Coverage: shared on-call rotation, approximately 1 week per month. Escalation is always an option, and you are expected to drive the response and know when to pull others in, not to hero it alone.

What They're Looking For.

Must Have

3+ years in a site reliability, platform operations, or infrastructure engineering role
Solid Linux systems administration: multi-service distributed systems, log reading, systemctl, network diagnostics, no GUI required
Hands-on experience with at least one major cloud provider (OCI, AWS, GCP, or Azure): compute, storage, IAM, networking fundamentals
On-call experience: calm under pressure, fast triage, clear communication during an incident
Scripting in Python or Bash: enough to automate a task, parse logs, or hit an API independently
Incident response discipline: structured thinking, stakeholder communication, post-mortems that actually say something
Familiarity with SRE concepts: SLIs, SLOs, error budgets, toil measurement
AI-forward mindset: you use AI tools as a core part of how you work, not as a novelty

Nice to Have

Experience with Oracle Cloud Infrastructure (OCI): compute, networking, Log Analytics, Object Storage
Familiarity with VoIP and SIP infrastructure: registration, trunking, call; this is a UC platform and that knowledge matters
Knowledge of observability tooling: Prometheus, Grafana, PagerDuty, OCI Log Analytics
Experience with Ansible for configuration management and deployment automation
Exposure to infrastructure migrations at scale in multi-tenant SaaS environments

What You'll Do.

Production Operations & Incident Response

Own platform reliability across global UC infrastructure, driving incident response rather than just resolving in isolation

Triage and resolve complex issues (service restarts, hung processes, infrastructure failures) and act as an escalation for the NOC when frontline teams hit their limit

Execute the unglamorous but essential work: scheduled maintenance, certificate renewals, log rotation, the stuff that prevents failure before it happens

Lead blameless post-mortems that produce real follow-through, not action items that disappear into a backlog

Identify recurring manual work and build automation to eliminate it

Participate in 2-week sprint cycles to deliver automation and infrastructure initiatives from a structured backlog

Address security issues as they arise — CVEs

Define and track SLIs and SLAs to drive honest, data-driven conversations about where reliability investment is needed

Build and maintain dashboards (Grafana

Leverage AI-powered tooling to accelerate diagnostics and reduce cognitive load at scale

Shared on-call rotation, approximately 1 week per month

How You'll Work.

Team & Collaboration

Work directly with Support, Sales, Sales Engineering, NOC, Professional Services, and Engineering teams across 8x8. Translate production events into clear, business-readable communication. Feed operational insight back into engineering, turning recurring failures and patterns into actionable bug reports and platform improvements.

Communication Scope

Clear communication during an incident; translating production events into clear, business-readable communication; stakeholder communication

Process & Methodology

2-week sprint cycles, structured backlog

Full Job Description

8x8 connects our customers and teams globally, empowering CX leaders with performance and insights to make smarter decisions, delight customers, and drive lasting business impact.

About 8x8 UC Operations

The UC Operations team manages the production infrastructure behind 8x8's Unified Communications platform: voice, fax, messaging, and collaboration services used by enterprise customers globally. The team oversees dozens of applications running across more than two thousand service instances worldwide, spanning VoIP infrastructure, messaging brokers, storage systems, and cloud workloads across Oracle Cloud Infrastructure and physical datacenters. UC Ops sits at the operational center of 8x8, taking escalations from the NOC, coordinating with Engineering, and working alongside Support, Sales, and Professional Services. The work is complex, the systems are live, and the stakes are real. We are actively moving from reactive operations to a proactive, automation-first SRE model, and we are looking for engineers who want to help build that, not just maintain the status quo.

What You'll Do

Production Operations & Incident Response

Own platform reliability across global UC infrastructure, driving incident response rather than just resolving in isolation.
Triage and resolve complex issues (service restarts, hung processes, infrastructure failures) and act as an escalation for the NOC when frontline teams hit their limit.
Execute the unglamorous but essential work: scheduled maintenance, certificate renewals, log rotation, the stuff that prevents failure before it happens.
Lead blameless post-mortems that produce real follow-through, not action items that disappear into a backlog.

Cross-Team Collaboration

Work directly with Support, Sales, Sales Engineering, NOC, Professional Services, and Engineering teams across 8x8; this team sits at the operational center of the company.
Translate production events into clear, business-readable commu

SKILL SIGNAL 86 detected · ranked by frequency

Linux Systems Administration ×5

Incident Response ×5

SRE Concepts ×5

Automation ×3

multi-service distributed systems ×3

log reading ×3

systemctl ×3

network diagnostics ×3

compute ×3

storage ×3

IAM ×3

networking fundamentals ×3

scripting ×3

automate a task ×3

parse logs ×3

hit an API ×3

error budgets ×3

toil measurement ×3

AI tools ×3

configuration management ×3

deployment automation ×3

infrastructure migrations ×3

multi-tenant SaaS environments ×3

VoIP infrastructure ×3

SIP infrastructure ×3

registration ×3

trunking ×3

call ×3

observability tooling ×3

post-mortems ×3

Site Reliability Engineering ×2

Platform Operations ×2

BEHAVIOURAL

calm under pressure, fast triage, clear communication, structured thinking, stakeholder communication, judgment

Role Details

Seniority mid

Experience 3–5 yrs

Level Mid

Work Mode Hybrid (On-site Tuesdays and Wednesdays)

AI-Extracted Insights

Domain Areas

unified-communications-platform, voip-infrastructure, messaging-brokers, storage-systems, cloud-workloads, voip, sip-infrastructure

How to Apply on Workday

Workday has a multi-step form — save your progress after every section.
"Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
Job requisition numbers are useful when following up with HR by email.
