Arista Networks

networking

Senior Site Reliability Engineer

Ireland FULL TIME Remote Friendly

The Brief

“Senior Site Reliability Engineer at Arista Networks. Skills: Site Reliability Engineering, production systems operation, automation, observability, scalability, reliability, security. Design, build, and deploy production systems with a focus on scalability, reliability, observability, and performance, ensuring systems meet stringent security standards. Develop and maintain comprehensive automation solutions to eliminate toil and streamline operational efficiency across production environments”

What You'll Achieve.

Ensure our infrastructure remains resilient, efficient, and future-ready; eliminate toil and streamline operational efficiency across production environments; minimise downtime; prevent recurrence of incidents; enhance product deployment workflows; ensure comprehensive visibility across all systems; keep disruption to service availability minimal; maintain secure, scalable, and fault-tolerant systems

Industry & Context.

networking

Problems You'll Solve

Apply problem-solving and software troubleshooting skills with a methodical, analytical approach; triage platform and infrastructural issues with decisiveness and analytical rigor

Eligibility Requirements

On-call support experience

What They're Looking For.

Must Have

  • 5+ years in a related infrastructure or systems role
  • Proficiency in one or more programming languages: Go, Python, or bash shell scripting, with the ability to implement medium-complexity automation workflows
  • Knowledge of Linux or UNIX from both administration and debugging perspectives
  • Hands-on experience operating software systems, infrastructure, and complex applications at scale in production environments
  • Demonstrated expertise in infrastructure-as-code principles and practices
  • Problem-solving and software troubleshooting skills with a methodical, analytical approach
  • Experience with server provisioning, particularly from storage and networking perspectives
  • Proven ability to work collaboratively within cross-functional teams and communicate technical concepts clearly
  • Experience with incident response, postmortem analysis, and continuous improvement methodologies

Nice to Have

  • Experience with container orchestration platforms, particularly Kubernetes
  • Hands-on experience with Docker and virtualisation technologies
  • Proficiency in managing monitoring stacks, including Prometheus and Grafana
  • Experience with CI/CD systems such as GitLab tools or Spinnaker
  • Knowledge of infrastructure-as-code frameworks, particularly Terraform
  • Experience managing databases such as PostgreSQL or equivalent relational database management systems
  • Experience with artifact repositories and Docker registries
  • Familiarity with cloud platforms (Google Cloud Platform, Amazon Web Services, or Microsoft Azure)
  • Understanding of distributed systems architecture and principles
  • Experience with performance tuning and system optimisation
  • Knowledge of security best practices in infrastructure and systems design
  • On-call support experience and comfort with incident response responsibilities

What You'll Do.

Design, build, and deploy production systems with a focus on scalability, reliability, observability, and performance, ensuring systems meet stringent security standards

Develop and maintain comprehensive automation solutions to eliminate toil and streamline operational efficiency across production environments

Proactively monitor production systems, establish intelligent alerting strategies, and implement automated incident response mechanisms to minimise downtime

Create and maintain detailed incident response plans

Conduct thorough postmortem analyses following incidents to identify root causes and prevent recurrence

Collaborate with software engineering teams to identify and resolve infrastructural bottlenecks, designing innovative solutions that enhance product deployment workflows

Manage and optimise monitoring infrastructure using industry-standard tools, ensuring comprehensive visibility across all systems

Plan, communicate, and execute maintenance windows on production systems with minimal disruption to service availability

Triage platform and infrastructural issues with decisiveness and analytical rigor

Engage with third-party vendors and support teams as required

Deploy new systems and updates in a staged manner, ensuring safe and incremental rollouts

Survey and adopt best practices in infrastructure and platform management to maintain secure, scalable, and fault-tolerant systems

Study the design and implementation details of open-source systems to enhance troubleshooting capabilities and accelerate issue resolution

Work transparently with stakeholders to communicate system status and infrastructure improvements

How You'll Work.

Team & Collaboration

Work collaboratively with cross-functional teams; collaborate with software engineering teams; work transparently with stakeholders

Communication Scope

communicate technical concepts clearly

Process & Methodology

Plan, communicate, and execute maintenance windows


How to Apply on SmartRecruiters

  • SmartRecruiters often includes a video screening step — check camera and mic permissions.
  • Link your GitHub or portfolio directly in the profile section for technical roles.
  • Applications may be reviewed by AI scoring before reaching a recruiter — use keywords from the job description.
