Arista Networks
Networking
Senior Site Reliability Engineer
“Senior Site Reliability Engineer at Arista Networks. Skills: Site Reliability Engineering, Production Systems Operations, Automation, Infrastructure-as-Code, Observability, Incident Response, Cloud Platforms. Design, build, and deploy production systems with a focus on scalability, reliability, observability, and performance. Ensure systems meet stringent security standards”
What You'll Achieve.
Ensure systems meet stringent security standards; Eliminate toil; Streamline operational efficiency; Minimise downtime; Prevent recurrence of incidents; Enhance product deployment workflows; Ensure comprehensive visibility across all systems; Minimise disruption to service availability; Maintain secure, scalable, and fault-tolerant systems
Industry & Context.
Problem-solving skills; Software troubleshooting skills; Methodical, analytical approach; Resolve infrastructural bottlenecks; Enhance troubleshooting capabilities; Accelerate issue resolution
On-call support experience
What They're Looking For.
Must Have
5+ years in a related infrastructure or systems role; Proficiency in one or more programming languages (Go, Python, or bash shell scripting), with the ability to implement medium-complexity automation workflows; Knowledge of Linux or UNIX from both administration and debugging perspectives; Hands-on experience operating software systems, infrastructure, and complex applications at scale in production environments; Demonstrated expertise in infrastructure-as-code principles and practices; Problem-solving and software troubleshooting skills with a methodical, analytical approach; Experience with server provisioning, particularly from storage and networking perspectives; Proven ability to work collaboratively within cross-functional teams and communicate technical concepts clearly; Experience with incident response, postmortem analysis, and continuous improvement methodologies
Nice to Have
Experience with container orchestration platforms, particularly Kubernetes; Hands-on experience with Docker and virtualisation technologies; Proficiency in managing monitoring stacks, including Prometheus and Grafana; Experience with CI/CD systems such as GitLab tools or Spinnaker; Knowledge of infrastructure-as-code frameworks, particularly Terraform; Experience managing databases such as PostgreSQL or equivalent relational database management systems; Experience with artifact repositories and Docker registries; Familiarity with cloud platforms (Google Cloud Platform, Amazon Web Services, or Microsoft Azure); Understanding of distributed systems architecture and principles; Experience with performance tuning and system optimisation; Knowledge of security best practices in infrastructure and systems design; On-call support experience and comfort with incident response responsibilities
What You'll Do.
Design, build, and deploy production systems with a focus on scalability, reliability, observability, and performance
Ensure systems meet stringent security standards
Develop and maintain comprehensive automation solutions to eliminate toil and streamline operational efficiency
Proactively monitor production systems
Establish intelligent alerting strategies
Implement automated incident response mechanisms
Create and maintain detailed incident response plans
Conduct thorough postmortem analyses following incidents
Collaborate with software engineering teams to identify and resolve infrastructural bottlenecks
Design innovative solutions that enhance product deployment workflows
Manage and optimise monitoring infrastructure
Plan, communicate, and execute maintenance windows on production systems
Triage platform and infrastructural issues with decisiveness and analytical rigor
Engage with third-party vendors and support teams as required
Deploy new systems and updates in a staged, risk-managed manner
Survey and adopt best practices in infrastructure and platform management
Study the design and implementation details of open-source systems to enhance troubleshooting capabilities and accelerate issue resolution
How You'll Work.
Team & Collaboration
Work collaboratively with cross-functional teams; Collaborate with software engineering teams; Work transparently with stakeholders to communicate system status, planned maintenance, and infrastructure improvements
Communication Scope
Communicate technical concepts clearly; Communicate system status; Communicate planned maintenance; Communicate infrastructure improvements
Process & Methodology
Plan maintenance windows; Communicate maintenance windows; Execute maintenance windows; Risk-managed rollouts
Full Job Description
Arista Networks is an industry leader in data-driven, client-to-cloud networking for large data center, campus, and routing environments. Arista is a well-established and profitable company with over $8 billion in revenue. Arista’s award-winning platforms, which deliver Ethernet speeds of up to 800 Gbps, redefine scalability, agility, and resilience. Arista is a founding member of the Ultra Ethernet Consortium. We have shipped over 20 million cloud networking ports worldwide with CloudVision and EOS, an advanced network operating system. Arista is committed to open standards, and its products are available worldwide directly and through partners.

At Arista, we value the diversity of thought and perspectives each employee brings. We believe fostering an inclusive environment where individuals from various backgrounds and experiences feel welcome is essential for driving creativity and innovation. Our commitment to excellence has earned us several prestigious awards, such as the Great Place to Work Survey for Best Engineering Team and Best Company for Diversity, Compensation, and Work-Life Balance. At Arista, we take pride in our track record of success and strive to maintain the highest quality and performance standards in everything we do.

Who You'll Work For

We are seeking an experienced, analytically minded Site Reliability Engineer to join our organisation on a permanent, remote basis from Ireland. In this role, you will be instrumental in building, deploying, and operating critical production systems with a steadfast commitment to scalability, reliability, observability, and security. You will work collaboratively with cross-functional teams to ensure our infrastructure remains resilient, efficient, and future-ready. This is an excellent opportunity for a detail-oriented professional who thrives in a dynamic environment and is passionate about solving complex infrastructure challenges.
How to Apply on SmartRecruiters
- SmartRecruiters often includes a video screening step — check camera and mic permissions.
- Link your GitHub or portfolio directly in the profile section for technical roles.
- Applications may be reviewed by AI scoring before reaching a recruiter — use keywords from the job description.