Hewlett Packard Enterprise

Site Reliability Engineer

$155–306k · Wroclaw, Lower Silesian, Poland · Full Time · Remote Friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Site Reliability Engineer at Hewlett Packard Enterprise. Skills: Site Reliability Engineering, AWS, Infrastructure Automation, Kubernetes, Distributed Systems, Observability, Python, Golang. Engage in and improve the whole lifecycle of services, from inception and design through to deployment, operation, and refinement. Support development of services from the planning phase before they go live through activities such as system design consulting, developing software platforms and frameworks, capa…”

Industry & Context.

Problems you'll solve

Problem-solving and debugging skills with a high sense of ownership. Troubleshooting skills across network, application, and distributed services layers.

Eligibility Requirements

Be on an on-call rotation to respond to incidents that impact platform availability.

What They're Looking For.

Must Have

  • Experience building and running reliable, fault-tolerant production cloud systems at scale on AWS.
  • Coding infrastructure automation with Terraform, Terragrunt, Packer, and CI/CD, plus knowing how to use configuration management systems such as Ansible.
  • Hands-on experience with Linux/Unix operating system internals, file systems, system tuning, administration, and networking.
  • Deep experience in microservice technologies, container orchestration, and continuous deployment (Kubernetes, Docker, Helm, GitOps with Flux).
  • Experience designing, building, and maintaining production services, and troubleshooting large-scale distributed systems.
  • Experience with technologies such as Apache Kafka, Apache Storm, Apache Flink, Apache Airflow, Spark, Postgres, Redis, Elasticsearch, Arango, and Cassandra.
  • Experience with observability tools and methodology (monitoring, logging, tracing, SLOs/SLIs) for detecting and diagnosing issues before they cause service impact or performance degradation.
  • Programming skills in Shell, Python, Golang, and/or Ruby.
  • Ability to deliver efficiently and effectively.
  • Problem-solving and debugging skills with a high sense of ownership.

Nice to Have

  • 10+ years of engineering or systems experience.
  • Experience leveraging cloud architecture, applying site reliability principles, and/or demonstrating sensitivity to operational concerns.
  • Understanding of network design and architecture.
  • Experience scaling and managing distributed systems.
  • Significant experience with monitoring and observability platforms.
  • Demonstrated ability to debug, fix, and optimize code.
  • Troubleshooting skills across network, application, and distributed services layers.
  • The ability to learn quickly and adapt to new technologies is essential.
  • Excellent communication skills, both verbal and written.

What You'll Do.

Engage in and improve the whole lifecycle of services, from inception and design through to deployment, operation, and refinement.

Support development of services from the planning phase before they go live, through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.

Maintain services once they are live by measuring and monitoring availability and overall system health.

Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.

Plan capacity for the growth of cloud infrastructure.

Improve operational processes such as deployments and upgrades.

Manage execution of project priorities, deadlines, and deliverables.

Be on an on-call rotation to respond to incidents that impact platform availability.

Use your on-call shift to prevent incidents from happening.

Experience in incident response, including conducting post-mortems and implementing lessons learned, enhances system reliability.

How You'll Work.

Team & Collaboration

Provide technical leadership and guidance to other team members on managing the availability and performance of mission-critical services, on building automation to prevent problem recurrence, and on building automated responses for non-exceptional service conditions.

Communication Scope

Excellent communication skills, both verbal and written.

Process & Methodology

Manage execution of project priorities, deadlines, and deliverables.

Full Job Description

Site Reliability Engineer

This role has been designated as ‘Remote/Teleworker’, which means you will primarily work from home.

Who We Are:

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture thrives on finding new and better ways to accelerate what’s next. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.

Job Description:

We are looking for a highly motivated, self-driven, and dedicated Site Reliability Engineer possessing hands-on experience with:

  • Experience building and running reliable and fault-tolerant production cloud systems at scale on AWS.
  • Coding infrastructure automation with Terraform, Terragrunt, Packer, CI/CD, and knowing how to use configuration management systems like Ansible.
  • Hands-on experience with Linux/Unix operating systems internals, file systems, system tuning, administration, and networking.
  • Deep experience in microservice technologies, container orchestration, and continuous deployment (Kubernetes, Docker, Helm, GitOps with Flux).
  • Experience in designing, building, maintaining production services, and troubleshooting large-scale distributed systems.
  • Experience with technologies like Apache Kafka, Apache Storm, Apache Flink, Apache Airflow and Spark, Postgres, Redis, Elasticsearch, Arango, Cassandra.
  • Experience with observability tools and methodology (monitoring, logging, tracing, SLOs/SLIs) for detecting and diagnosing issues in advance before causing service impact or performance degradation.
  • Possess st

How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.