HDR
Technology
Platform Operations 2
Neural analysis suggests this role is optimal for Mid+ candidates.
“Platform Operations 2 at HDR. Skills: Platform Operations, Observability, Incident Response, Automation. Build and maintain dashboards.”
What You'll Achieve.
Improve platform stability; Improve platform supportability
Industry & Context.
Root cause analysis; Troubleshooting
On-call flexibility; Remain reachable off-hours
What They're Looking For.
Must Have
3 years of experience in infrastructure monitoring
3 years of experience in systems operations
3 years of experience in platform support
3 years of experience in operational engineering
Experience with observability platforms
Experience with monitoring platforms
Experience supporting incident management processes
Experience supporting operational escalations
Working knowledge of VMware vSphere
Working knowledge of virtual infrastructure concepts
Experience defining service reliability metrics
Experience reporting on service reliability metrics
Experience defining SLAs
Experience reporting on SLAs
Experience defining SLOs
Experience reporting on SLOs
Working knowledge of scripting for automation
Working knowledge of scripting for data collection
Bachelor’s degree in Information Technology
Bachelor’s degree in Computer Science
Bachelor’s degree in Engineering
Bachelor’s degree in related field
Equivalent experience to Bachelor’s degree
Nice to Have
Experience with VMware Cloud Foundation Operations
Experience with vRealize Operations
Experience integrating observability platforms
Familiarity with VMware Aria Operations
Familiarity with Aria Operations for Logs
Familiarity with VCF ecosystem tools
Exposure to cloud operations in Azure
Exposure to cloud operations in public cloud
Familiarity with security operations concepts
Familiarity with least privilege
Familiarity with audit logging
Familiarity with compliance evidence collection
ITIL Foundation certification
Service management certification
What You'll Do.
Maintain health checks
Maintain service views
Define operational health indicators
Track operational health indicators
Report on operational health indicators
Act as escalation point
Investigate incidents
Correlate telemetry across tools
Coordinate resolution with teams
Tune alert thresholds
Support integration with Dynatrace
Support integration with ServiceNow
Support integration with enterprise platforms
Conduct post-incident reviews
Drive remediation tasks
Establish performance baselines
Refine performance baselines
Establish threshold models
Refine threshold models
Establish capacity trending reports
Refine capacity trending reports
Contribute to automation of checks
Contribute to automation of alert enrichment
Contribute to automation of reporting
Contribute to automation of remediation workflows
Apply cloud security requirements
Apply compliance requirements
How You'll Work.
Team & Collaboration
Infrastructure teams; Platform teams
Full Job Description
At HDR, our employee-owners are fully engaged in creating a welcoming environment where each of us is valued and respected, a place where everyone is empowered to bring their authentic selves and novel ideas to work every day. As we foster a culture of inclusion throughout our company and within our communities, we constantly ask ourselves: What is our impact on the world?
Watch Our Story: https://www.hdrinc.com/our-story
Build and maintain dashboards, alerts, health checks, and service views for VCF platform operations.
Define, track, and report on SLOs, SLAs, and operational health indicators for core platform services.
Act as an escalation point for platform degradation, recurring alerts, and service incidents.
Investigate incidents, correlate telemetry across tools, and coordinate resolution with infrastructure and platform teams.
Tune alert thresholds and reduce noise through event correlation, dependency awareness, and operational feedback.
Support integration of VCF Operations with Dynatrace, ServiceNow, and other enterprise operations platforms.
Conduct post-incident reviews and help drive remediation tasks that improve platform stability and supportability.
Establish and refine performance baselines, threshold models, and capacity trending reports.
Contribute to automation of operational checks, alert enrichment, reporting, and remediation workflows.
Apply established cloud security and compliance requirements in monitoring, operational reporting, and access practices.
Schedule & Presence: This on-site role supports 24/7 operations through real-time collaboration. Standard shifts occur within a 6:00 AM - 6:00 PM window, Monday through Friday. Additionally, this position requires scheduled on-call flexibility and the ability to remain reasonably reachable during off-hours for critical business continuity.
Preferred Qualifications
Experience with VMware Cloud Foundation Operations / vRealize Operations.
Experience integrating observability platforms wit
How to Apply on Taleo (Oracle)
- Taleo is older software — paste plain text resume content to avoid formatting issues.
- Avoid special characters, tables, and columns in your resume for this ATS.
- The application may time out on inactivity — copy your answers to a text editor as backup.