Universal DX
Healthcare
Site Reliability Engineer
Neural analysis suggests this role is optimal for Senior candidates.
“Site Reliability Engineer at Universal DX. Skills: Kubernetes, EKS, AWS, Observability. Own reliability, performance, and uptime. Define and monitor SLIs/SLOs”
Industry & Context.
Troubleshooting; Root cause analysis
What They're Looking For.
Must Have
Bachelor's degree in Computer Science, 5+ years of experience in SRE, Proven experience with Kubernetes, AWS administration skills, Expertise with observability tools, Proficiency with Infrastructure as Code, Programming or scripting ability
Nice to Have
Experience leading SRE practices, Familiarity with CI/CD systems, AWS Certified Solutions Architect
What You'll Do.
Define and monitor SLIs/SLOs
Lead incident response
Perform root cause analysis
Implement long-term fixes
Enhance observability
Automate operational tasks
Plan infrastructure changes
Execute infrastructure changes
Implement scaling strategies
Implement safer rollout processes
Collaborate with Dev teams
Collaborate with Cloud teams
Collaborate with Security teams
Improve operational practices
Improve platform maturity
How You'll Work.
Team & Collaboration
Dev teams; Cloud teams; Security teams
Communication Scope
Communication skills
Process & Methodology
GitOps workflows
Full Job Description
About our Company:
Universal DX, Inc. is an international company with a highly experienced team focused on cracking cancer’s code. Through our multi-omics and bioinformatics models, we have figured out how to read the disease’s signals in blood with high accuracy to detect cancer in its earliest stages. Starting with a colorectal cancer screening liquid biopsy test, we are building a multi-cancer platform that can identify the unique DNA regions associated with different types of cancers.

The Opportunity:
Universal DX is seeking an experienced Site Reliability Engineer to join our growing team. You will be a key technical leader responsible for the reliability, scalability, and operational excellence of our production platforms, with a strong focus on EKS/Kubernetes and cloud infrastructure. You will be part of a team that is passionate about developing novel diagnostic tests for the early detection of cancers. As part of the team, you will be in a company that aims to be more than one of the leaders in the industry. We want to have a huge positive impact on society by achieving the ambitious purpose of “making cancer a curable disease by detecting it earlier”.

How you’ll contribute:
Own the reliability, performance, and uptime of Kubernetes (EKS) clusters and shared platform services.
Define and monitor service-level indicators and objectives (SLIs/SLOs) that drive reliability decisions.
Lead incident response and root cause analysis, and implement long-term fixes.
Build and enhance observability (monitoring, logging, tracing) to surface issues before they impact customers.
Automate operational tasks and reduce toil using software engineering principles.
Plan and execute safe infrastructure changes, including cluster upgrades, scaling strategies, and safer rollout processes.
Collaborate with Dev, Cloud, and Security teams to improve operational practices and platform maturity.
Mentor and coach less senior engineers on reliability engineering best pract