Eli Lilly and Company

Healthcare

Principal Platform Reliability Engineer

$126–224k · Indianapolis, Indiana, United States · Full Time · Remote Friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Principal Platform Reliability Engineer at Eli Lilly and Company. Skills: Platform Reliability, Site Reliability Engineering, Cloud Environments, Observability. Define and implement SLOs. Implement reliability standards.”

Industry & Context.

Healthcare

Problems you'll solve

Root cause analysis; Troubleshooting complex issues; Troubleshooting performance issues

Eligibility Requirements

In office 3 days a week

What They're Looking For.

Must Have

Bachelor's degree in Computer Science, 7+ years of hands-on experience with AWS, Extensive experience with Kubernetes, Experience operating distributed systems, Experience in incident management, Experience defining SLOs, Hands-on experience with observability tools, Experience building CI/CD pipelines, Proficiency with Infrastructure as Code tools, Experience scripting in Python, Experience with networking fundamentals, Experience with cloud architecture fundamentals, Experience implementing security best practices, Experience troubleshooting complex issues

Nice to Have

Experience with ArgoCD, Experience with GitHub Actions, Familiarity with large-scale enterprise platforms, Experience in regulated industries, Exposure to global support models, Written communication skills

What You'll Do.

Define and implement SLOs

Implement reliability standards

Drive resilience through capacity planning

Design failover strategies

Design disaster recovery strategies

Lead response for P1/P2 incidents

Mitigate incidents rapidly

Recover systems rapidly

Conduct root cause analysis

Implement corrective actions

Develop operational standards

Maintain operational standards

Implement observability frameworks

Optimize observability frameworks

Improve system visibility

Leverage platforms for telemetry

Build CI/CD pipelines

Maintain CI/CD pipelines

Drive adoption of IaC

Drive adoption of GitOps

Support SRE principles integration

Implement secure-by-design practices

Support vulnerability remediation

Ensure secure configurations

Align with security standards

Align with compliance standards

Partner with engineering teams

Improve platform reliability

Improve platform performance

Improve deployment practices

Provide technical guidance

Provide mentorship to engineers

Communicate system health

Communicate incident impact

How You'll Work.

Team & Collaboration

Partner with engineering teams; Technical guidance to engineers; Communicate with stakeholders

Communication Scope

Incident updates; Postmortems; Status summaries

Process & Methodology

GitOps practices

Full Job Description

At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We’re looking for people who are determined to make life better for people around the world.

Eli Lilly and Company seeks a Platform Site Reliability Engineer to join the Software Product Engineering (SPE) Customer Operations team. You will design, operate, and continuously improve highly available, scalable, and fault-tolerant systems across cloud environments. You will play a critical role in establishing reliability standards, driving operational excellence, and enabling engineering teams to build and deploy with confidence.

What You’ll Do:

  • Define and implement SLOs, SLIs, and reliability standards that establish a consistent foundation for platform health, driving resilience through capacity planning, failover design, and disaster recovery strategies
  • Lead response for P1/P2 incidents, owning rapid mitigation and recovery while conducting thorough root cause analysis and implementing corrective actions that prevent recurrence
  • Develop and maintain runbooks, playbooks, and operational standards that enable the broader engineering organization to respond effectively and consistently
  • Implement and optimize observability frameworks spanning monitoring, logging, tracing, and alerting, improving system visibility and reducing alert noise through actionable, signal-driven insights
  • Leverage platforms such as Splunk, Prometheus, CloudWatch, or equivalent tooling to ensure teams have the telemetry they need to detect, diagnose, and resolve issues proactively
  • Build and maintain CI/CD pipelines and deployment au

How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
