Roche
Healthcare
AI Platform Engineer
Neural analysis suggests this role is optimal for Mid+ candidates.
“AI Platform Engineer at Roche. Skills: AI Platform Engineering, MLOps, Infrastructure as Code, Cloud Infrastructure. Own OS baseline. Manage custom Base ISO lifecycle”
Industry & Context.
Troubleshooting; Root cause analysis
What They're Looking For.
Must Have
8+ years Linux systems engineering, 5+ years Kubernetes in production, Expert IaC proficiency, Hands-on GPU cluster experience, Networking fundamentals, Deep AWS experience, Helm Chart development, CI/CD pipeline ownership, Business-level English proficiency, Cross-functional collaboration, Lead troubleshooting in real time, Experience operating AI/ML serving platforms, Service mesh expertise, Full-stack observability design, Production experience with multi-cloud orchestration, Familiarity with GxP/CSV compliance, Experience with AI Gateway / LLM routing systems, FinOps practice
Nice to Have
Familiarity with pharmaceutical IT service management, Prior experience on a platform team serving ML/data science customers
What You'll Do.
Manage custom Base ISO lifecycle
Integrate enterprise storage systems
Select GPU server BOM
Architect cloud resource strategy
Plan Reserved Instances
Optimize costs across clouds
Manage cloud accounts
Develop Ansible scripts
Maintain Ansible scripts
Automate server management
Build AMI Bakery pipelines
Operate AMI Bakery pipelines
Orchestrate multi-cloud deployments
Automate Kubernetes cluster provisioning
Manage Kubernetes clusters
Develop custom IaC scripts
Harden custom IaC scripts
Manage full cluster lifecycle
Manage disaster recovery
Own platform components
Engineer AI Workload Orchestration
Engineer Kubernetes Scheduling
Engineer SLURM Scheduling
Engineer Networking connectivity
Design observability dashboards
Implement observability dashboards
Implement DevSecOps
Build platform engineering
Maintain platform engineering
Troubleshoot platform issues
Lead troubleshooting across services
Execute on-prem model lifecycle
Develop workspace auto provisioning
Maintain workspace auto provisioning
Integrate AI safety guardrails
Implement FinOps process
Author system design documents
Maintain system design documents
Manage documentation workflow
Manage approval workflow
Manage workloads in Jira
How You'll Work.
Team & Collaboration
Work with global teams; Collaborate with architects; Work with stakeholders; Cross-functional collaboration; Drive alignment across groups
Communication Scope
Translate technical concerns; Communicate status clearly
Process & Methodology
Jira
Full Job Description
At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.

### The Position

AI Platform Engineer

# Role Overview

Own the full lifecycle of a production AI/ML platform spanning on-prem GPU clusters, multi-cloud infrastructure (AWS, Alibaba Cloud), and service delivery. This role bridges datacenter hardware, platform engineering, and AI service operations in a GxP-regulated pharmaceutical environment. You will work closely with global engineering teams, solution architects, and business stakeholders across time zones.

# Key Responsibilities

## Infrastructure Engineering (On-Prem & Cloud)

• Own OS baseline: Red Hat Satellite management, custom Base ISO lifecycle
• Integration with enterprise storage systems, which are managed by the Roche Storage team
• GPU server BOM selection and hardware qualification
• Architect cloud resource strategy: Reserved Instance planning, cost optimization across AWS and Alibaba Cloud
• Cloud accounts (AWS and Alibaba): post-provisioning, configuration, and management for Platform and Platform-managed use case accounts

## Infrastructure as Code (IaC)

• Develop and maintain Ansible scripts for automated server management (provision, decommission, configuration)
• Build and operate AMI Bakery pipelines for immutable image delivery
• Orchestrate multi-cloud server deployments (AWS, Alibaba Cloud) via IaC
• Automate Kubernetes cluster provisioning and management
• Develop and harden custom IaC scripts

## MLOps Platform Engineering

• Manage full cluster lifecycle: provisioning, upgrades, scaling, disaster recovery
• Own 30+ platform components acr
How to Apply on Workday
- Workday has a multi-step form — save your progress after every section.
- "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
- Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
- Job requisition numbers are useful when following up with HR by email.