Roche

Healthcare

AI Platform Engineer

$550–950k (~AI est.) · Shanghai, China · Full time
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“AI Platform Engineer at Roche. Skills: AI Platform Engineering, MLOps, Infrastructure as Code, Cloud Infrastructure. Own OS baseline. Manage custom Base ISO lifecycle”

Industry & Context.

Healthcare
Problems you'll solve

Troubleshooting; Root cause analysis

What They're Looking For.

Must Have

8+ years of Linux systems engineering, 5+ years of Kubernetes in production, Expert IaC proficiency, Hands-on GPU cluster experience, Networking fundamentals, Deep AWS experience, Helm chart development, CI/CD pipeline ownership, Business-level English proficiency, Cross-functional collaboration, Leading real-time troubleshooting, Experience operating AI/ML serving platforms, Service mesh expertise, Full-stack observability design, Production experience with multi-cloud orchestration, Familiarity with GxP/CSV compliance, Experience with AI Gateway / LLM routing systems, FinOps practice

Nice to Have

Familiarity with pharmaceutical IT service management, Prior experience on a platform team serving ML/data science customers

What You'll Do.

Manage custom Base ISO lifecycle

Integrate enterprise storage systems

Select GPU server BOM

Architect cloud resource strategy

Plan Reserved Instances

Optimize costs across clouds

Manage cloud accounts

Develop Ansible scripts

Maintain Ansible scripts

Automate server management

Build AMI Bakery pipelines

Operate AMI Bakery pipelines

Orchestrate multi-cloud deployments

Automate Kubernetes cluster provisioning

Manage Kubernetes clusters

Develop custom IaC scripts

Harden custom IaC scripts

Manage full cluster lifecycle

Manage disaster recovery

Own platform components

Engineer AI Workload Orchestration

Engineer Kubernetes Scheduling

Engineer SLURM Scheduling

Engineer Networking connectivity

Design observability dashboards

Implement observability dashboards

Implement DevSecOps

Build platform engineering

Maintain platform engineering

Troubleshoot platform issues

Lead troubleshooting across services

Execute on-prem model lifecycle

Develop workspace auto provisioning

Maintain workspace auto provisioning

Integrate AI safety guardrails

Implement FinOps process

Author system design documents

Maintain system design documents

Manage documentation workflow

Manage approval workflow

Manage workloads in Jira

How You'll Work.

Team & Collaboration

Work with global teams; Collaborate with architects; Work with stakeholders; Cross-functional collaboration; Drive alignment across groups

Communication Scope

Translate technical concerns; Communicate status clearly

Process & Methodology

Jira

Full Job Description

At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.

### The Position

AI Platform Engineer

# Role Overview

Own the full lifecycle of a production AI/ML platform spanning on-prem GPU clusters, multi-cloud infrastructure (AWS, Alibaba Cloud), and service delivery. This role bridges datacenter hardware, platform engineering, and AI service operations in a GxP-regulated pharmaceutical environment. You will work closely with global engineering teams, solution architects, and business stakeholders across time zones.

# Key Responsibilities

## Infrastructure Engineering (On-Prem & Cloud)

  • Own OS baseline: Red Hat Satellite management, custom Base ISO lifecycle
  • Integration with enterprise storage systems, which are managed by the Roche Storage team
  • GPU server BOM selection and hardware qualification
  • Architect cloud resource strategy: Reserved Instance planning, cost optimization across AWS and Alibaba Cloud
  • Cloud accounts (AWS and Alibaba): post-provisioning, configuration, and management for Platform and Platform-managed use case accounts

## Infrastructure as Code (IaC)

  • Develop and maintain Ansible scripts for automated server management (provision, decommission, configuration)
  • Build and operate AMI Bakery pipelines for immutable image delivery
  • Orchestrate multi-cloud server deployments (AWS, Alibaba Cloud) via IaC
  • Automate Kubernetes cluster provisioning and management
  • Develop and harden custom IaC scripts

## MLOps Platform Engineering

  • Manage full cluster lifecycle: provisioning, upgrades, scaling, disaster recovery
  • Own 30+ platform components acr

How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
