Company

Technology

Staff Machine Learning Systems Engineer (MLOps)

€95–145k (AI estimate) · Bulgaria · Full time · Remote friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Staff Machine Learning Systems Engineer (MLOps). Skills: MLOps, ML systems engineering, Kubernetes, Infrastructure-as-code. Lead design of ML infrastructure platform. Lead evolution of ML infrastructure platform”

What You'll Achieve.

Improve performance, cost efficiency, and deployment speed

Industry & Context.

Technology
Problems you'll solve

Root cause analysis

What They're Looking For.

Must Have

  • 8+ years of experience in platform engineering, DevOps, SRE, or infrastructure roles
  • Hands-on ML/AI systems experience
  • Kubernetes (preferably EKS) expertise
  • Proficiency in infrastructure-as-code tools such as Terraform
  • Solid programming skills in Python
  • Experience building infrastructure tooling and automation systems
  • Experience operating LLM or ML inference systems in production
  • Hands-on experience with observability stacks
  • Understanding of CI/CD systems
  • Understanding of GitOps workflows
  • Understanding of developer platform engineering
  • Experience designing IAM, OIDC, and secrets management systems
  • Systems-thinking mindset
  • Ability to collaborate across engineering, ML, security, and product teams

Nice to Have

Experience in regulated or high-compliance environments (healthcare, fintech, or similar)

What You'll Do.

Lead design of ML infrastructure platform

Lead evolution of ML infrastructure platform

Lead operation of ML infrastructure platform

Support AI workloads across production systems

Ensure scalability across environments

Ensure reliability across environments

Ensure security across environments

Own Kubernetes-based infrastructure

Optimize Kubernetes-based infrastructure

Manage autoscaling for ML systems

Manage workload orchestration for ML systems

Manage cluster lifecycle for ML systems

Build GitOps-based CI/CD pipelines

Maintain GitOps-based CI/CD pipelines

Enable efficient deployment of AI services

Design model serving infrastructure

Implement model serving infrastructure

Design inference infrastructure

Implement inference infrastructure

Support multi-provider integrations

Develop observability systems for AI workloads

Develop tracing systems for AI workloads

Develop monitoring systems for AI workloads

Define SLOs for ML systems

Enforce SLOs for ML systems

Define incident response processes

Enforce incident response processes

Define reliability standards for ML systems

Enforce reliability standards for ML systems

Own infrastructure-as-code

Improve developer velocity

Drive security architecture

Drive IAM architecture

Drive secrets management architecture

Ensure least-privilege access

Ensure data protection standards

Translate research into production-ready systems

Translate prototypes into production-ready systems

Identify platform bottlenecks

Lead initiatives to improve performance

Lead initiatives to improve cost efficiency

Lead initiatives to improve deployment speed

Provide technical leadership

Provide architectural guidance

How You'll Work.

Team & Collaboration

Collaborate with ML, product, data, engineering, and security teams

Process & Methodology

Roadmap planning

Full Job Description

## Accountabilities

  • Lead the design, evolution, and operation of the core ML infrastructure platform supporting AI workloads across production systems, ensuring scalability, reliability, and security across environments.
  • Own and optimize Kubernetes-based infrastructure (e.g., EKS), including autoscaling, workload orchestration, and cluster lifecycle management for ML and AI systems.
  • Build and maintain GitOps-based CI/CD pipelines enabling safe, repeatable, and efficient deployment of AI services across environments.
  • Design and implement model serving and inference infrastructure, including LLM routing, API gateways, and multi-provider integrations.
  • Develop observability, tracing, and monitoring systems for AI workloads using tools such as OpenTelemetry, Datadog, and LLM tracing platforms.
  • Define and enforce SLOs, incident response processes, and reliability standards for ML systems in production.
  • Own infrastructure-as-code and platform tooling (Terraform, CLIs, internal frameworks) to improve developer velocity and consistency.
  • Drive security, IAM, and secrets management architecture ensuring compliance, least-privilege access, and data protection standards.
  • Collaborate with ML, product, and data teams to translate research and prototypes into production-ready systems.
  • Identify platform bottlenecks and lead initiatives to improve performance, cost efficiency, and deployment speed.
  • Provide technical leadership, mentorship, and architectural guidance across ML systems engineering initiatives.

Requirements:

This role requires deep expertise in cloud infrastructure, ML systems, and production-grade platform engineering, with a strong focus on reliability, scalability, and security.

  • 8+ years of experience in platform engineering, DevOps, SRE, or infrastructure roles, including hands-on ML/AI systems experience.
  • Strong expertise with Kubernetes (preferably EKS), including cluster operations, autoscaling, and workload orchestration.
  • Proficiency in infrastructure-as-code tools such as

How to Apply on Lever

  • Lever uses a streamlined one-page form — apply in under 5 minutes.
  • LinkedIn import works well; review parsed data before submitting.
  • The cover letter field is optional but visible to reviewers — use it to differentiate.
  • Referral codes from employees can significantly boost visibility of your application.
