HelloKindred

Information Technology and Services

Platform - SRE Engineer

Sheffield, United Kingdom CONTRACT
The Brief

“Platform - SRE Engineer at HelloKindred. Skills: Platform Engineering, Site Reliability Engineering, Cloud Infrastructure Management, CI/CD, Observability, AI/ML Workload Support. Own deployment, observability, reliability, cost control, and production operations for an AI helpdesk platform. Support the design, deployment, and operational management of AI services and production environments.”

Industry & Context.

Information Technology and Services
Problems you'll solve

Troubleshooting, operational support, and incident response

Eligibility Requirements

BPSS clearance. Candidates must be legally authorized to live and work in the country where the position is based, without requiring employer sponsorship.

What They're Looking For.

Must Have

  • Experience in DevOps and Site Reliability Engineering environments
  • Experience with Docker, Kubernetes, cloud platforms, and Infrastructure as Code practices
  • Experience with monitoring, observability, and operational tooling
  • Familiarity with CI/CD pipelines, release automation, secrets management, and production support processes
  • Understanding of LLM deployment patterns and API-based model integrations
  • Experience working with cloud platforms, particularly AWS
  • Experience using Jira, Confluence, and ServiceNow
  • Troubleshooting, operational support, and incident response capabilities
  • Communication and collaboration skills within cross-functional engineering teams

Nice to Have

  • Experience supporting AI/ML workloads in production environments
  • Experience with GPU workloads, autoscaling, and cost optimization

What You'll Do.

Own deployment, observability, reliability, cost control, and production operations for an AI helpdesk platform

Support the design, deployment, and operational management of AI services and production environments

Performance optimization and operational resilience across cloud-based infrastructure

Build and manage CI/CD pipelines and runtime environments for AI services

Deploy and operate model-serving and application workloads

and operational dashboards

Manage scaling activities and production support operations

Optimize inference cost and overall system reliability

Operational standards and incident response processes

Support infrastructure automation and platform engineering initiatives

Maintain observability and monitoring solutions across production environments

Support release automation and production operational processes

Troubleshoot production issues and support system diagnostics and remediation activities

Ensure platform stability and performance across cloud-native environments

How You'll Work.

Team & Collaboration

Collaborate with engineering teams to support AI platform reliability and operational readiness

Communication Scope

Communication and collaboration within cross-functional engineering teams

How to Apply on SmartRecruiters

  • SmartRecruiters often includes a video screening step — check camera and mic permissions.
  • Link your GitHub or portfolio directly in the profile section for technical roles.
  • Applications may be reviewed by AI scoring before reaching a recruiter — use keywords from the job description.
