HelloKindred
Information Technology and Services
PlatformSREEngineer
“Platform - SRE Engineer at HelloKindred. Skills: Platform Engineering, Site Reliability Engineering, Cloud Infrastructure Management, CI/CD, Observability, AI/ML Workload Support. own deployment, observability, reliability, cost control, and production operations for an AI helpdesk platform. support the design, deployment, and operational management of AI services and production environments”
Industry & Context.
troubleshooting, operational support, and incident response capabilities
BPSS Clearance, Candidates must be legally authorized to live and work in the country where the position is based, without requiring employer sponsorship.
What They're Looking For.
Must Have
experience in DevOps and Site Reliability Engineering environments, Experience with Docker, Kubernetes, cloud platforms, and Infrastructure as Code practices, experience with monitoring, observability, and operational tooling, Familiarity with CI/CD pipelines, release automation, secrets management, and production support processes, Understanding of LLM deployment patterns and API-based model integrations, Experience working with cloud platforms, particularly AWS, Experience using Jira, Confluence, and ServiceNow, troubleshooting, operational support, and incident response capabilities, communication and collaboration skills within cross-functional engineering teams
Nice to Have
Experience supporting AI/ML workloads in production environments, Experience with GPU workloads, autoscaling, and cost optimization
What You'll Do.
and production operations for an AI helpdesk platform
and operational management of AI services and production environments
performance optimization
and operational resilience across cloud-based infrastructure
Build and manage CI/CD pipelines
and runtime environments for AI services
Deploy and operate model-serving
and application workloads
and operational dashboards
Manage scaling activities
and production support operations
Optimize inference cost
and overall system reliability
operational standards
and incident response processes
Support infrastructure automation and platform engineering initiatives
Maintain observability and monitoring solutions across production environments
Support release automation
and production operational processes
Troubleshoot production issues and support system diagnostics and remediation activities
Ensure platform stability
and performance across cloud-native environments
How You'll Work.
Team & Collaboration
Collaborate with engineering teams to support AI platform reliability and operational readiness; communication and collaboration skills within cross-functional engineering teams
Communication Scope
communication and collaboration skills within cross-functional engineering teams
Applying for this Platform - SRE Engineer role?
Most applicants get filtered before a human reads their resume. See if yours makes the cut.
How to Apply on SmartRecruiters
- SmartRecruiters often includes a video screening step — check camera and mic permissions.
- Link your GitHub or portfolio directly in the profile section for technical roles.
- Applications may be reviewed by AI scoring before reaching a recruiter — use keywords from the job description.
ANONYMOUS · UNFILTERED
What do employees actually say about HelloKindred?
Real rants from real employees. Read before you apply.