Thoughtworks

Service Reliability Engineer

S$130–195k (AI estimate)
Singapore, Singapore
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Service Reliability Engineer at Thoughtworks. Skills: Site Reliability Engineering, automation, monitoring, incident management. Responsibilities: provide operational support and debug production issues.”

What You'll Achieve.

Meet and exceed reliability objectives; Meet and exceed business objectives

Industry & Context.

Problems you'll solve

Diagnose and resolve issues; Troubleshooting; Investigating issues; Root cause analysis

Eligibility Requirements

Willingness to join a rotation- and need-based team that provides 24x7 coverage

What They're Looking For.

Must Have

Hands-on experience in programming and scripting languages; Good understanding of at least one Public Cloud; Familiar with DevOps and GitOps practices; Familiar with creating infrastructure resources; Ability to work in close communication with engineering teams; Willing to be part of a rotation- and need-based 24x7 available team

Nice to Have

Experience in fixing bugs, analyzing logs, building metrics and operational dashboards

What You'll Do.

Provide operational support

Debug production issues

Diagnose and resolve issues

Troubleshoot and investigate

Handle production incidents

Respond to and communicate about incidents

Share ideas with team members

Ensure development and maintenance of positive relationships

Adjust and suggest innovative solutions

How You'll Work.

Team & Collaboration

Shared responsibility; Collaborative culture; Internal peers; Other colleagues; Engineering teams

Communication Scope

Incident communication; Communicate technical matters

Full Job Description

As a Consultant Service Reliability Engineer (SRE), you will take a multifaceted approach to ensuring technical excellence and operational efficiency within the infrastructure domain. Specializing in reliability, resilience and system performance, you take a lead role in championing the principles of Site Reliability Engineering. By strategically integrating automation, monitoring and incident response, you facilitate the evolution from traditional operations to a more customer-focused and agile approach. Emphasizing shared responsibility and a commitment to continuous improvement, you cultivate a collaborative culture, enabling organizations to meet and exceed their reliability and business objectives.

Job responsibilities

You will provide operational support for large-scale distributed environments and debug production issues across services and stack levels.

You will quickly diagnose and resolve issues to minimize downtime and impact to users.

You will troubleshoot and investigate across databases, web services and applications.

You will handle production incidents, manage incident communication with clients and help draft root cause analysis (RCA) documents.

You will respond to and communicate about incidents (incident management).

You will monitor and ensure that technical/business expectations of deliverables are consistently met on projects.

You will share ideas with appropriate team members, stakeholders and leaders to facilitate further discussion and exploration.

You will ensure the development and maintenance of positive relationships with internal peers and other colleagues, in ways that help to deliver strategic objectives.

You will adjust and suggest innovative solutions to current constraints and business policies.

Job qualifications

Technical Skills

You have hands-on experience in programming and scripting languages such as Python, Go or Bash.

You have a good understanding of at least one Public Cloud (AWS, Azure, GCP).

You have had exposure to observability too
