Thoughtworks
Service Reliability Engineer
“Service Reliability Engineer at Thoughtworks. Skills: Site Reliability Engineering, Automation, Monitoring, Incident management. Provide operational support. Debug production issues”
What You'll Achieve.
Meet and exceed reliability objectives; Meet and exceed business objectives
Industry & Context.
Diagnose and resolve issues; Troubleshooting; Investigating issues; Root cause analysis
24x7 team availability, on a rotation and as-needed basis
What They're Looking For.
Must Have
Hands-on experience in programming and scripting languages
Good understanding of at least one Public Cloud
Familiarity with DevOps and GitOps practices
Familiarity with creating infrastructure resources
Ability to work in close communication with engineering teams
Willingness to be part of a team that provides 24x7 availability on a rotation and as-needed basis
Nice to Have
Experience in fixing bugs, analyzing logs, building metrics and operational dashboards
What You'll Do.
Provide operational support
Debug production issues
Diagnose and resolve issues
Troubleshoot and investigate
Handle production incidents
Respond and communicate over incidents
Share ideas with team members
Ensure development and maintenance of positive relationships
Adjust and suggest innovative solutions
How You'll Work.
Team & Collaboration
Shared responsibility; Collaborative culture; Internal peers; Other colleagues; Engineering teams
Communication Scope
Incident communication; Communicate technical matters
Full Job Description
As a Consultant Service Reliability Engineer (SRE), you will take a multifaceted approach to ensuring technical excellence and operational efficiency within the infrastructure domain. Specializing in reliability, resilience and system performance, you take a lead role in championing the principles of Site Reliability Engineering. By strategically integrating automation, monitoring and incident response, you facilitate the evolution from traditional operations to a more customer-focused and agile approach. Emphasizing shared responsibility and a commitment to continuous improvement, you cultivate a collaborative culture, enabling organizations to meet and exceed their reliability and business objectives.

Job responsibilities
You will provide operational support for large-scale distributed environments and debug production issues across services and stack levels.
You will quickly diagnose and resolve issues to minimize downtime and impact on users.
You will troubleshoot and investigate issues across databases, web services and applications.
You will handle production incidents, manage incident communication with clients and help draft RCA (root cause analysis) documents.
You will respond to and communicate about incidents (incident management).
You will monitor and ensure that the technical and business expectations of deliverables are consistently met on projects.
You will share ideas with appropriate team members, stakeholders and leaders to facilitate further discussion and exploration.
You will ensure the development and maintenance of positive relationships with internal peers and other colleagues, in ways that help deliver strategic objectives.
You will adjust and suggest innovative solutions to current constraints and business policies.

Job qualifications
Technical Skills
You have hands-on experience in programming and scripting languages such as Python, Go or Bash.
You have a good understanding of at least one Public Cloud (AWS, Azure, GCP).
You have had exposure to observability too