Airbnb

Senior Software Engineer, Reliability Engineering

São Paulo, Brazil

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior Software Engineer, Reliability Engineering at Airbnb. Skills: Site Reliability Engineering, developing and maintaining tools and systems, incident management, distributed systems, cloud computing. Responsibilities: developing and maintaining the tools and systems that enable our engineering teams to operate our services reliably and at scale, and ensuring our services are properly instrumented and able to scale with our growing business.”

What You'll Achieve.

Ensure our services are properly instrumented and able to scale with our growing business; bolster our services' reliability; improve how the company manages incidents broadly; ensure that our services remain resilient as our business continues to expand; ensure timely resolution of incidents, minimizing the impact on our customers and business

Industry & Context.

Problems you'll solve

Excellent problem-solving and analytical skills

Eligibility Requirements

on-call

What They're Looking For.

Must Have

5+ years of experience in software engineering or SRE roles

Coding skills in at least one programming language, such as Java, Python, or Go

Experience with distributed systems and service-oriented architectures

Experience with cloud computing platforms such as AWS or Google Cloud Platform

Conviction in software development best practices, including version control, automated testing, and continuous integration and delivery

Experience with containerization technologies such as Docker and Kubernetes

Excellent problem-solving and analytical skills

Attention to detail

Ability to work effectively in a fast-paced and dynamic environment

Communication and interpersonal skills

Fluent in English (Professional Level)

Nice to Have

Kubernetes a plus

What You'll Do.

Develop and maintain the tools and systems that enable our engineering teams to operate our services reliably and at scale

Ensure our services are properly instrumented and able to scale with our growing business

Bolster our services' reliability

Improve how the company manages incidents broadly

Establish a culture of reliability throughout the organization by providing a comprehensive incident management platform that is used for instrumentation, operability, and incidents

Identify opportunities for improvement and drive their implementation

Ensure that our services remain resilient as our business continues to expand

Respond to and manage high-severity incidents

Serve as an active member of the Production SRE team

Implement and maintain the tools and systems that support service reliability

Collaborate with other engineering teams to ensure services are designed with reliability in mind and provide guidance on the appropriate use of tooling and automation

Identify opportunities to improve the reliability and efficiency of our services and drive their implementation

Work with infrastructure engineers to understand the challenges they face in operating our services and develop tools and systems to help them manage these challenges

Participate in incident response and post-mortems to identify and address systemic issues

Continuously evaluate new technologies and industry best practices to improve our SRE tooling and incident response procedures

Gain and maintain an intimate understanding of how the critical parts of the site work (services, infrastructure, product, tools, and processes)

Lead high-urgency incidents

Mentor less-experienced engineers in effectively handling incidents

How You'll Work.

Team & Collaboration

Work closely with our SREs and other engineering teams; guide cross-functional teams during crisis situations

Communication Scope

Excellent communication, coordination, and interpersonal skills

Full Job Description

Airbnb was born in 2007 when two hosts welcomed three guests to their San Francisco home, and has since grown to over 5 million hosts who have welcomed over 2 billion guest arrivals in almost every country across the globe. Every day, hosts offer unique stays and experiences that make it possible for guests to connect with communities in a more authentic way.

The Community You Will Join:

We are looking for a Senior Software Engineer to join our Site Reliability Engineering team. As a Senior Software Engineer in Production SRE, you will be responsible for developing and maintaining the tools and systems that enable our engineering teams to operate our services reliably and at scale. You will work closely with our SREs and other engineering teams to ensure our services are properly instrumented and able to scale with our growing business.

The Difference You Will Make:

In this role, your expertise in developing and maintaining tools and systems will be instrumental in bolstering our services' reliability and improving how the company manages incidents broadly. By collaborating closely with other engineering teams, you will help establish a culture of reliability throughout the organization by providing a comprehensive incident management platform that is being used for instrumentation, operability, and around incidents. Your ability to identify opportunities for improvement and drive their implementation will contribute significantly to our overall operational efficiency and growth, ensuring that our services remain resilient as our business continues to expand.

Additionally, as an essential part of this role, you will serve as an active member of the Production SRE team, responding to and managing high-severity incidents. Your vast technical experience and leadership skills will be invaluable as you step into the role of Incident Commander during these critical events. You will guide cross-functional teams during crisis situations and ensure timely resolution, minimizing the impact on our customers and business.

SKILL SIGNAL 23 detected · ranked by frequency

developing and maintaining tools and systems ×5

incident management ×5

instrumentation ×3

operability ×3

monitoring ×3

alerting ×3

automation ×3

version control ×3

automated testing ×3

continuous integration and delivery ×3

containerization ×3

Site Reliability Engineering ×2

distributed systems ×2

cloud computing ×2

Docker ×2

Kubernetes ×2

Java

Python

AWS

Google Cloud Platform

Incident Commander

leadership skills

BEHAVIOURAL

communication

interpersonal skills

resilience under pressure

commitment to our culture of blamelessness and continuous learning

Role Details

Experience 5–10 yrs

Level Senior

Education Bachelor's degree in Computer Science or related field

Category software-engineering

AI-Extracted Insights

Domain Areas

large-scale distributed systems

how the critical parts of the site work (services, infrastructure, product, tools, and processes)
