Recorded Future

Intelligence

Site Reliability Engineer

Gothenburg, Nebraska, United States
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is
optimal for Mid candidates.

The Brief

“Site Reliability Engineer at Recorded Future. Skills: Site Reliability Engineering, AWS, Automation, Observability. Ensure platform performance, capacity, scalability, reliability, resiliency, security, compliance, support, cost efficiency. Make systemic improvements”

What You'll Achieve.

Ensure reliability, scalability, and performance of critical systems; Reduce system downtime

Industry & Context.

Intelligence
Problems you'll solve

Troubleshooting and diagnostic skills; Problem identification

Eligibility Requirements

Participate in a 24/7 on-call rotation

What They're Looking For.

Must Have

  • 3+ years of experience in a Site Reliability Engineer, DevOps Engineer, or similar role
  • Extensive hands-on experience with Amazon Web Services (AWS), including a deep understanding of networking concepts within AWS
  • Expert-level troubleshooting and diagnostic skills
  • Proven track record of reducing system downtime
  • Ability to grasp complex architectures
  • Advanced Linux skills
  • Proficiency in Terraform and Chef
  • A preference for automating tasks and implementing solutions via Infrastructure as Code rather than manual changes
  • Skilled in creating clear, concise incident reports and technical documentation
  • Ability to stay calm under pressure during an outage
  • Fantastic collaboration skills
  • Spectacular collaborator and communicator
  • A team player but self-motivated

Nice to Have

  • Knowledge and experience with Kubernetes
  • Familiarity with message brokers such as RabbitMQ and Apache Kafka
  • Experience with NoSQL databases, particularly MongoDB and Elasticsearch
  • Familiarity with OpenTelemetry
  • Experience with large distributed systems and microservices architecture
  • Experience with CI/CD pipelines

What You'll Do.

Ensure platform performance

Make systemic improvements

Perform Root Cause Analysis for outages

Design, implement, and maintain infrastructure on AWS

Develop and manage observability solutions

Monitor system health and performance

Automate infrastructure provisioning and configuration

Respond to and resolve production incidents

Ensure applications are designed for high availability and resilience

Identify and address performance bottlenecks

Drive continuous improvement through automation

Conduct post-incident reviews

How You'll Work.

Team & Collaboration

Work closely with development teams; Collaborate with engineering teams; Collaboration skills

Communication Scope

Spectacular communicator

Full Job Description

With 1,000+ intelligence professionals serving over 1,900 clients worldwide, Recorded Future is the world’s most advanced, and largest, intelligence company!

Recorded Future is seeking a highly motivated and experienced Site Reliability Engineer (SRE) to join our growing team. In this role, you will be instrumental in ensuring the reliability, scalability, and performance of our critical systems. You will work closely with development teams to build and maintain robust infrastructure, implement automation, and foster a culture of operational excellence. This position requires a strong understanding of cloud environments, observability, and infrastructure as code principles.

What You'll Do:

  • Ensure the performance, capacity, scalability, reliability, resiliency, security, compliance, support, cost efficiency, SLA, SLOs, RPOs and RTOs for the platform, either directly or in collaboration with other teams.
  • Make systemic improvements both proactively and for recurring issues.
  • Perform comprehensive Root Cause Analysis for outages.
  • Design, implement, and maintain scalable and reliable infrastructure on AWS.
  • Develop and manage observability solutions using tools such as Grafana, ELK (Elasticsearch, Logstash, Kibana), and Prometheus to monitor system health and performance.
  • Automate infrastructure provisioning and configuration using Terraform and Chef.
  • Participate in a 24/7 on-call rotation to respond to and resolve production incidents.
  • Collaborate with engineering teams to ensure applications are designed for high availability and resilience.
  • Proactively identify and address performance bottlenecks and potential issues.
  • Drive continuous improvement through automation, process optimization, and post-incident reviews.

What You'll Bring:

  • 3+ years of experience in a Site Reliability Engineer, DevOps Engineer, or similar role.
  • Extensive hands-on experience with Amazon Web Services (AWS), including a deep understanding of networking concepts within AWS.
  • Expert-level troubleshoo

How to Apply on Greenhouse

  • Create a Greenhouse profile before applying — it saves time across multiple applications.
  • Upload your resume as a PDF; the parser handles it better than Word.
  • Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
  • Enable email notifications to track application status in real time.
