Recorded Future
Intelligence
Site Reliability Engineer
Neural analysis suggests this role is optimal for mid-level candidates.
“Site Reliability Engineer at Recorded Future. Skills: Site Reliability Engineering, AWS, Automation, Observability. Ensure platform performance, capacity, scalability, reliability, resiliency, security, compliance, support, and cost efficiency. Make systemic improvements.”
What You'll Achieve.
Ensure reliability, scalability, and performance of critical systems; Reduce system downtime
Industry & Context.
Troubleshooting and diagnostic skills; Problem identification
Participate in a 24/7 on-call rotation
What They're Looking For.
Must Have
- 3+ years of experience in a Site Reliability Engineer, DevOps Engineer, or similar role
- Extensive hands-on experience with Amazon Web Services (AWS), including a deep understanding of networking concepts within AWS
- Expert-level troubleshooting and diagnostic skills
- Proven track record of reducing system downtime
- Ability to grasp complex architectures
- Advanced Linux skills
- Proficiency in Terraform and Chef
- A preference for automating tasks and implementing solutions via Infrastructure as Code rather than manual changes
- Skilled in creating clear, concise incident reports and technical documentation
- Ability to stay calm under pressure during an outage
- Fantastic collaboration skills; a spectacular collaborator and communicator
- A team player who is also self-motivated
Nice to Have
- Knowledge of and experience with Kubernetes
- Familiarity with message brokers such as RabbitMQ and Apache Kafka
- Experience with NoSQL databases, particularly MongoDB and Elasticsearch
- Familiarity with OpenTelemetry
- Experience with large distributed systems and microservices architecture
- Experience with CI/CD pipelines
What You'll Do.
Ensure platform performance
Make systemic improvements
Perform Root Cause Analysis for outages
Design, implement, and maintain infrastructure on AWS
Develop and manage observability solutions
Monitor system health and performance
Automate infrastructure provisioning and configuration
Respond to and resolve production incidents
Ensure applications are designed for high availability and resilience
Identify and address performance bottlenecks
Drive continuous improvement through automation
Conduct post-incident reviews
How You'll Work.
Team & Collaboration
Work closely with development teams; Collaborate with engineering teams; Collaboration skills
Communication Scope
Spectacular communicator
Full Job Description
With 1,000+ intelligence professionals serving over 1,900 clients worldwide, Recorded Future is the world’s most advanced, and largest, intelligence company!
Recorded Future is seeking a highly motivated and experienced Site Reliability Engineer (SRE) to join our growing team. In this role, you will be instrumental in ensuring the reliability, scalability, and performance of our critical systems. You will work closely with development teams to build and maintain robust infrastructure, implement automation, and foster a culture of operational excellence. This position requires a strong understanding of cloud environments, observability, and infrastructure as code principles.
What You'll Do:
- Ensure the performance, capacity, scalability, reliability, resiliency, security, compliance, support, cost efficiency, SLA, SLOs, RPOs and RTOs for the platform, either directly or in collaboration with other teams.
- Make systemic improvements both proactively and for recurring issues.
- Perform comprehensive Root Cause Analysis for outages.
- Design, implement, and maintain scalable and reliable infrastructure on AWS.
- Develop and manage observability solutions using tools such as Grafana, ELK (Elasticsearch, Logstash, Kibana), and Prometheus to monitor system health and performance.
- Automate infrastructure provisioning and configuration using Terraform and Chef.
- Participate in a 24/7 on-call rotation to respond to and resolve production incidents.
- Collaborate with engineering teams to ensure applications are designed for high availability and resilience.
- Proactively identify and address performance bottlenecks and potential issues.
- Drive continuous improvement through automation, process optimization, and post-incident reviews.
What You'll Bring:
- 3+ years of experience in a Site Reliability Engineer, DevOps Engineer, or similar role.
- Extensive hands-on experience with Amazon Web Services (AWS), including a deep understanding of networking concepts within AWS.
- Expert-level troubleshoo
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.