Redpanda Data

Technology

Staff Production Operations Engineer

$211–256k Bulgaria FULL TIME Remote Friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is
optimal for Senior candidates.

The Brief

“Staff Production Operations Engineer at Redpanda Data. Skills: Reliability engineering, Incident management, Automation, AI agents. Drive process improvements across the incident lifecycle. Coordinate the on-call program.”

What You'll Achieve.

Respond faster to outages; Learn more from outages; Systematically improve reliability

Industry & Context.

Technology

Problems you'll solve

Root cause analysis; Troubleshooting

Eligibility Requirements

On-call rotation

What They're Looking For.

Must Have

5+ years SRE/DevOps/production operations, Lead initiatives end-to-end, Incident management tooling experience, Observability stacks experience, Reliability concepts fluency, Automation and tooling for toil reduction, Proficiency in Go, AI-assisted software development workflows, AWS/Azure/GCP knowledge, Infrastructure as code experience, Drive alignment without authority

Nice to Have

Hands-on building agents/automations with LLMs, Familiarity with Redpanda/Kafka, Experience in B2B infrastructure/developer tools

What You'll Do.

Drive process improvements across incident lifecycle

Coordinate on-call program

Select incidents for post-incident review

Facilitate blameless post-incident reviews

Document post-incident findings

Track incident follow-up completion

Address incident follow-ups

Build AI agents to automate toil

Automate incident summarization

Automate post-incident review prep

Automate follow-up tracking

Automate on-call analytics

Maintain incident process documentation

How You'll Work.

Team & Collaboration

Globally distributed engineering team; Broader Engineering team; Engineering leadership; Product; Customer Success

Process & Methodology

Initiative planning, Execution

Full Job Description

Redpanda is pioneering the Agentic Data Plane (ADP), a new category in AI infrastructure that makes it simple and secure to connect AI agents with enterprise data and systems. Built on a multi-modal data streaming engine, Redpanda empowers agentic applications that reason and act in real-time with speed, autonomy, and precision. Global leaders including Activision Blizzard, Cisco, Moody's, Texas Instruments, Vodafone and 2 of the top 5 banks in the U.S. rely on Redpanda to process hundreds of terabytes of data a day. Backed by premier venture investors Lightspeed, GV and Haystack VC, Redpanda is a diverse, people-first organization with teams distributed around the globe.

About the Role:

We're looking for a Staff Production Operations Engineer to drive Redpanda's reliability operations program. This role combines hands-on site reliability engineering with planning and coordination skills to ensure a world-class operations practice across a globally distributed engineering team.

In this role, you'll work with the broader Engineering team, Engineering leadership, Product and Customer Success to drive operational excellence. You'll coordinate our on-call and incident lead rotations, drive blameless post-incident reviews, and own the processes that help us respond faster, learn more from outages, and systematically improve reliability. We're looking for someone who can leverage AI agents to automate the operational toil that slows teams down, building on Redpanda's own ADP platform to do it.

You Will:

  • Drive process improvements across the incident lifecycle: severity models, triage enforcement, alert noise reduction, and follow-up completion rates
  • Coordinate the on-call program across multiple geographies: manage schedules and shadow rotations, onboard new engineers, and ensure consistent coverage
  • Select incidents for post-incident review, facilitate blameless post-incident reviews, document findings, and track follow-up completion. Contribute to addressing incident f

How to Apply on Greenhouse

  • Create a Greenhouse profile before applying — it saves time across multiple applications.
  • Upload your resume as a PDF; the parser handles it better than Word.
  • Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
  • Enable email notifications to track application status in real time.
