Redpanda Data
Technology
Staff Production Operations Engineer
Neural analysis suggests this role is
optimal for Senior candidates.
“Staff Production Operations Engineer at Redpanda Data. Skills: Reliability engineering, Incident management, Automation, AI agents. Drive process improvements across incident lifecycle. Coordinate on-call program”
What You'll Achieve.
Respond faster to outages; Learn more from outages; Systematically improve reliability
Industry & Context.
Root cause analysis; Troubleshooting
On-call rotation
What They're Looking For.
Must Have
5+ years SRE/DevOps/production operations, Lead initiatives end-to-end, Incident management tooling experience, Observability stacks experience, Reliability concepts fluency, Automation and tooling for toil reduction, Proficiency in Go, AI-assisted software development workflows, AWS/Azure/GCP knowledge, Infrastructure as code experience, Drive alignment without authority
Nice to Have
Hands-on building agents/automations with LLMs, Familiarity with Redpanda/Kafka, Experience in B2B infrastructure/developer tools
What You'll Do.
Drive process improvements across incident lifecycle
Coordinate on-call program
Select incidents for post-incident review
Facilitate blameless post-incident reviews
Document post-incident findings
Track incident follow-up completion
Address incident follow-ups
Build AI agents to automate toil
Automate incident summarization
Automate post-incident review prep
Automate follow-up tracking
Automate on-call analytics
Maintain incident process documentation
How You'll Work.
Team & Collaboration
Globally distributed engineering team; Broader Engineering team; Engineering leadership; Product; Customer Success
Process & Methodology
Initiative planning, Execution
Full Job Description
Redpanda is pioneering the Agentic Data Plane (ADP) - a new category in AI infrastructure that makes it simple and secure to connect AI agents with enterprise data and systems. Built on a multi-modal data streaming engine, Redpanda empowers agentic applications that reason and act in real-time with speed, autonomy, and precision. Global leaders including Activision Blizzard, Cisco, Moody's, Texas Instruments, Vodafone and 2 of the top 5 banks in the U.S. rely on Redpanda to process hundreds of terabytes of data a day. Backed by premier venture investors Lightspeed, GV and Haystack VC, Redpanda is a diverse, people-first organization with teams distributed around the globe.
About the Role:
We're looking for a Staff Production Operations Engineer to drive Redpanda's reliability operations program. This role combines hands-on site reliability engineering with planning and coordination skills to ensure a world-class operations practice across a globally distributed engineering team. In this role, you'll work with the broader Engineering team, Engineering leadership, Product and Customer Success to drive operational excellence. You'll coordinate our on-call and incident lead rotations, drive blameless post-incident reviews, and own the processes that help us respond faster, learn more from outages, and systematically improve reliability. We're looking for someone who can leverage AI agents to automate the operational toil that slows teams down, building on Redpanda's own ADP platform to do it.
You Will:
- Drive process improvements across the incident lifecycle: severity models, triage enforcement, alert noise reduction, and follow-up completion rates
- Coordinate the on-call program across multiple geographies: manage schedules and shadow rotations, onboard new engineers, and ensure consistent coverage
- Select incidents for post-incident review, facilitate blameless post-incident reviews, document findings, and track follow-up completion. Contribute to addressing incident f
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.