HappyRobot
AI
Infrastructure Engineer
Neural analysis suggests this role is optimal for mid-level candidates.
“Infrastructure Engineer at HappyRobot. Skills: Scaling operational resilience, Observability, Debugging production systems, Go, Kubernetes. Owning stability, observability, and debugging workflows.”
What You'll Achieve.
Scaling our operational resilience; Reducing incident load; Improving developer focus and system uptime
Industry & Context.
Problem-solving skills; Ability to dive into unfamiliar backend codebases; Getting to the root of hard problems
On-call (implied by 'reliability workflows' and 'live incidents')
What They're Looking For.
Must Have
3+ years of hands-on experience debugging production systems (logs, traces, incidents, etc.); Problem-solving skills and ability to dive into unfamiliar backend codebases; Go; Kubernetes
Nice to Have
Experience working with distributed systems or services at scale; Built or maintained internal tooling for on-call teams or reliability workflows; Familiarity with deployment pipelines, CI/CD, or infra-as-code; Experience improving system observability (e.g., custom metrics, traces, log pipelines)
What You'll Do.
Scaling operational resilience
Owning stability, observability, and debugging workflows
Keeping systems running smoothly
Untangling complex failures in real time
Designing tools that turn chaos into clarity
Shifting from reactive to proactive operations
Reducing incident load
Building internal tooling
Improving developer focus and system uptime
Getting to the root of hard problems
Making systems (and teams) stronger
How You'll Work.
Team & Collaboration
Making systems (and teams) stronger; Be friendly & have fun with your coworkers; Give feedback with kindness; Challenge each other with respect; Celebrate wins together without ego; Stay aligned
Communication Scope
Clear, calm communication under pressure — especially during live incidents
Process & Methodology
Ownership & Autonomy, Take full ownership of projects and ship fast
Full Job Description
About HappyRobot
HappyRobot is the infrastructure for enterprises to build and orchestrate AI workforces. Our AI workers don't just communicate - they make decisions, take action, and run operations autonomously across voice, email, and enterprise systems. Born in Y Combinator (S23) and backed by a16z and Base10 with over $60M raised, we power critical operations for global enterprises worldwide.
Our platform is battle-tested in the most demanding environments - where AI has real consequences. We started in logistics, built our own voice stack, models, and orchestration layer from the ground up, and are now bringing that infrastructure to every enterprise that runs the real economy.
Learn more about our vision in our manifesto: https://www.happyrobot.ai/blog/manifesto
About the Role
We're looking for an Infrastructure Engineer to take the lead on scaling our operational resilience as we grow. You’ll own the stability, observability, and debugging workflows that keep our systems running smoothly. You'll be the go-to person for untangling complex failures in real time, designing tools that turn chaos into clarity, and helping us shift from reactive to proactive operations.
This is a high-impact, high-trust role where you’ll shape how reliability is done - reducing incident load, building internal tooling, and directly improving developer focus and system uptime. If you love getting to the root of hard problems and making systems (and teams) stronger, this is your moment.
Must-Have
- 3+ years of hands-on experience debugging production systems (logs, traces, incidents, etc.)
- Strong problem-solving skills and ability to dive into unfamiliar backend codebases
- Strong Go and Kubernetes experience
- Familiarity with observability and monitoring tools (e.g., Datadog, Prometheus, Sentry)
- Clear, calm communication under pressure - especially during live incidents
Nice-to-Have
- Experience working with distributed systems or services at scale
- Built or maintained i
How to Apply on Ashby
- Ashby is a fast modern ATS — most applications take under 3 minutes.
- The resume parser is strong; verify parsed experience dates and job titles.
- Custom screening questions are often scored algorithmically — answer completely.
- Location field affects geo-based screening; use your actual metro area.