FuriosaAI

Software

Software Engineer, Site Reliability Engineer

Seoul, South Korea FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Mid+ candidates.

The Brief

“Software Engineer, Site Reliability Engineer at FuriosaAI. Skills: Site Reliability Engineering, Kubernetes, Observability, Automation. Apply software engineering to improve reliability. Improve production systems so failures are isolated, degraded gracefully, detected quickly, and recovered safely.”

What You'll Achieve.

Improve the reliability, scalability, security, and operability of production infrastructure and customer-facing services; ensure failures are isolated, degraded gracefully, detected quickly, and recovered safely; understand user-facing reliability; reduce operational toil

Industry & Context.

Software
Problems you'll solve

Reason about production systems end-to-end; identify reliability risks; build the observability foundation; drive improvements through code, configuration, automation, and architectural changes; diagnose problems

What They're Looking For.

Must Have

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • Rust, Python, Go
  • Operating systems
  • Computer networks
  • Cloud-native or container-based environments
  • Analyze technical problems
  • Communicate clearly with engineering teams

Nice to Have

  • Experience improving reliability of production systems using SLOs, observability, incident analysis, rollout safety, and error-budget-driven decision making
  • Experience designing or operating distributed systems where failures, overload, latency, and capacity limits must be explicitly managed
  • Experience building automation, internal tooling, or self-service workflows that reduce operational toil and improve engineering productivity
  • Experience working across software, infrastructure, networking, and security boundaries to diagnose problems and drive architectural improvements

What You'll Do.

Apply software engineering to improve reliability

Improve production systems so failures are isolated

Define and evolve reliability goals

Design and build observability foundations

Analyze production systems end-to-end

Improve change safety and failure recovery

Reduce operational toil by building automation

How You'll Work.

Team & Collaboration

Work across baremetal Kubernetes clusters, cloud control planes, networking, observability systems, deployment pipelines, and API services; communicate clearly with engineering teams; work across software, infrastructure, networking, and security boundaries

Communication Scope

communicate clearly with engineering teams

Full Job Description

ABOUT THE ROLE

As a Site Reliability Engineer, you will apply software engineering to improve the reliability, scalability, security, and operability of FuriosaAI’s production infrastructure and customer-facing services. You will work across baremetal Kubernetes clusters, cloud control planes, networking, observability systems, deployment pipelines, and API services running on Furiosa NPUs.

We are looking for an engineer who can reason about production systems end-to-end, identify reliability risks across service and infrastructure boundaries, build the observability foundation required to understand them, and drive improvements through code, configuration, automation, and architectural changes.

In this role, your mission is defined by three primary pillars:

- Reliability Architecture: Improve production systems so failures are isolated, degraded gracefully, detected quickly, and recovered safely.
- Observability & SLOs: Build the metrics, logs, traces, dashboards, alerts, and service-level indicators required to understand user-facing reliability.
- Production Engineering: Reduce operational toil through automation, self-service workflows, safer rollouts, and hands-on engineering contributions.

RESPONSIBILITIES

- Define and evolve reliability goals for production systems through SLIs, SLOs, error budgets, and meaningful operational metrics.
- Design and build observability foundations that make system behavior, user impact, performance bottlenecks, and failure modes measurable and actionable.
- Analyze production systems end-to-end, identify reliability risks across software, infrastructure, and networking boundaries, and drive architectural improvements.
- Improve change safety and failure recovery through better rollout strategies, capacity planning, load validation, graceful degradation, and incident learning loops.
- Reduce operational toil by building automation, internal tooling, and self-service workflows that make production systems easier to operate and ha

How to Apply on Ashby

  • Ashby is a fast modern ATS — most applications take under 3 minutes.
  • The resume parser is strong; verify parsed experience dates and job titles.
  • Custom screening questions are often scored algorithmically — answer completely.
  • Location field affects geo-based screening; use your actual metro area.
