MatX

AI

Runtime Engineer

Mountain View, California, United States · Remote Friendly
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Mid candidates.

The Brief

“Runtime Engineer at MatX. Skills: systems programming, Python interop, API/ABI contracts, accelerator programming. Build the host-side interface library. Own and extend the executable format.”

What You'll Achieve.

Hit measurable performance targets on runtime overhead and serving throughput

Industry & Context.

AI

Eligibility Requirements

U.S. export controls

What They're Looking For.

Must Have

systems programming language, memory management, allocator design, FFI/ABI work, Python interop layers, API or ABI contracts, accelerator programming model, ML-systems literate

Nice to Have

LLM inference internals, Rust at depth, Custom allocator design, ML framework integration, Profiler or tracing infrastructure, Driver-adjacent or kernel-bypass work, new-silicon bring-up

What You'll Do.

Build host-side interface library

Own and extend executable format

Design custom-kernel ABI

Build Python bindings

Build LLM inference serving stack

Bring up interconnect topology

Design chip profilers

Hit measurable performance targets

How You'll Work.

Team & Collaboration

The runtime owns the host-side stack and the contracts that bind teams together

Full Job Description

What MatX is Building

MatX is building custom silicon for large-language-model inference and training, with HW/SW co-design across ISA, RTL, simulator, compiler, and kernels so each layer benefits from the others. The runtime owns the host-side stack and the contracts that bind those teams together.

What You'll Do Here

  • Build the host-side interface library (device memory management, DMA, streams and events, sync primitives) that every compiler-emitted program runs on top of
  • Own and extend the executable format: the compiler→runtime contract, its versioning, the weight and quantization layouts that let compiler and runtime evolve independently
  • Design the custom-kernel ABI (calling convention, sync semantics, lifecycle) and the host-side marshaling layer (DLPack, the buffer protocol, numpy) that gets Python tensors to the device
  • Build Python bindings via PyO3, with a C-ABI shim as the alternative integration path for downstream consumers
  • Build the LLM inference serving stack (paged KV cache, continuous batching, request scheduling, token streaming) and the cluster orchestration primitives underneath it
  • Bring up interconnect topology from the host and own the failure-detection and clean-teardown path for stop-restructure-resume recovery across racks
  • Design what the chip exposes to host-side profilers and debuggers (perf counters, traces, and the Python surfaces ML engineers actually use) and hit measurable performance targets on runtime overhead and serving throughput

Who You Are

  • Strong experience in a systems programming language (Rust, C, C++, or Go), including memory management, allocator design, and FFI/ABI work
  • Have built Python interop layers in production (PyO3, ctypes, pybind11, or equivalent C-ABI bridging)
  • Have designed and maintained API or ABI contracts between teams (versioning, evolution, breaking-change discipline), not just consumed someone else's
  • Hands-on with at least one accelerator programming model (CUDA, ROCm, oneAPI Level Zero, TPU, or


