MatX

Runtime Engineer

Mountain View, California, United States · Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Mid candidates.

The Brief

“Runtime Engineer at MatX. Skills: systems programming, Python interop, API/ABI contracts, accelerator programming. Build the host-side interface library. Own and extend the executable format.”

What You'll Achieve.

Hit measurable performance targets on runtime overhead and serving throughput

Industry & Context.

Eligibility Requirements

U.S. export controls

What They're Looking For.

Must Have

systems programming language, memory management, allocator design, FFI/ABI work, Python interop layers, API or ABI contracts, accelerator programming model, ML-systems literate

Nice to Have

LLM inference internals, Rust at depth, Custom allocator design, ML framework integration, Profiler or tracing infrastructure, Driver-adjacent or kernel-bypass work, new-silicon bring-up

What You'll Do.

Build host-side interface library

Own and extend executable format

Design custom-kernel ABI

Build Python bindings

Build LLM inference serving stack

Bring up interconnect topology

Design chip profilers

Hit measurable performance targets

How You'll Work.

Team & Collaboration

The runtime owns the host-side stack and the contracts that bind teams together

Full Job Description

What MatX is Building

MatX is building custom silicon for large-language-model inference and training, with HW/SW co-design across ISA, RTL, simulator, compiler, and kernels so each layer benefits from the others. The runtime owns the host-side stack and the contracts that bind those teams together.

What You'll Do Here

Build the host-side interface library (device memory management, DMA, streams and events, sync primitives) that every compiler-emitted program runs on top of

Own and extend the executable format: the compiler→runtime contract, its versioning, the weight and quantization layouts that let compiler and runtime evolve independently

Design the custom-kernel ABI (calling convention, sync semantics, lifecycle) and the host-side marshaling layer (DLPack, the buffer protocol, numpy) that gets Python tensors to the device

Build Python bindings via PyO3, with a C-ABI shim as the alternative integration path for downstream consumers

Build the LLM inference serving stack (paged KV cache, continuous batching, request scheduling, token streaming) and the cluster orchestration primitives underneath it

Bring up interconnect topology from the host and own the failure-detection and clean-teardown path for stop-restructure-resume recovery across racks

Design what the chip exposes to host-side profilers and debuggers (perf counters, traces, and the Python surfaces ML engineers actually use) and hit measurable performance targets on runtime overhead and serving throughput

Who You Are

Strong experience in a systems programming language (Rust, C, C++, or Go), including memory management, allocator design, and FFI/ABI work

Have built Python interop layers in production (PyO3, ctypes, pybind11, or equivalent C-ABI bridging)

Have designed and maintained API or ABI contracts between teams (versioning, evolution, breaking-change discipline), not just consumed someone else's

Hands-on with at least one accelerator programming model (CUDA, ROCm, oneAPI Level Zero, TPU, or

SKILL SIGNAL 54 detected · ranked by frequency

device memory management ×3

DMA ×3

streams and events ×3

sync primitives ×3

executable format ×3

weight and quantization layouts ×3

custom-kernel ABI ×3

calling convention ×3

sync semantics ×3

lifecycle ×3

host-side marshaling layer ×3

Python tensors ×3

Python bindings ×3

C-ABI shim ×3

LLM inference serving stack ×3

paged KV cache ×3

continuous batching ×3

request scheduling ×3

token streaming ×3

cluster orchestration primitives ×3

interconnect topology ×3

failure-detection ×3

clean-teardown path ×3

stop-restructure-resume recovery ×3

chip profilers ×3

host-side debuggers ×3

perf counters ×3

traces ×3

Python surfaces ×3

runtime overhead ×3

serving throughput ×3

systems programming ×2

Role Details

Experience 2–5 yrs

Level Mid

Work Mode Remote

Category Software

AI-Extracted Insights

Domain Areas

large-language-model-inference, large-language-model-training, hw-sw-co-design, ml-systems, training-loop, inference-loop, collectives, tensor-layout

How to Apply on Greenhouse

Create a Greenhouse profile before applying — it saves time across multiple applications.
Upload your resume as a PDF; the parser handles it better than Word.
Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
Enable email notifications to track application status in real time.
