Company

AI

Sr. AI Inference Systems Engineer

$120–120k Palo Alto, California, United States FULL TIME

The Brief

“Sr. AI Inference Systems Engineer. Skills: AI Inference Optimization, Heterogeneous Computing, Large Model Inference, KV Cache, Router Architecture, Hardware Accelerator Tuning, Distributed Systems, CUDA, Triton. Lead the optimization of the full inference pipeline for Large Models (LLM, Multimodal). Conduct in-depth research into the underlying inference logic of various hardware accelerators”

What You'll Achieve.

Maximize throughput; Minimize latency

Industry & Context.

AI

Problems you'll solve

Resolve long-tail issues such as communication latency and load imbalance in distributed inference; Overcome key technical bottlenecks in inference design

What They're Looking For.

Must Have

  • Master’s or Ph.D. in Computer Science, Electronic Engineering, AI, or a related field
  • Significant professional experience in AI inference optimization or heterogeneous computing
  • Proficient in at least one AI accelerator architecture
  • Mastery of core inference optimization techniques, including multi-level KV Cache management, Quantization, and Intelligent Routing
  • Expert in parallel computing and distributed systems
  • Deep understanding of low-level programming models (e.g., CUDA, Triton) and inference engine architectures
  • Familiar with mainstream deep learning frameworks (e.g., PyTorch, TensorFlow)

Nice to Have

  • Experience in optimizing ultra-large-scale models
  • Experience in tuning ultra-large-scale inference clusters
  • Driving high-level AI inference publications or core patents in relevant fields

What You'll Do.

Lead the optimization of the full inference pipeline for Large Models (LLM, Multimodal)

Conduct in-depth research into the underlying inference logic of various hardware accelerators

Design and implement high-performance inference, optimizing scheduling and memory management

Track global advancements in inference technology

Drive the productization of emerging technologies within production environments

Lead efforts to overcome key technical bottlenecks in inference design

Develop standardized optimization schemes

How You'll Work.

Team & Collaboration

Cross-team collaboration skills; Collaborative operator optimization

Process & Methodology

Proven track record of leading complex inference projects to fruition

How to Apply on Workday

  • Workday has a multi-step form — save your progress after every section.
  • "Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
  • Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
  • Job requisition numbers are useful when following up with HR by email.
