Company

Sr. AI Inference Systems Engineer

$120k · Palo Alto, California, United States · FULL TIME

The Brief

“Sr. AI Inference Systems Engineer. Skills: AI Inference Optimization, Heterogeneous Computing, Large Model Inference, KV Cache, Router Architecture, Hardware Accelerator Tuning, Distributed Systems, CUDA, Triton. Lead the optimization of the full inference pipeline for Large Models (LLM, Multimodal). Conduct in-depth research into the underlying inference logic of various hardware accelerators.”

What You'll Achieve.

Maximize throughput and minimize latency.

Industry & Context.

Problems you'll solve

Resolve long-tail issues such as communication latency and load imbalance in distributed inference, and overcome key technical bottlenecks in inference design.

What They're Looking For.

Must Have

Master’s or Ph.D. in Computer Science, Electronic Engineering, AI, or a related field, or equivalent significant professional experience in AI inference optimization or heterogeneous computing

Proficient in at least one AI accelerator architecture

Mastery of core inference optimization techniques, including multi-level KV Cache management, quantization, and intelligent routing

Expert in parallel computing and distributed systems

Deep understanding of low-level programming models (e.g., CUDA, Triton) and inference engine architectures

Familiar with mainstream deep learning frameworks (e.g., PyTorch, TensorFlow)

Nice to Have

Experience in optimizing ultra-large-scale models

Experience in tuning ultra-large-scale inference clusters

High-level publications or core patents in AI inference or related fields

What You'll Do.

Lead the optimization of the full inference pipeline for Large Models (LLM, Multimodal)

Conduct in-depth research into the underlying inference logic of various hardware accelerators

Design and implement high-performance inference optimizations, including scheduling and memory management

Track global advancements in inference technology

Drive the productization of emerging technologies within production environments

Lead efforts to overcome key technical bottlenecks in inference design

Develop standardized optimization schemes

How You'll Work.

Team & Collaboration

Cross-team collaboration skills, including collaborative operator optimization

Process & Methodology

Proven track record of leading complex inference projects to fruition

Skill Signal 42 detected

Core

Distributed Systems ×6

CUDA ×4

Triton ×4

Parallel computing ×4

Required

AI Inference Optimization ×3

Heterogeneous Computing ×3

End-to-End Inference Optimization ×3

Heterogeneous Computing Research ×3

Inference Framework & Toolchain Design ×3

Technological Innovation ×3

Technical Leadership ×3

AI accelerator architecture tuning ×3

Inference optimization techniques ×3

Low-level programming models ×3

Nice to have

KV Cache storage strategies

Router architecture design

Operator optimization

Hardware accelerators

Inference scheduling

Memory management

Compiler optimization

Model compression

Hardware fusion

AI accelerator architecture

Behavioural

Cross-team collaboration

Role Details

Seniority

mid

Type

FULL TIME

Experience

5–10 yrs

Education

Master's or Ph.D.

Salary Band

100k-150k

AI-Extracted Insights

Domain Areas

ai-inference-optimization

heterogeneous-computing

large-models-llm

multimodal

ai-accelerator-architecture

inference-optimization-techniques

parallel-computing

distributed-systems

How to Apply on Workday

Workday has a multi-step form — save your progress after every section.
"Apply With LinkedIn" can fail or lose data; manual entry is more reliable.
Watch for the "Submit for Review" final step — hitting "Save" alone does not submit.
Job requisition numbers are useful when following up with HR by email.