FieldAI
Robotics
Staff ML Systems Engineer, Distributed Systems
“Staff ML Systems Engineer, Distributed Systems at FieldAI. Skills: Distributed systems, Machine learning pipelines, System architecture, Python. Design and build scalable distributed ML pipelines. Architect distributed execution systems”
What You'll Achieve.
Deliver results today; Get better every time our robots run; Ship systems that perform in the real world; Define dependable, field-ready autonomy
Industry & Context.
Solve the hardest problems in robotics; Diagnose and resolve bottlenecks in distributed environments; Tackle tough, uncharted questions
What They're Looking For.
Must Have
5+ years of experience building distributed systems, backend infrastructure, machine learning platforms, or large-scale data processing systems
Python programming skills, with experience in concurrency, performance optimization, and systems development
Experience with distributed computing frameworks such as Ray, Spark, Dask, Flink, or similar technologies
Experience designing and scaling data pipelines or machine learning workflows
System design skills with demonstrated expertise in scalability, reliability, and performance optimization
Experience diagnosing and resolving bottlenecks in distributed environments
Ability to work cross-functionally and drive technical decisions across multiple teams
Nice to Have
Experience building infrastructure for machine learning training and inference systems
Familiarity with modern ML frameworks such as PyTorch or TensorFlow
Experience with multi-node or multi-GPU training architectures, including DDP, FSDP, DeepSpeed, or similar technologies
Experience operating Kubernetes-based infrastructure and large-scale cloud systems
Deep understanding of distributed systems concepts including data locality, serialization costs, scheduling, and resource management
Experience with distributed debugging, observability, and workflow orchestration platforms
Proven ability to establish technical direction and influence architecture across organizations
What You'll Do.
Design and build scalable distributed ML pipelines
Architect distributed execution systems
Develop reusable abstractions
Optimize performance across distributed CPU and GPU environments
Design systems for data partitioning and memory utilization
Productionize research workflows
Enable large-scale model development
Establish best practices and engineering standards
Evaluate distributed computing frameworks
Improve observability and operational tooling
How You'll Work.
Team & Collaboration
Partner closely with ML engineers, data engineers, and infrastructure teams; Work cross-functionally; Drive technical decisions across multiple teams; Collaborate with a world-class team; Work across disciplines
Full Job Description
## Description

FieldAI’s Irvine team is where embodied AI meets real robots, real sensors, and real field deployments. Based in the heart of Southern California’s robotics ecosystem, we build risk-aware, reliable, field-ready AI systems that solve the hardest problems in robotics and unlock the full potential of embodied intelligence. If you want your work to ship, get tested on hardware, and improve through real deployments, Irvine is the place. We go beyond typical data-driven approaches or pure transformer-only architectures, combining rigorous engineering with learning systems proven in globally deployed solutions that deliver results today and get better every time our robots run in the field.

## What You'll Get To Do

- Design and build scalable distributed machine learning pipelines across data processing, model training, evaluation, and post-processing workflows.
- Architect distributed execution systems, including parallelization strategies, workload scheduling, resource allocation, and fault tolerance mechanisms.
- Develop reusable abstractions, frameworks, and libraries that simplify distributed pipeline development.
- Optimize performance across distributed CPU and GPU environments, improving throughput, utilization, and reliability.
- Design systems that effectively manage data partitioning, memory utilization, serialization overhead, and compute efficiency.
- Partner closely with ML engineers, data engineers, and infrastructure teams to productionize research workflows and enable large-scale model development.
- Establish best practices and engineering standards for distributed machine learning infrastructure.
- Evaluate and guide decisions around distributed computing frameworks, infrastructure technologies, and system design trade-offs.
- Improve observability, debugging, monitoring, and operational tooling for distributed systems at scale.

## What You Have

- 5+ years of experience building distributed systems, backend infrastructure, machine learning platforms, or large
How to Apply on Lever
- Lever uses a streamlined one-page form — apply in under 5 minutes.
- LinkedIn import works well; review parsed data before submitting.
- The cover letter field is optional but visible to reviewers — use it to differentiate.
- Referral codes from employees can significantly boost visibility of your application.