Amazon.com Services LLC
Research Science, Applied Science, subsidiaries
Applied Scientist
Neural analysis suggests this role is optimal for Mid+ candidates.
“Applied Scientist at Amazon.com Services LLC. Skills: robotics, machine learning, evaluation systems. Design and implement evaluation frameworks.”
Industry & Context.
Identify performance gaps; Identify failure modes
What They're Looking For.
Must Have
3+ years of experience building models
PhD or Master's degree
4+ years of experience in CS, CE, ML
Experience programming in Java, C++, Python
Experience with algorithms and data structures
Experience with parsing
Experience with numerical optimization
Experience with data mining
Experience with parallel and distributed computing
Experience with high-performance computing
Nice to Have
Experience using Unix/Linux
What You'll Do.
Design evaluation frameworks
Implement evaluation frameworks
Develop task definitions
Develop success criteria
Develop benchmarking methodologies
Create data collection protocols
Refine data collection protocols
Build teleoperation workflows
Build operator interfaces
Analyze evaluation results
Analyze collected data
Identify performance gaps
Identify failure modes
Identify opportunities for data collection
Collaborate with engineering teams
Integrate evaluation tooling
Integrate logging systems
Integrate data pipelines
Stay current with advances
Lead technical projects
Mentor junior scientists
Mentor junior engineers
How You'll Work.
Team & Collaboration
Engineering teams; Robotics stack
Process & Methodology
Technical projects
Full Job Description
We are seeking an Applied Scientist to lead the development of evaluation frameworks and data collection protocols for robotic capabilities. In this role, you will focus on designing how we measure, stress-test, and improve robot behavior across a wide range of real-world tasks. Your work will play a critical role in shaping how policies are validated and how high-quality datasets are generated to accelerate system performance.

You will operate at the intersection of robotics, machine learning, and human-in-the-loop systems, building the infrastructure and methodologies that connect teleoperation, evaluation, and learning. This includes developing evaluation policies, defining task structures, and contributing to operator-facing interfaces that enable scalable and reliable data collection.

The ideal candidate is highly experimental, systems-oriented, and comfortable working across software, robotics, and data pipelines, with a strong focus on turning ambiguous capability goals into measurable and actionable evaluation systems.

Key job responsibilities
- Design and implement evaluation frameworks to measure robot capabilities across structured tasks, edge cases, and real-world scenarios
- Develop task definitions, success criteria, and benchmarking methodologies that enable consistent and reproducible evaluation of policies
- Create and refine data collection protocols that generate high-quality, task-relevant datasets aligned with model development needs
- Build and iterate on teleoperation workflows and operator interfaces to support efficient, reliable, and scalable data collection
- Analyze evaluation results and collected data to identify performance gaps, failure modes, and opportunities for targeted data collection
- Collaborate with engineering teams to integrate evaluation tooling, logging systems, and data pipelines into the broader robotics stack
- Stay current with advances in robotics, evaluation methodologies, and human-in-the-loop learning to continuou