XPENG

Technology

Senior AI Data Infrastructure/Pipeline Engineer

$175–296k · Mountain View, California, United States · FULL TIME
Market Sentiment
HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Senior AI Data Infrastructure/Pipeline Engineer at XPENG. Skills: Data infrastructure, Data pipelines, AI infrastructure, Distributed systems. Design and construction of core data closed loop. Develop toolchains for data cleaning”

Industry & Context.

Technology
Problems you'll solve

Performance optimization; Troubleshooting; Locate performance bottlenecks; Resolve performance bottlenecks

What They're Looking For.

Must Have

  • Bachelor's degree or higher in Computer Science, Software Engineering, Artificial Intelligence, or a related field
  • 3-5+ years of experience in large-scale data processing or data platform development
  • Proficiency in at least one of Python, Go, or Java, with a solid software engineering foundation, good coding standards, and a sense of code quality
  • Hands-on project experience in at least two of the following areas:
  • Design and development of large-scale data pipelines / ETL systems
  • Production-level experience with distributed message queues (Kafka / Pulsar / RabbitMQ); familiar with stream processing paradigms
  • Experience with distributed data lake systems (e.g., Apache Iceberg); familiar with Iceberg's table format, partition evolution, and snapshot isolation; practical performance tuning and deployment experience
  • Experience with columnar storage formats (e.g., Lance), related query engines, and their practical application in large model training
  • Hands-on experience using and optimizing relational databases (MySQL / PostgreSQL) and NoSQL databases (Redis / MongoDB); understanding of metadata management and caching strategies
  • Experience in performance optimization and troubleshooting for large-scale distributed systems; able to quickly locate and resolve complex performance bottlenecks
  • Experience with Kubernetes / Docker containerized deployment
  • Cross-team communication and collaboration skills, a high sense of responsibility, and a proactive problem-solving attitude

Nice to Have

  • Familiarity with closed-loop data in the embodied AI industry
  • Some understanding of the autonomous driving industry
  • Awareness of data closed loop and data flywheel concepts
  • Enthusiasm for this field
  • Experience with AI infrastructure or model training workflows
  • Familiarity with data lake / data warehouse systems
  • Practical experience implementing data version control and data lineage tracing
  • Open-source contributions on GitHub or a technical blog
  • Continuous attention to the latest technological trends in big data / AI infrastructure

What You'll Do.

Design and construction of core data closed loop

Develop toolchains for data cleaning

Develop toolchains for annotation quality inspection

Develop toolchains for data mining

Support algorithm team in locating model error cases

Drive iterative model optimization

Data Support for Production and R&D Processes

Connected vehicle data collection

Internal data collection

External data collection

Data cleaning and standardization

Offline data processing

Real-time data processing

Support autonomous driving operations

Support smart cockpit operations

Support overseas data collection

Support robotics data collection

Optimize performance of the entire data pipeline

Solve bottlenecks in large-scale data transmission

Solve bottlenecks in memory management

Solve bottlenecks in I/O

Build a distributed data processing system

Build a data management platform

Data collection to data lake ingestion

Data lake ingestion to model training

Implement data version control capabilities

Implement data lineage tracing capabilities

Implement metadata management capabilities

Implement fast data retrieval capabilities

Support unified data access

Support collaboration across multiple teams

Collaborate with the large model team

Collaborate with other technical teams

Understand business requirements

Respond quickly to requirements

Ensure successful implementation

How You'll Work.

Team & Collaboration

Cross-team communication; Collaboration across multiple teams

Full Job Description

XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.

As a core member of our AI Infrastructure team, you will be responsible for building the end-to-end data pipeline for autonomous driving, covering the entire chain from onboard data upload → cloud-based preprocessing → dataset production → model training / simulation input. In autonomous driving systems, the stability and efficiency of the pipeline directly determine the speed of algorithm iteration. We look forward to building a reliable, observable, and cost-effective data pipeline that supports the daily flow of petabyte-scale sensor data.

Key Responsibilities

Responsible for the design and construction of core data closed loop pipelines. Develop toolchains for data cleaning, annotation quality inspection, and data mining to support the algorithm team in quickly locating model error cases and driving iterative model optimization.

Data Support for Production and R&D Processes. This includes log event tracking, connected vehicle data, internal and external data collection, data synchronization, data cleaning and standardization, data modeling, offline and real-time data processing, data as a service, and data visualization. Support business operations such as autonomous driving, smart cockpits, overseas data collection, and robotics data collection.

Responsible for optimizing the performance of the entire data pipeline (collection, cleaning, conversion). Solve bottlenecks in large-scale data transmission, memory management, I/O, etc., and build a distributed data processing system with high throughput and low latency.

Responsible fo


