XPENG
Technology
Senior AI Data Infrastructure/Pipeline Engineer
Industry & Context.
Performance optimization; Troubleshooting; Locate performance bottlenecks; Resolve performance bottlenecks
What They're Looking For.
Must Have
Bachelor's degree or higher in Computer Science, Software Engineering, Artificial Intelligence, or related fields
3-5+ years of experience in large-scale data processing or data platform development
Proficiency in at least one programming language among Python / Go / Java
Solid software engineering foundation, good coding standards, and a sense of code quality
Hands-on project experience in at least two of the following areas:
  Design and development of large-scale data pipelines / ETL systems
  Production-level experience with distributed message queues (Kafka / Pulsar / RabbitMQ); familiar with stream processing paradigms
  Experience with distributed data lake systems (e.g., Apache Iceberg); familiar with Iceberg's table format, partition evolution, and snapshot isolation; practical performance tuning and deployment experience
  Experience with columnar storage formats (e.g., Lance) and related query engines; practical application in large model training
  Hands-on experience using and optimizing relational databases (MySQL / PostgreSQL) and NoSQL databases (Redis / MongoDB); understand metadata management and caching strategies
Experience in performance optimization and troubleshooting for large-scale distributed systems; able to quickly locate and resolve complex performance bottlenecks
Experience with Kubernetes / Docker containerization deployment
Cross-team communication and collaboration skills
High sense of responsibility and a proactive problem-solving attitude
Nice to Have
Familiarity with closed-loop data in the embodied AI industry
Some understanding of the autonomous driving industry
Awareness of data closed loop and data flywheel concepts
Enthusiasm for this field
Experience with AI infrastructure or model training workflows
Familiarity with data lake / data warehouse systems
Practical experience implementing data version control and data lineage tracing
Open-source contributions on GitHub or a technical blog
Continuous attention to the latest technological trends in big data / AI infrastructure
What You'll Do.
Design and build the core data closed-loop pipelines
Develop toolchains for data cleaning
Develop toolchains for annotation quality inspection
Develop toolchains for data mining
Support algorithm team in locating model error cases
Drive iterative model optimization
Data Support for Production and R&D Processes
Connected vehicle data collection
Internal data collection
External data collection
Data cleaning and standardization
Offline data processing
Real-time data processing
Support autonomous driving operations
Support smart cockpit operations
Support overseas data collection
Support robotics data collection
Optimize performance of the entire data pipeline
Solve bottlenecks in large-scale data transmission
Solve bottlenecks in memory management
Solve bottlenecks in I/O
Build a distributed data processing system
Build a data management platform
Data collection to data lake ingestion
Data lake ingestion to model training
Implement data version control capabilities
Implement data lineage tracing capabilities
Implement metadata management capabilities
Implement fast data retrieval capabilities
Support unified data access
Support collaboration across multiple teams
Collaborate with the large model team
Collaborate with other technical teams
Understand business requirements
Respond quickly to requirements
Ensure successful implementation
How You'll Work.
Team & Collaboration
Cross-team communication; Collaboration across multiple teams
Full Job Description
XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.
As a core member of our AI Infrastructure team, you will be responsible for building the end-to-end data pipeline for autonomous driving, covering the entire chain from onboard data upload → cloud-based preprocessing → dataset production → model training / simulation input. In autonomous driving systems, the stability and efficiency of the pipeline directly determine the speed of algorithm iteration. We look forward to building a reliable, observable, and cost-effective data pipeline that supports the daily flow of petabyte-scale sensor data.
Key Responsibilities
Responsible for the design and construction of core data closed loop pipelines. Develop toolchains for data cleaning, annotation quality inspection, and data mining to support the algorithm team in quickly locating model error cases and driving iterative model optimization.
Data Support for Production and R&D Processes. This includes log event tracking, connected vehicle data, internal and external data collection, data synchronization, data cleaning and standardization, data modeling, offline and real-time data processing, data as a service, and data visualization. Support business operations such as autonomous driving, smart cockpits, overseas data collection, and robotics data collection.
Responsible for optimizing the performance of the entire data pipeline (collection, cleaning, conversion). Solve bottlenecks in large-scale data transmission, memory management, I/O, etc., and build a distributed data processing system with high throughput and low latency.
Responsible fo
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.