XPENG
smart technology
AI Agent Data Pipeline Intern
What You'll Achieve.
- Improve the efficiency, quality, and reliability of the experiment lifecycle.
- Enable the agent to correctly retrieve, interpret, and reason over experiment-related information, improving its performance on domain-specific tasks.
- Make experiment context easier to search, trace, and use in downstream agent workflows, so the agent can access relevant experiment context.
- Ensure the agent uses curated experiment data correctly to generate summaries, comparisons, recommendations, and analysis insights.
- Help teams monitor experiment status, outcomes, and trends.
Industry & Context.
analytical thinking
What They're Looking For.
Must Have
- Skills in Python, SQL, and data processing
- Experience working with structured and unstructured data, and with text-heavy sources such as documents, notes, messages, or logs
- Familiarity with data pipelines, ETL workflows, and large-scale data processing
Nice to Have
- Interest in LLM development and evaluation, agentic AI systems, RAG pipelines, semantic retrieval, prompt engineering, and LLM-assisted data processing
- Familiarity with machine learning workflows, model training, evaluation metrics, and MLOps concepts
- Previous experience building internal tools, automation scripts, and data quality checks
What You'll Do.
Build pipelines to ingest and organize experiment-related data from team communications, meeting notes, experiment plans, analysis documents, metrics, and evaluation results.
Use LLM-based methods to clean noisy unstructured data, extract experiment-relevant information, and convert fragmented discussions into structured records.
Design data schemas, metadata, and quality checks that make experiment context easier to search, trace, and use in downstream agent workflows.
Support retrieval and indexing workflows, including semantic search or RAG-style pipelines, so the agent can access relevant experiment context.
Prepare curated datasets for agent evaluation and LLM fine-tuning or instruction-tuning.
Work with MLEs and platform engineers to understand experiment workflows, data gaps, and the types of insights most useful for planning and analysis.
Evaluate whether the agent uses curated experiment data correctly to generate summaries, comparisons, recommendations, and analysis insights.
Contribute to internal tools or reports that help teams monitor experiment status, outcomes, and trends.
How You'll Work.
Team & Collaboration
Work with MLEs and platform engineers to understand experiment workflows, data gaps, and the types of insights most useful for planning and analysis, and to clarify requirements.
How to Apply on Greenhouse
- Create a Greenhouse profile before applying — it saves time across multiple applications.
- Upload your resume as a PDF; the parser handles it better than Word.
- Answer all knockout questions carefully — wrong answers auto-reject before a human sees you.
- Enable email notifications to track application status in real time.