XPENG

smart technology

AI Agent Data Pipeline Intern

Mountain View, California, United States (Remote Friendly)
The Brief

“AI Agent Data Pipeline Intern at XPENG. Skills: data pipelines, LLM-assisted data cleaning workflows, data schemas, metadata, quality checks, retrieval and indexing workflows, semantic search, RAG-style pipelines. Build pipelines to ingest and organize experiment-related data from team communications, meeting notes, experiment plans, analysis documents, metrics, and evaluation results. Use LLM-based methods to clean noisy unstructured data, extract experiment-relevant information, and convert f”

What You'll Achieve.

Improve the efficiency, quality, and reliability of the experiment lifecycle.

Help the agent correctly retrieve, interpret, and reason over experiment-related information.

Improve agent performance on domain-specific tasks.

Make experiment context easier to search, trace, and use in downstream agent workflows, so the agent can access relevant experiment context.

Check that the agent uses curated experiment data correctly to generate summaries, comparisons, recommendations, and analysis insights.

Help teams monitor experiment status, outcomes, and trends.

Industry & Context.

smart technology
Problems you'll solve

analytical thinking

What They're Looking For.

Must Have

Skills in Python, SQL, and data processing; experience working with structured and unstructured data; experience working with text-heavy sources such as documents, notes, messages, or logs; familiarity with data pipelines, ETL workflows, and large-scale data processing

Nice to Have

Interest in LLM development, LLM evaluation, agentic AI systems, RAG pipelines, semantic retrieval, prompt engineering, and LLM-assisted data processing; familiarity with machine learning workflows, model training, evaluation metrics, and MLOps concepts; previous experience building internal tools, automation scripts, or data quality checks

What You'll Do.

Build pipelines to ingest and organize experiment-related data from team communications, meeting notes, experiment plans, analysis documents, metrics, and evaluation results.

Use LLM-based methods to clean noisy unstructured data, extract experiment-relevant information, and convert fragmented discussions into structured records.

Work on data schemas, metadata, and quality checks that make experiment context easier to search, trace, and use in downstream agent workflows.

Support retrieval and indexing workflows, including semantic search or RAG-style pipelines, so the agent can access relevant experiment context.

Prepare curated datasets for agent evaluation and LLM fine-tuning or instruction-tuning.

Work with MLEs and platform engineers to understand experiment workflows, data gaps, and the types of insights most useful for planning and analysis.

Evaluate whether the agent uses curated experiment data correctly to generate summaries, comparisons, recommendations, and analysis insights.

Contribute to internal tools or reports that help teams monitor experiment status, outcomes, and trends.

How You'll Work.

Team & Collaboration

Work with MLEs and platform engineers to understand experiment workflows, data gaps, and the types of insights most useful for planning and analysis, and to clarify requirements.
