Company

Technology

Lead Data Engineer with AI experience

₹25–45L (~AI est.) · India · FULL TIME · Remote Friendly

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is
optimal for Lead candidates.

The Brief

“Lead Data Engineer with AI experience. Skills: Data Engineering, AI Infrastructure, LLM Systems, Agentic Systems. Build and optimize batch data pipelines.”

Industry & Context.

Technology

Problems you'll solve

Problem-solving mindset; Design scalable systems

What They're Looking For.

Must Have

7+ years of experience in data engineering

2+ years of experience building production AI/ML or LLM-related data infrastructure

Expertise in Python, SQL, PySpark, Snowflake, Delta Lake, Kafka, and Spark Structured Streaming

Hands-on experience with vector databases, embedding pipelines, and retrieval systems in production RAG environments

Solid understanding of MLOps practices

Knowledge of data governance, security, compliance, and data quality frameworks

Experience working with cloud ecosystems such as AWS or Azure

Experience with containerized environments (Docker, Kubernetes)

Nice to Have

Familiarity with AI/LLM tooling such as LangChain, LlamaIndex, OpenAI/Claude/Bedrock APIs, and FastAPI

What You'll Do.

Build batch data pipelines

Optimize batch data pipelines

Maintain batch data pipelines

Build streaming data pipelines

Optimize streaming data pipelines

Maintain streaming data pipelines

Design retrieval systems

Implement retrieval systems

Develop entity mappings

Develop knowledge graphs

Maintain semantic contracts

Maintain metadata systems

Maintain lineage tracking

Support ML lifecycle workflows

Support LLM lifecycle workflows

Build APIs for agents

Build context stores for agents

Build tool interfaces for agents

Implement data governance frameworks

Implement PII handling

Implement schema validation

Implement data quality monitoring

Implement compliance-ready audit logging

How You'll Work.

Team & Collaboration

Global engineering teams; Enterprise-scale AI transformation projects

Process & Methodology

Agile

Full Job Description

## Accountabilities

Data Pipeline Engineering: Build, optimize, and maintain robust batch and streaming data pipelines using modern cloud-native tools such as Snowflake, PySpark, Delta Lake, and Kafka, ensuring reliability, scalability, and performance.

RAG & Retrieval Infrastructure: Design and implement end-to-end retrieval systems including embedding pipelines, vector databases, hybrid search, chunking strategies, and ranking mechanisms to optimize AI context relevance.

Semantic & Knowledge Layer Development: Develop ontologies, entity mappings, and knowledge graphs while maintaining semantic contracts, metadata systems, and lineage tracking for AI and ML use cases.

ML/LLMOps Enablement: Support ML and LLM lifecycle workflows including dataset curation, feature engineering, model evaluation, experiment tracking, and production monitoring.

Agentic Data Systems: Build APIs, context stores, and tool interfaces that enable autonomous agents, including observability for reasoning traces, tool calls, and contextual outputs.

Governance & Data Quality: Implement robust data governance frameworks including RBAC, PII handling, schema validation, data quality monitoring, and compliance-ready audit logging systems.

## Requirements

This role requires a highly experienced data engineering professional with strong cloud, distributed systems, and AI infrastructure expertise. The ideal candidate combines deep technical execution with architectural thinking and hands-on experience building production-grade AI-enabled data systems.

7+ years of experience in data engineering with strong exposure to cloud-based data platforms.

2+ years of experience building production AI/ML or LLM-related data infrastructure at scale.

Strong expertise in Python, SQL, PySpark, Snowflake, Delta Lake, Kafka, and Spark Structured Streaming.

Hands-on experience with vector databases, embedding pipelines, and retrieval systems in production RAG environments.

Solid understanding of MLOps practices including M

SKILL SIGNAL 57 detected · ranked by frequency

Agentic Systems ×5

MLflow ×4

Data pipeline engineering ×3

Retrieval systems ×3

Embedding pipelines ×3

Vector databases ×3

Hybrid search ×3

Chunking strategies ×3

Ranking mechanisms ×3

Ontologies ×3

Entity mappings ×3

Knowledge graphs ×3

Semantic contracts ×3

Metadata systems ×3

Lineage tracking ×3

ML lifecycle workflows ×3

Dataset curation ×3

Feature engineering ×3

Model evaluation ×3

Experiment tracking ×3

Production monitoring ×3

Context stores ×3

Tool interfaces ×3

Observability ×3

Reasoning traces ×3

Tool calls ×3

Data governance ×3

RBAC ×3

PII handling ×3

Schema validation ×3

Data quality monitoring ×3

Audit logging ×3

BEHAVIOURAL

Leadership

Role Details

Seniority: Lead

Experience: 7–15 yrs

Level: Lead

Work Mode: Flexible

Type: FULL TIME

Category: software

Salary Band: 200k+

AI-Extracted Insights

Domain Areas

cloud-native-tools · distributed-systems · ai-infrastructure · production-ai-ml-data-infrastructure · production-llm-data-infrastructure · rag-environments · mlops-practices · data-governance-frameworks

How to Apply on Lever

Lever uses a streamlined one-page form — apply in under 5 minutes.
LinkedIn import works well; review parsed data before submitting.
The cover letter field is optional but visible to reviewers — use it to differentiate.
Referral codes from employees can significantly boost visibility of your application.
