Company

Data Platform

Staff Software Development Engineer (Data Engineer)

Bengaluru, Karnataka, India · FULL TIME

Market Sentiment

HIGH DEMAND

Neural analysis suggests this role is optimal for Senior candidates.

The Brief

“Staff Software Development Engineer (Data Engineer). Skills: Data Engineering, high-performance data retrieval and storage, Vector Databases, Graph Databases, LLM orchestration frameworks, embedding models, large-scale data processing. Lead the design and development of hybrid retrieval architectures combining vector similarity search with structured graph traversals. Architect scalable data pipelines for the ingestion, embedding, and indexing of massive, multi-modal datasets”

What You'll Achieve.

Design knowledge graph schemas that ensure high-performance relationship mapping and ontological integrity

Monitor the quality of embeddings and the health of the vector space

Build "Memory" systems for AI agents, focusing on long-term persistence, observability, and sub-second latency

Unify disparate data sources into a coherent, searchable knowledge base

Industry & Context.

Data Platform

What They're Looking For.

Must Have

7+ years of experience in data engineering or backend systems with a focus on high-performance data retrieval and storage

BE/B.Tech in Computer Science, Mathematics, or equivalent

Expert proficiency in Python, Java, or Go, with a strong grasp of distributed system design patterns

Deep understanding of Vector Databases, including indexing strategies (HNSW, IVFFlat, PQ) and distance metrics (Cosine, Euclidean, Dot Product)

Experience with Pinecone, Milvus, Weaviate, or Qdrant

Strong background in Graph Databases (Neo4j, AWS Neptune, or ArangoDB) and query languages like Cypher or Gremlin

Experience with Data Modeling and organization, specifically in building semantic layers, ontologies, and taxonomies

Hands-on experience with LLM orchestration frameworks (LangChain, LlamaIndex) and embedding models (OpenAI, HuggingFace, Cohere)

Proficiency in large-scale data processing using Spark, Flink, or Kafka for real-time indexing and ETL

Understanding of Information Retrieval (IR) fundamentals, including BM25, TF-IDF, and reciprocal rank fusion

Experience with cloud-native infrastructure (AWS/GCP/Azure) and container orchestration (Kubernetes)

Nice to Have

MS or PhD in a related field is a plus

Bachelor's/Master's in Computer Science or a related field with 7-9 years of professional experience

What You'll Do.

Lead the design and development of hybrid retrieval architectures combining vector similarity search with structured graph traversals

Architect scalable data pipelines for the ingestion, embedding, and indexing of massive, multi-modal datasets

Innovate and prototype advanced retrieval techniques, including multi-stage re-ranking, graph-tooling for LLMs, and dynamic metadata filtering

Design and implement schemas for complex knowledge graphs, ensuring high-performance relationship mapping and ontological integrity

Build automated data validation and drift detection systems to monitor the quality of embeddings and the health of the vector space

Drive technical implementation of "Memory" systems for AI agents, focusing on long-term persistence, observability, and sub-second latency

Champion data organization standards, ensuring that disparate data sources are unified into a coherent, searchable knowledge base

Collaborate with AI Research and Product teams to evaluate emerging database technologies (e.g., HNSW optimizations, GraphRAG) and integrate them into production

How You'll Work.

Team & Collaboration

Collaborate with AI Research and Product teams

Full Job Description

## Key Responsibilities

Lead the design and development of hybrid retrieval architectures combining vector similarity search with structured graph traversals.

Architect scalable data pipelines for the ingestion, embedding, and indexing of massive, multi-modal datasets.

Innovate and prototype advanced retrieval techniques, including multi-stage re-ranking, graph-tooling for LLMs, and dynamic metadata filtering.

Design and implement schemas for complex knowledge graphs, ensuring high-performance relationship mapping and ontological integrity.

Build automated data validation and drift detection systems to monitor the quality of embeddings and the health of the vector space.

Drive technical implementation of "Memory" systems for AI agents, focusing on long-term persistence, observability, and sub-second latency.

Champion data organization standards, ensuring that disparate data sources are unified into a coherent, searchable knowledge base.

Collaborate with AI Research and Product teams to evaluate emerging database technologies (e.g., HNSW optimizations, GraphRAG) and integrate them into production.

## Skills and Attributes for Success

7+ years of experience in data engineering or backend systems with a focus on high-performance data retrieval and storage.

BE/B.Tech in Computer Science, Mathematics, or equivalent. MS or PhD in a related field is a plus.

Expert proficiency in Python, Java, or Go, with a strong grasp of distributed system design patterns.

Deep understanding of Vector Databases, including indexing strategies (HNSW, IVFFlat, PQ) and distance metrics (Cosine, Euclidean, Dot Product).

Experience with Pinecone, Milvus, Weaviate, or Qdrant.

Strong background in Graph Databases (Neo4j, AWS Neptune, or ArangoDB) and query languages like Cypher or Gremlin.

Experience with Data Modeling and organization, specifically in building semantic layers, ontologies, and taxonomies.

Hands-on experience with LLM orchestration frameworks (LangChain, LlamaIndex) and embedding


SKILL SIGNAL 51 detected · ranked by frequency

Vector Databases ×6

Graph Databases ×6

LLM orchestration frameworks ×6

embedding models ×6

Spark ×5

Flink ×5

Kafka ×5

Kubernetes ×5

Python ×4

Java ×4

Go ×4

Information Retrieval (IR) ×4

cloud-native infrastructure ×4

container orchestration ×4

Data Engineering ×2

high-performance data retrieval and storage ×2

large-scale data processing ×2

Pinecone ×2

Milvus ×2

Weaviate ×2

Qdrant ×2

Neo4j ×2

AWS Neptune ×2

ArangoDB ×2

LangChain ×2

LlamaIndex ×2

OpenAI ×2

HuggingFace ×2

Cohere ×2

AWS ×2

GCP ×2

Azure ×2

BEHAVIOURAL

Lead the design and development

Innovate and prototype

Champion data organization standards

Collaborate with AI Research and Product teams

Role Details

Experience 7–10 yrs

Level Senior

Type FULL TIME

Education BE/B.Tech in Computer Science, Mathematics, or equivalent

AI-Extracted Insights

Domain Areas

information-retrieval-ir-fundamentals

knowledge-graphs

ai-agents

How to Apply on Lever

Lever uses a streamlined one-page form — apply in under 5 minutes.
LinkedIn import works well; review parsed data before submitting.
The cover letter field is optional but visible to reviewers — use it to differentiate.
Referral codes from employees can significantly boost visibility of your application.
