
Adaptive Retrieval

Most RAG systems do: query → embed → vector search → LLM. Every. Single. Time. That's the tutorial pattern. Production systems decide how to retrieve based on what kind of question is being asked.


The Problem With One-Size-Fits-All

"What is Section 173 BNSS?"          → simple factual lookup → standard top-3 is fine
"Difference between IPC and BNS?"    → comparison → needs context from BOTH documents
"How do I file an FIR step by step?" → procedural → needs sequential ordering + more chunks
"What about in criminal cases?"      → follow-up → needs chat history to understand context

Same retrieval path for all 4 = wrong answer for at least 3 of them.


5 Query Types → 5 Retrieval Strategies

| Query Type | Example | Strategy | Detail |
|---|---|---|---|
| Factual lookup | "What is Section 173 BNSS?" | Standard vector search | Top-k=3, threshold >70%, fetch exact clause |
| Comparison | "Difference between IPC and BNS for theft?" | Multi-query retrieval | Generate 2 sub-queries, search separately, merge contexts |
| Procedural | "How do I file an FIR step by step?" | Reranking retrieval | Fetch top-k=8, cross-encoder rerank, sequential order |
| Ambiguous | "What are my rights?" (no context) | Broad + structured | Wide retrieval + structured response with multiple scenarios |
| Follow-up | "What about in criminal cases?" | History-expanded query | Expand using chat_history[-3:] to reconstruct full query |

Implementation

Step 1 — Extend classify_node

def classify_node(state: RAGState) -> dict:
    query = state["query"]

    # Existing checks
    if is_abusive(query): return {"query_type": "abusive"}
    if is_greeting(query): return {"query_type": "greeting"}

    # Adaptive retrieval classification
    if is_comparison(query): return {"query_type": "comparison"}
    if is_procedural(query): return {"query_type": "procedural"}
    if has_prior_context(state): return {"query_type": "followup"}
    return {"query_type": "factual"}  # default


def is_comparison(query: str) -> bool:
    keywords = ["difference", "compare", "vs", "versus",
                "better", "worse", "old vs new", "ipc vs bns"]
    return any(kw in query.lower() for kw in keywords)


def is_procedural(query: str) -> bool:
    keywords = ["how to", "step by step", "procedure",
                "process", "what are the steps", "how do i"]
    return any(kw in query.lower() for kw in keywords)


def has_prior_context(state: RAGState) -> bool:
    # Short query + existing history = likely follow-up
    return (len(state["query"].split()) < 6
            and len(state.get("chat_history", [])) > 0)
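These keyword heuristics are cheap enough to run on every request and easy to sanity-check. A quick standalone exercise (the two classifiers are duplicated here so the snippet runs on its own):

```python
# Standalone check of the keyword heuristics above.

def is_comparison(query: str) -> bool:
    keywords = ["difference", "compare", "vs", "versus",
                "better", "worse", "old vs new", "ipc vs bns"]
    return any(kw in query.lower() for kw in keywords)

def is_procedural(query: str) -> bool:
    keywords = ["how to", "step by step", "procedure",
                "process", "what are the steps", "how do i"]
    return any(kw in query.lower() for kw in keywords)

assert is_comparison("Difference between IPC and BNS for theft?")
assert is_procedural("How do I file an FIR step by step?")
assert not is_comparison("What is Section 173 BNSS?")
```

Substring matching is deliberately naive ("vs" also matches inside other words); if that becomes a problem, word-boundary regexes or a small classifier model are the usual next step.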

Step 2 — Route in retrieve_node

def retrieve_node(state: RAGState) -> dict:
    qtype = state["query_type"]

    if qtype == "comparison":
        return multi_query_retrieve(state)

    elif qtype == "procedural":
        return reranked_retrieve(state, k=8)

    elif qtype == "followup":
        return history_expanded_retrieve(state)

    else:  # "factual" default
        return standard_retrieve(state, k=3)
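The if/elif chain can also be written as a dispatch table, which keeps the routing declarative as strategies are added. A minimal sketch, with stub retrievers standing in for the real functions:

```python
from typing import Callable

# Stubs standing in for the real retrieval functions.
def standard_retrieve(state, k=3):    return {"strategy": "standard"}
def multi_query_retrieve(state):      return {"strategy": "multi_query"}
def reranked_retrieve(state, k=8):    return {"strategy": "reranked"}
def history_expanded_retrieve(state): return {"strategy": "history"}

STRATEGIES: dict[str, Callable] = {
    "comparison": multi_query_retrieve,
    "procedural": lambda s: reranked_retrieve(s, k=8),
    "followup":   history_expanded_retrieve,
}

def retrieve_node(state: dict) -> dict:
    # Unknown / "factual" query types fall through to the default.
    handler = STRATEGIES.get(state["query_type"],
                             lambda s: standard_retrieve(s, k=3))
    return handler(state)

assert retrieve_node({"query_type": "comparison"}) == {"strategy": "multi_query"}
assert retrieve_node({"query_type": "factual"}) == {"strategy": "standard"}
```

Either form works; the table version makes it one-line-per-strategy when a fifth path (e.g. the "ambiguous" broad retrieval) is added.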

The 4 Retrieval Functions

# 1. STANDARD — Default factual lookup
def standard_retrieve(state: RAGState, k: int = 3) -> dict:
    vector = jina.embed_query(state["safe_query"])
    results = qdrant.search(query_vector=vector, limit=k)
    return build_context(results)


# 2. MULTI-QUERY — For comparisons
def multi_query_retrieve(state: RAGState) -> dict:
    # LLM generates focused sub-queries, one per line
    response = llm.invoke(
        f"Generate 2 focused search queries, one per line, to answer: {state['safe_query']}"
    )
    sub_queries = [q.strip() for q in response.content.splitlines() if q.strip()]

    all_results = []
    for sq in sub_queries:
        vector = jina.embed_query(sq)
        results = qdrant.search(query_vector=vector, limit=5)
        all_results.extend(results)

    # Deduplicate by parent_id
    return build_context(deduplicate(all_results))
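`deduplicate` is not shown above. One way to implement the parent_id dedup, assuming each hit carries a `payload` dict as in the Qdrant results used here (the `Hit` dataclass below is a stand-in for a real scored point):

```python
from dataclasses import dataclass

@dataclass
class Hit:  # minimal stand-in for a Qdrant scored point
    payload: dict

def deduplicate(results):
    """Keep the first hit per parent_id, preserving incoming order."""
    seen, unique = set(), []
    for r in results:
        pid = r.payload["parent_id"]
        if pid not in seen:
            seen.add(pid)
            unique.append(r)
    return unique

hits = [Hit({"parent_id": "a"}), Hit({"parent_id": "b"}), Hit({"parent_id": "a"})]
assert [h.payload["parent_id"] for h in deduplicate(hits)] == ["a", "b"]
```

Keeping the first occurrence matters because results arrive sorted by score, so the highest-scoring chunk per parent document survives.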


# 3. RERANKED — For procedural, sequential answers
from sentence_transformers import CrossEncoder

# Load once at module level — reloading per request adds seconds of latency
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def reranked_retrieve(state: RAGState, k: int = 8) -> dict:
    vector = jina.embed_query(state["safe_query"])
    results = qdrant.search(query_vector=vector, limit=k)

    # Cross-encoder scores each (query, chunk) pair for relevance
    pairs = [(state["safe_query"], r.payload["text"]) for r in results]
    scores = reranker.predict(pairs)

    reranked = sorted(
        zip(results, scores),
        key=lambda x: x[1],
        reverse=True
    )
    return build_context([r for r, _ in reranked[:4]])
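The sort-and-truncate step is worth seeing in isolation: the bi-encoder's top-8 are reordered by the cross-encoder scores, then cut to 4. A toy run with made-up chunk labels and scores:

```python
# Reordering logic in isolation: zip hits with cross-encoder scores,
# sort descending by score, keep the top 4.
hits = ["step-3", "intro", "step-1", "step-2", "appendix"]
scores = [0.91, 0.12, 0.97, 0.95, 0.05]

reranked = sorted(zip(hits, scores), key=lambda x: x[1], reverse=True)
top = [h for h, _ in reranked[:4]]
assert top == ["step-1", "step-2", "step-3", "intro"]
```

Note the cross-encoder reorders by relevance, not by sequence; the "sequential order" for procedural answers comes from how the final context is assembled and prompted, not from the rerank itself.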


# 4. HISTORY-EXPANDED — For follow-up questions
def history_expanded_retrieve(state: RAGState) -> dict:
    # Last 3 messages give context to a short follow-up
    recent = state["chat_history"][-3:]
    history_text = " ".join(m["content"] for m in recent)

    expanded = f"{state['safe_query']} {history_text}"
    vector = jina.embed_query(expanded[:500])  # token-limit safe
    results = qdrant.search(query_vector=vector, limit=5)
    return build_context(results)
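The expansion step itself is plain string work and easy to verify. A self-contained sketch (the `expand_query` helper is illustrative, not part of the pipeline above):

```python
def expand_query(query: str, chat_history: list[dict], max_chars: int = 500) -> str:
    """Prepend the query, append the last 3 turns, clamp to an embedding-safe length."""
    recent = chat_history[-3:]
    history_text = " ".join(m["content"] for m in recent)
    return f"{query} {history_text}"[:max_chars]

history = [
    {"role": "user", "content": "What is anticipatory bail?"},
    {"role": "assistant", "content": "Anticipatory bail is pre-arrest bail."},
]
expanded = expand_query("What about in criminal cases?", history)
assert expanded.startswith("What about in criminal cases?")
assert "anticipatory" in expanded.lower()
```

Putting the current query first matters: if the 500-character clamp cuts anything, it cuts old history, never the question being asked.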

What This Signals in an Interview

Entry level  → One retrieval path. Same k. Same threshold. No query analysis.

Mid level → Parent-child chunking. Confidence scoring. Circuit breaker. PII masking.

Senior level → Adaptive retrieval routing. Multi-query for comparisons. Reranking for procedural. History expansion for follow-ups. Query-type-aware confidence thresholds.

One-liner for interviews

"The biggest RAG improvement in production is moving from one-size-fits-all vector search to query-type-aware adaptive retrieval — where factual queries get standard top-3 search, comparison queries trigger multi-sub-query retrieval, and follow-up questions use history-expanded queries. The routing decision happens in a LangGraph node before the vector search, adding ~5ms of latency for significantly better answer quality."