Adaptive Retrieval

Most RAG systems do: query → embed → vector search → LLM. Every. Single. Time. That's a tutorial. Production systems decide how to retrieve based on what kind of question is being asked.

The Problem With One-Size-Fits-All

"What is Section 173 BNSS?"           → Simple factual lookup → standard top-3 is fine
"Difference between IPC and BNS?"     → Comparison → needs context from BOTH documents
"How do I file an FIR step by step?"  → Procedural → needs sequential ordering + more chunks
"What about in criminal cases?"        → Follow-up → needs chat history to understand context

Same retrieval path for all 4 = wrong answer for at least 3 of them.

5 Query Types → 5 Retrieval Strategies

Query Type	Example	Strategy	Detail
Factual Lookup	"What is Section 173 BNSS?"	Standard vector search	Top-k=3, threshold >70%, fetch exact clause
Comparison	"Difference between IPC and BNS for theft?"	Multi-query retrieval	Generate 2 sub-queries, search separately, merge contexts
Procedural	"How do I file an FIR step by step?"	Reranking retrieval	Fetch top-k=8, cross-encoder rerank, sequential order
Ambiguous	"What are my rights?" (no context)	Broad + structured	Wide retrieval + structured response with multiple scenarios
Follow-up	"What about in criminal cases?"	History-expanded query	Expand using `chat_history[-3:]` to reconstruct full query

Implementation

Step 1 — Extend classify_node

def classify_node(state: RAGState) -> dict:
    query = state["query"]

    # Existing checks
    if is_abusive(query):  return {"query_type": "abusive"}
    if is_greeting(query): return {"query_type": "greeting"}

    # Adaptive retrieval classification
    if is_comparison(query):    return {"query_type": "comparison"}
    if is_procedural(query):    return {"query_type": "procedural"}
    if has_prior_context(state):return {"query_type": "followup"}
    return {"query_type": "factual"}  # default


def is_comparison(query: str) -> bool:
    keywords = ["difference", "compare", "vs", "versus",
                "better", "worse", "old vs new", "ipc vs bns"]
    return any(kw in query.lower() for kw in keywords)


def is_procedural(query: str) -> bool:
    keywords = ["how to", "step by step", "procedure",
                "process", "what are the steps", "how do i"]
    return any(kw in query.lower() for kw in keywords)


def has_prior_context(state: RAGState) -> bool:
    # Short query + existing history = likely follow-up
    return (len(state["query"].split()) < 6
            and len(state.get("chat_history", [])) > 0)

Step 2 — Route in retrieve_node

def retrieve_node(state: RAGState) -> dict:
    qtype = state["query_type"]

    if qtype == "comparison":
        return multi_query_retrieve(state)

    elif qtype == "procedural":
        return reranked_retrieve(state, k=8)

    elif qtype == "followup":
        return history_expanded_retrieve(state)

    else:  # "factual" default
        return standard_retrieve(state, k=3)

The 4 Retrieval Functions

# 1. STANDARD — Default factual lookup
def standard_retrieve(state: RAGState, k: int = 3) -> dict:
    vector = jina.embed_query(state["safe_query"])
    results = qdrant.search(query_vector=vector, limit=k)
    return build_context(results)


# 2. MULTI-QUERY — For comparisons
def multi_query_retrieve(state: RAGState) -> dict:
    # LLM generates focused sub-queries
    sub_queries = llm.invoke(
        f"Generate 2 focused search queries to answer: {state['safe_query']}"
    )
    all_results = []
    for sq in sub_queries:
        vector = jina.embed_query(sq)
        results = qdrant.search(query_vector=vector, limit=5)
        all_results.extend(results)

    # Deduplicate by parent_id
    return build_context(deduplicate(all_results))


# 3. RERANKED — For procedural, sequential answers
def reranked_retrieve(state: RAGState, k: int = 8) -> dict:
    vector = jina.embed_query(state["safe_query"])
    results = qdrant.search(query_vector=vector, limit=k)

    # Cross-encoder reranking for relevance
    from sentence_transformers import CrossEncoder
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    pairs = [(state["safe_query"], r.payload["text"]) for r in results]
    scores = reranker.predict(pairs)

    reranked = sorted(
        zip(results, scores),
        key=lambda x: x[1],
        reverse=True
    )
    return build_context([r for r, _ in reranked[:4]])


# 4. HISTORY-EXPANDED — For follow-up questions
def history_expanded_retrieve(state: RAGState) -> dict:
    # Last 3 messages give context to short follow-up
    recent = state["chat_history"][-3:]
    history_text = " ".join(m["content"] for m in recent)

    expanded = f"{state['safe_query']} {history_text}"
    vector = jina.embed_query(expanded[:500])  # Token limit safe
    results = qdrant.search(query_vector=vector, limit=5)
    return build_context(results)

What This Signals in an Interview

Entry level  → One retrieval path. Same k. Same threshold. No query analysis.

Mid level    → Parent-child chunking. Confidence scoring.
               Circuit breaker. PII masking.

Senior level → Adaptive retrieval routing. Multi-query for comparisons.
               Reranking for procedural. History expansion for follow-ups.
               Query-type-aware confidence thresholds.

One-liner for interviews

"The biggest RAG improvement in production is moving from one-size-fits-all vector search to query-type-aware adaptive retrieval — where factual queries get standard top-3 search, comparison queries trigger multi-sub-query retrieval, and follow-up questions use history-expanded queries. The routing decision happens in a LangGraph node before the vector search, adding ~5ms of latency for significantly better answer quality."

The Problem With One-Size-Fits-All​

5 Query Types → 5 Retrieval Strategies​

Implementation​

Step 1 — Extend classify_node​

Step 2 — Route in retrieve_node​

The 4 Retrieval Functions​

What This Signals in an Interview​