Chapter 13: Known Edge Cases & Future Roadmap
| Edge Case | Description | Current Impact | Proposed Fix |
|---|---|---|---|
| Seventh Schedule Overlap | Schedule entries use numbered lists ("19. Price control", "34. Betting"). The chunker's regex tags these as article_number: "19". | Low: the LLM correctly differentiates in its response but adds an unnecessary note | Enhance the regex to verify that the chunk resides within Parts I to XXII before tagging |
| General Conceptual Queries | Queries like "What are Fundamental Rights?" don't specify an article number, so the metadata filter is not applied. | Medium: relies on semantic search, which is good but not 100% precise | Add part metadata (e.g., part: "III") to enable Part-level filtering |
| Cross-Article References | Some articles reference others (e.g., Article 32 protects rights under Article 19). | Low: the system answers correctly for individual articles but doesn't auto-link related ones | Future: build a Knowledge Graph (Neo4j) to map inter-article relationships |
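The first two proposed fixes can be sketched together. The snippet below is a minimal illustration, not the project's actual chunker: the heading patterns (`PART_RE`, `SCHEDULE_RE`, `ENTRY_RE`) and the function name `tag_lines` are assumptions about how the extracted text looks. It tracks which Part the parser is currently inside, attaches that as `part` metadata, and assigns `article_number` only while inside a Part, so numbered Schedule entries are left untagged.

```python
import re

# Assumed heading formats; the real extracted PDF text may differ.
PART_RE = re.compile(r"^PART\s+([IVXLC]+[A-Z]?)\b")            # e.g. "PART III", "PART IVA"
SCHEDULE_RE = re.compile(r"^\w+\s+SCHEDULE\b", re.IGNORECASE)  # e.g. "SEVENTH SCHEDULE"
ENTRY_RE = re.compile(r"^(\d+[A-Z]?)\.\s")                     # e.g. "19. ", "21A. "


def tag_lines(lines):
    """Yield (text, metadata) pairs, tagging article numbers only inside a Part."""
    part = None  # current Part numeral, or None when outside any Part
    for line in lines:
        text = line.strip()
        part_match = PART_RE.match(text)
        if part_match:
            part = part_match.group(1)
        elif SCHEDULE_RE.match(text):
            part = None  # Schedule entries must not inherit a Part

        meta = {}
        if part is not None:
            meta["part"] = part
            entry = ENTRY_RE.match(text)
            if entry:
                meta["article_number"] = entry.group(1)
        yield text, meta


sample = [
    "PART III",
    "19. Protection of certain rights regarding freedom of speech, etc.",
    "SEVENTH SCHEDULE",
    "19. Price control",
    "34. Betting",
]
tagged = list(tag_lines(sample))
```

In this sketch the Article 19 line receives both `part` and `article_number`, while the Schedule entries "19. Price control" and "34. Betting" receive empty metadata and fall back to semantic search only.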
Chapter 14: Key Takeaways (Interview-Ready)
Interview Prep
If asked: "How did you handle RAG for a complex legal document?"
The Architecture Pattern
| Layer | Technique | Industry Term |
|---|---|---|
| Parsing | PyMuPDF + Custom Regex Cleaning | Structure-Aware Document Parsing |
| Chunking | Article-boundary splits with parent → child hierarchy | Hierarchical Chunking / Multi-Granularity Chunking |
| Metadata | Dynamic Article number extraction & injection | Metadata-Enriched Vector Indexing |
| Retrieval | LLM Router → Metadata Filter → Pinecone | Hybrid Retrieval with Metadata Filtering |
| Overall | 8-Node LangGraph Pipeline with classification, retrieval, generation, hallucination guard | Agentic RAG Architecture |
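To make the Retrieval row concrete, here is a minimal sketch of the "Router → Metadata Filter" step. It is an illustration under stated assumptions, not the case study's actual router: a regex and a small topic lookup (`ARTICLE_RE`, `TOPIC_TO_PART`, and `build_filter` are hypothetical names) stand in for the LLM router, and the output is a filter dictionary in Pinecone's metadata filter syntax (`$eq`).

```python
import re

# Stand-in for the LLM router: a regex for explicit article mentions.
ARTICLE_RE = re.compile(r"\barticle\s+(\d+[A-Z]?)\b", re.IGNORECASE)

# Hypothetical topic-to-Part lookup for conceptual queries.
TOPIC_TO_PART = {
    "fundamental rights": "III",
    "directive principles": "IV",
    "fundamental duties": "IVA",
}


def build_filter(query):
    """Return a Pinecone-style metadata filter, or None for pure semantic search."""
    article = ARTICLE_RE.search(query)
    if article:
        # Strictest filter: restrict retrieval to a single article.
        return {"article_number": {"$eq": article.group(1).upper()}}

    lowered = query.lower()
    for topic, part in TOPIC_TO_PART.items():
        if topic in lowered:
            # Coarser filter: restrict retrieval to one Part.
            return {"part": {"$eq": part}}

    return None  # no filter; fall back to unfiltered semantic search


article_filter = build_filter("What does Article 19 guarantee?")
part_filter = build_filter("What are Fundamental Rights?")
no_filter = build_filter("Who drafted the Constitution?")
```

The returned dictionary would typically be passed as the `filter` argument of a Pinecone index query, with `None` meaning an unfiltered search. Falling back from article to Part to no filter keeps retrieval as narrow as the query allows without blocking general questions.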
The One-Liner
"I replaced expensive LLM parsing with a deterministic, structure-aware pipeline using Hierarchical Chunking and strict metadata filtering, achieving hallucination-resistant retrieval on a 400-page legal document at zero parsing cost."
End of Case Study. © Ambuj Kumar Tripathi | All Rights Reserved