
🛡️ Citizen Safety & Awareness AI - Technical Documentation

Developer Context

Developer: Ambuj Kumar Tripathi
Interview Ready: January 29, 2026
Architecture: React + FastAPI + RAG + Multi-Cloud


📂 Project Structure

Citizen Safety Awareness AI/
├── backend/
│   ├── app/
│   │   ├── __init__.py
│   │   ├── main.py              # FastAPI app entry point
│   │   ├── config.py            # Pydantic settings management
│   │   ├── auth/
│   │   │   ├── routes.py        # Google OAuth endpoints
│   │   │   └── jwt.py           # JWT token creation/verification
│   │   ├── rag/
│   │   │   ├── pipeline.py      # Core RAG logic (embeddings, search, LLM)
│   │   │   └── routes.py        # Chat & upload API endpoints
│   │   └── db/
│   │       └── database.py      # MongoDB connection & CRUD
│   ├── data/                    # 8 Core PDF documents (permanent knowledge)
│   ├── chroma_db/               # Vector database (persistent)
│   ├── temp_uploads/            # User-uploaded PDFs (temporary)
│   ├── requirements.txt         # Python dependencies
│   └── .env                     # Environment variables (not in git)
│
└── frontend/
    └── src/
        ├── App.jsx              # React Router setup
        ├── main.jsx             # React entry point
        ├── index.css            # Global styles
        ├── components/
        │   ├── Dashboard.jsx    # Main chat interface
        │   ├── Chat.jsx         # Alternate chat component
        │   ├── Navbar.jsx       # Navigation bar
        │   └── Login.jsx        # Login page
        ├── context/
        │   └── AuthContext.jsx  # Auth state management
        ├── api/
        │   └── index.js         # Axios API client
        └── pages/
            └── AuthCallback.jsx # OAuth callback handler

🏗️ Architecture Overview


🔐 Authentication Flow

Google OAuth 2.0 Implementation

| Step | Component | Description |
|------|-----------|-------------|
| 1 | Login.jsx | User clicks "Login with Google" |
| 2 | auth/routes.py:login() | Redirects to Google OAuth |
| 3 | Google | User authenticates |
| 4 | auth/routes.py:auth_callback() | Receives auth code, exchanges for token |
| 5 | auth/jwt.py:create_access_token() | Creates JWT with user info |
| 6 | Frontend | Stores JWT in localStorage |
| 7 | Every Request | JWT sent in Authorization header |

OAuth Scope (auth/routes.py)
client_kwargs={'scope': 'openid email profile'}

Only accesses: Name, Email, Profile Picture. No Gmail inbox access.

JWT Token Structure
{
  "email": "user@gmail.com",
  "name": "User Name",
  "picture": "https://...",
  "exp": 1738123456
}
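To make the token structure concrete, here is a minimal sketch of HS256 signing and verification using only the standard library. It is not the project's jwt.py, which more likely relies on a JWT library; the function names mirror the document for readability, and the `now` parameter and the 24-hour default are assumptions added for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_access_token(claims: dict, secret: str,
                        ttl_seconds: int = 24 * 3600,
                        now: Optional[float] = None) -> str:
    """Return header.payload.signature with an `exp` claim added (illustrative)."""
    issued = int(time.time() if now is None else now)
    payload = {**claims, "exp": issued + ttl_seconds}
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_access_token(token: str, secret: str,
                        now: Optional[float] = None) -> dict:
    """Check signature and expiry; raise ValueError on any failure."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # compare_digest avoids leaking timing information.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload
```

Because the payload is only base64-encoded, not encrypted, anything placed in the claims is readable by whoever holds the token.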

🧠 RAG Pipeline Deep Dive

Core File

backend/app/rag/pipeline.py (472 lines)

1. Document Ingestion Flow

Key Functions:

| Function | Lines | Purpose |
|----------|-------|---------|
| get_embeddings() | 33-43 | Lazy-load Google Generative AI Embeddings |
| rebuild_vector_db() | 153-203 | Full re-index from /data folder |
| add_documents_incremental() | 206-252 | Add user uploads without touching core index |
| clear_temporary_knowledge() | 255-274 | Surgically delete only temp docs |

2. PII Masking (Microsoft Presidio)

Masking Implementation
# Lines 80-104
def mask_pii(text: str) -> tuple[str, bool, list]:
    # Uses the spaCy en_core_web_sm model
    # Detects: PHONE_NUMBER, EMAIL_ADDRESS, PERSON, LOCATION
    # Custom Indian phone regex: (\+91[\-\s]?)?[6-9]\d{9}
    ...  # body omitted in this document; see pipeline.py
Custom Recognizer (Lines 65-74)
from presidio_analyzer import Pattern

phone_pattern = Pattern(
    name="phone_number_regex",
    regex=r"(\+91[\-\s]?)?[6-9]\d{9}",  # Indian phone numbers
    score=0.5
)
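The effect of the custom phone pattern can be illustrated without Presidio. The sketch below applies the same regex with Python's `re` module and returns a tuple shaped like `mask_pii()`'s annotated return type; the function name `mask_phone_numbers` and the entity dictionary layout are assumptions, and the real pipeline also detects emails, names, and locations through Presidio and spaCy.

```python
import re

# Same expression as the custom recognizer: optional +91 prefix,
# then a 10-digit number starting with 6-9.
PHONE_RE = re.compile(r"(\+91[\-\s]?)?[6-9]\d{9}")


def mask_phone_numbers(text: str) -> tuple[str, bool, list]:
    """Return (masked_text, pii_found, entities) for phone numbers only."""
    entities = [
        {"type": "PHONE_NUMBER", "start": m.start(), "end": m.end()}
        for m in PHONE_RE.finditer(text)
    ]
    masked = PHONE_RE.sub("<PHONE_NUMBER>", text)
    return masked, bool(entities), entities


masked, found, entities = mask_phone_numbers("My number is 9876543210, help me")
print(masked)  # My number is <PHONE_NUMBER>, help me
```

Note that the pattern has no word boundaries, so it also matches a qualifying 10-digit run inside a longer digit string; whether that is acceptable depends on how aggressive the masking should be.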

3. Search & Response Flow

4. LLM Configuration (Lines 323-334)

llm = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=settings.OPENROUTER_API_KEY,
    model="meta-llama/llama-3.3-70b-instruct:free",
    temperature=0.3,
    max_tokens=3000
)

5. Circuit Breaker Pattern (Lines 23-24, 357-364)

llm_breaker = CircuitBreaker(fail_max=5, reset_timeout=30)

# Usage:
response = llm_breaker.call(chain.invoke, {...})

Purpose: If the LLM call fails 5 times in a row, the circuit opens and further calls fail fast for 30 seconds. After that, the breaker lets one trial call through and closes again if it succeeds.
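The open, fail-fast, and retry behaviour can be sketched in a few lines of plain Python. This is a simplified stand-in for pybreaker rather than its implementation: `SimpleCircuitBreaker`, `CircuitOpenError`, and the injectable `clock` are hypothetical names introduced here, and the sketch is not thread-safe.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class SimpleCircuitBreaker:
    def __init__(self, fail_max: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Still cooling down: reject without touching the backend.
                raise CircuitOpenError("circuit open; failing fast")
            # Cooldown elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.fail_max:
                # Open the circuit, or re-open it after a failed trial call.
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping `chain.invoke` this way means a struggling LLM provider costs the user an immediate error instead of a long timeout on every request.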


💾 Database Layer

MongoDB Collections

| Collection | Purpose | TTL |
|------------|---------|-----|
| chat_history | User conversations | 30 days (GDPR) |
| feedback | 👍/👎 ratings | None |

GDPR Compliance

TTL index in database.py:

_db["chat_history"].create_index("last_activity", expireAfterSeconds=2592000)  # 30 days
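The arithmetic behind `expireAfterSeconds=2592000` and the expiry rule it implies can be checked with a short sketch. The helper names `expires_at` and `is_expired` are hypothetical; MongoDB performs this comparison itself in a background task, so actual deletion can lag the computed time slightly.

```python
from datetime import datetime, timedelta, timezone

CHAT_HISTORY_TTL_SECONDS = 30 * 24 * 60 * 60  # 2,592,000 seconds = 30 days


def expires_at(last_activity: datetime) -> datetime:
    """Earliest moment at which the TTL index may remove the document."""
    return last_activity + timedelta(seconds=CHAT_HISTORY_TTL_SECONDS)


def is_expired(last_activity: datetime, now: datetime) -> bool:
    return now >= expires_at(last_activity)


last_seen = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(expires_at(last_seen).isoformat())  # 2026-01-31T00:00:00+00:00
```

Since the index is on `last_activity`, updating that field on each new message restarts the 30-day window for the conversation.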

Redis (Upstash) Usage

| Key | Purpose |
|-----|---------|
| visitor_count | Total website visits |
| active_users | Users active in last 15 mins |

🌐 API Endpoints

Base URL: https://citizen-safety-backend-mkbn.onrender.com

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| GET | / | ❌ | Welcome message |
| GET | /health | ❌ | Health check |
| GET | /docs | ❌ | Swagger UI |
| GET | /auth/login | ❌ | Start OAuth flow |
| GET | /auth/callback | ❌ | OAuth callback |
| POST | /api/chat | ✅ | Send message (rate limited: 10/min) |
| GET | /api/history | ✅ | Get chat history |
| POST | /api/clear | ✅ | Clear history |
| POST | /api/upload | ✅ | Upload PDFs |
| POST | /api/rebuild-kb | ✅ | Rebuild/reset knowledge base |
| GET | /api/kb-status | ❌ | Check KB status |
| GET | /api/stats | ❌ | Get visitor stats |
| POST | /api/stats/increment | ❌ | Increment visitor count |
| GET | /api/stats/active | ❌ | Get active users |
| POST | /api/feedback | ✅ | Submit feedback |

📊 Response Schema

{
  "response": "### Digital Arrest Scam\n\nDigital arrest is...",
  "sources": [
    {
      "source_id": 1,
      "file": "DigitalArrest",
      "page": 3,
      "preview": "Digital arrest scam involves..."
    }
  ],
  "confidence": 87.5,
  "latency": 2.34,
  "pii_masked": true,
  "pii_entities": [
    {"type": "PHONE_NUMBER", "score": 0.85, "start": 10, "end": 20}
  ],
  "masked_question": "My number is <PHONE_NUMBER>, help me"
}
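A client consuming this payload might model it with typed containers. The sketch below is one possible shape, not code from the project: the class names `Source` and `ChatResponse` and the function `parse_chat_response` are hypothetical, and treating the PII fields as optional is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Source:
    source_id: int
    file: str
    page: int
    preview: str


@dataclass
class ChatResponse:
    response: str
    sources: list[Source]
    confidence: float
    latency: float
    pii_masked: bool
    pii_entities: list[dict[str, Any]]
    masked_question: str


def parse_chat_response(raw: str) -> ChatResponse:
    """Parse the JSON body of a chat reply into typed objects."""
    data = json.loads(raw)
    return ChatResponse(
        response=data["response"],
        sources=[Source(**item) for item in data.get("sources", [])],
        confidence=float(data["confidence"]),
        latency=float(data["latency"]),
        # Assumed optional: fall back to "nothing masked" if absent.
        pii_masked=bool(data.get("pii_masked", False)),
        pii_entities=list(data.get("pii_entities", [])),
        masked_question=data.get("masked_question", ""),
    )
```

Parsing into dataclasses surfaces a renamed or missing required field as an immediate error rather than a silent `undefined` further down the UI.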

📚 Knowledge Base (8 Core PDFs)

| # | File | Topic | Pages |
|---|------|-------|-------|
| 1 | ADVISORYTAU-ADV... | Digital Arrest Scam Advisory | ~10 |
| 2 | CARE AND PROTECTION... | POCSO Act | ~50 |
| 3 | FraudsterssendingFake... | Fake Job SMS Fraud | ~8 |
| 4 | Ombudsman_Scheme.pdf | Banking Ombudsman | ~20 |
| 5 | RBIOS2021_amendments.pdf | RBI Amendments | ~15 |
| 6 | RBI_English BEAWARE.pdf | RBI Fraud Awareness | ~30 |
| 7 | posh handbook.pdf | POSH (Sexual Harassment) Act | ~100 |
| 8 | Ambuj_Kumar_Resume.pdf | Developer Resume | 2 |

🎨 Frontend Features

Dashboard.jsx (787 lines)

| Feature | Lines | Description |
|---------|-------|-------------|
| Toast System | 32-37 | Custom notifications |
| Message Display | 490-620 | Markdown rendering with ReactMarkdown |
| Source Citations | 518-564 | Expandable source badges |
| PII Audit Modal | 718-781 | View Presidio detection log |
| File Upload | 161-168 | PDF drag & drop |
| Incremental Index | 170-192 | "Sync Brain" button |
| Surgical Clear | 194-208 | "Reset Context Brain" button |
| Quick Actions | 211-219 | Pre-defined prompts |
| Tech Stack Dropdown | 221-233 | Show technologies used |
| Live Active Users | 74-81 | Real-time from Redis |

⚙️ Environment Variables

Backend (.env)

| Variable | Purpose |
|----------|---------|
| GOOGLE_CLIENT_ID | OAuth Client ID |
| GOOGLE_CLIENT_SECRET | OAuth Secret |
| GOOGLE_REDIRECT_URI | OAuth callback URL |
| FRONTEND_URL | Vercel frontend URL |
| SECRET_KEY | JWT signing key |
| OPENROUTER_API_KEY | LLM API key |
| GOOGLE_API_KEY | Embeddings API key |
| MONGO_URI | MongoDB Atlas connection |
| UPSTASH_REDIS_REST_URL | Redis URL |
| UPSTASH_REDIS_REST_TOKEN | Redis token |
| LANGFUSE_* | Observability keys |

Frontend (.env)

| Variable | Purpose |
|----------|---------|
| VITE_API_URL | Backend API base URL |

🚀 Deployment Architecture

| Service | Platform | URL |
|---------|----------|-----|
| Frontend | Vercel | https://citizen-safety-ai-assistant.vercel.app |
| Backend | Render (Free) | https://citizen-safety-backend-mkbn.onrender.com |
| Database | MongoDB Atlas | Cloud cluster |
| Cache | Upstash Redis | Serverless Redis |
| LLM | OpenRouter | Llama 3.3 70B |
| Embeddings | Google AI | embedding-001 |
| Monitoring | Langfuse | LLM observability |

🛡️ Security Features

| Feature | Implementation |
|---------|----------------|
| PII Masking | Microsoft Presidio + spaCy en_core_web_sm |
| Rate Limiting | SlowAPI: 10 requests/minute per user |
| JWT Authentication | HS256 signed tokens, 24hr expiry |
| CORS | Whitelist: localhost, Vercel URL |
| Circuit Breaker | pybreaker: 5 failures → 30s cooldown |
| GDPR Compliance | MongoDB TTL: 30-day auto-delete |
| Input Validation | Pydantic models with Field constraints |
| Abusive Language Filter | Keyword blocklist |
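As an illustration of the per-user limit of 10 requests per minute, the following is a minimal sliding-window limiter in plain Python. It is a sketch of the general technique, not SlowAPI's internals, and `SlidingWindowRateLimiter` with its injectable `clock` is a hypothetical name; it keeps state in process memory, so it would not be shared across multiple workers.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowRateLimiter:
    def __init__(self, limit: int = 10, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        # One queue of request timestamps per user.
        self._hits = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        """Record the request and return True if the user is under the limit."""
        now = self._clock()
        hits = self._hits[user_id]
        # Discard timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

In an API handler, a `False` result would typically translate into an HTTP 429 response.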

📈 Memory Optimization

Optimization Results

Estimated Usage: 250-400 MB (safely within Render 512MB limit)

| Optimization | Savings |
|--------------|---------|
| Google API Embeddings (vs HuggingFace local) | ~300MB RAM saved |
| spaCy en_core_web_sm (vs en_core_web_lg) | ~700MB RAM saved |
| Lazy imports inside functions | Faster cold start |
| Global singleton pattern | No duplicate instances |

🧪 Interview Talking Points

1. "Walk me through the RAG pipeline"

  1. User submits question → PII masked by Presidio
  2. Question embedded via Google Generative AI
  3. ChromaDB similarity search (k=3) → returns relevant chunks
  4. Context + question sent to Llama 3.3 70B via OpenRouter
  5. Response includes sources with page numbers
  6. Confidence score calculated from vector distance
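Steps 3 and 6 can be sketched with plain Python lists standing in for embeddings. This is an illustration under stated assumptions rather than the pipeline's code: ChromaDB performs the search in the real system, the document does not give the confidence formula, and the mapping used here (one minus cosine distance, clamped and scaled to 0-100) is a guess chosen only to show the idea. All function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def top_k(query: Sequence[float],
          chunks: list[tuple[str, Sequence[float]]],
          k: int = 3) -> list[tuple[float, str]]:
    """Return the k chunks closest to the query as (distance, text) pairs."""
    scored = [(cosine_distance(query, vector), text) for text, vector in chunks]
    return sorted(scored, key=lambda pair: pair[0])[:k]


def confidence_from_distance(distance: float) -> float:
    """Assumed mapping: distance 0 -> 100.0, distance >= 1 -> 0.0."""
    return round(max(0.0, min(1.0, 1.0 - distance)) * 100, 1)


chunks = [
    ("digital arrest advisory", [1.0, 0.0]),
    ("banking ombudsman", [0.0, 1.0]),
    ("fraud awareness", [1.0, 1.0]),
    ("unrelated", [-1.0, 0.0]),
]
for distance, text in top_k([1.0, 0.0], chunks):
    print(text, confidence_from_distance(distance))
```

The retrieved chunk texts would then be concatenated into the prompt context for step 4.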

2. "How do you handle security?"

  • Authentication: Google OAuth 2.0 → JWT tokens
  • PII Protection: Microsoft Presidio real-time masking
  • Rate Limiting: SlowAPI prevents abuse
  • GDPR: MongoDB TTL auto-deletes after 30 days
  • Circuit Breaker: Prevents cascading LLM failures

3. "Explain incremental indexing"

  • Core 8 PDFs = permanent index (never deleted)
  • User uploads tagged with is_temporary: True metadata
  • "Reset Brain" calls clear_temporary_knowledge()
  • Uses ChromaDB delete(where={"is_temporary": True})
  • No re-indexing of core docs needed
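The tagging-and-filtered-delete idea can be shown with an in-memory stand-in for the vector store. `InMemoryIndex` and `Chunk` are hypothetical names, and the `delete(where=...)` method only imitates the shape of the metadata filter mentioned above; it does not call ChromaDB.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


class InMemoryIndex:
    def __init__(self) -> None:
        self._chunks: list[Chunk] = []

    def add(self, texts: list[str], is_temporary: bool) -> None:
        """Store chunks, tagging each with where it came from."""
        for text in texts:
            self._chunks.append(Chunk(text, {"is_temporary": is_temporary}))

    def delete(self, where: dict) -> int:
        """Remove chunks whose metadata matches every key in `where`."""
        if not where:
            raise ValueError("refusing to delete without a filter")
        keep = [
            chunk for chunk in self._chunks
            if not all(chunk.metadata.get(k) == v for k, v in where.items())
        ]
        removed = len(self._chunks) - len(keep)
        self._chunks = keep
        return removed

    def __len__(self) -> int:
        return len(self._chunks)


index = InMemoryIndex()
index.add(["core pdf chunk 1", "core pdf chunk 2"], is_temporary=False)
index.add(["user upload chunk"], is_temporary=True)
removed = index.delete(where={"is_temporary": True})
print(removed, len(index))  # 1 2
```

Because the core chunks never match the filter, clearing temporary knowledge costs one filtered delete instead of re-embedding the permanent PDFs.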

4. "Why these tech choices?"

  • FastAPI: Async, automatic OpenAPI docs, Pydantic validation
  • ChromaDB: Simple, no external server, persistent
  • Google Embeddings: Free tier generous, low latency
  • OpenRouter: Access to multiple LLMs, free tier
  • Render: Easy Python deployment, auto-scaling

✅ Final Deployment Checklist

  • GOOGLE_API_KEY added to config.py and Render
  • Chroma import added at top-level (line 16)
  • PyMuPDFLoader & RecursiveCharacterTextSplitter imports in add_documents_incremental()
  • All lazy imports verified inside functions
  • Frontend VITE_API_URL points to Render URL
  • Vercel rewrites configured for SPA routing
  • CORS includes Vercel domain

Document Created: January 29, 2026 | Version: 1.0 | Status: READY FOR INTERVIEW 🚀