Citizen Safety & Awareness AI - Technical Documentation
Developer Context
Developer: Ambuj Kumar Tripathi
Interview Ready: January 29, 2026
Architecture: React + FastAPI + RAG + Multi-Cloud
Project Structure
Citizen Safety Awareness AI/
├── backend/
│   ├── app/
│   │   ├── __init__.py
│   │   ├── main.py           # FastAPI app entry point
│   │   ├── config.py         # Pydantic settings management
│   │   ├── auth/
│   │   │   ├── routes.py     # Google OAuth endpoints
│   │   │   └── jwt.py        # JWT token creation/verification
│   │   ├── rag/
│   │   │   ├── pipeline.py   # Core RAG logic (embeddings, search, LLM)
│   │   │   └── routes.py     # Chat & upload API endpoints
│   │   └── db/
│   │       └── database.py   # MongoDB connection & CRUD
│   ├── data/                 # 8 Core PDF documents (permanent knowledge)
│   ├── chroma_db/            # Vector database (persistent)
│   ├── temp_uploads/         # User-uploaded PDFs (temporary)
│   ├── requirements.txt      # Python dependencies
│   └── .env                  # Environment variables (not in git)
│
└── frontend/
    └── src/
        ├── App.jsx                 # React Router setup
        ├── main.jsx                # React entry point
        ├── index.css               # Global styles
        ├── components/
        │   ├── Dashboard.jsx       # Main chat interface
        │   ├── Chat.jsx            # Alternate chat component
        │   ├── Navbar.jsx          # Navigation bar
        │   └── Login.jsx           # Login page
        ├── context/
        │   └── AuthContext.jsx     # Auth state management
        ├── api/
        │   └── index.js            # Axios API client
        └── pages/
            └── AuthCallback.jsx    # OAuth callback handler
Architecture Overview
Authentication Flow
Google OAuth 2.0 Implementation
| Step | Component | Description |
|---|---|---|
| 1 | Login.jsx | User clicks "Login with Google" |
| 2 | auth/routes.py:login() | Redirects to Google OAuth |
| 3 | Google | User authenticates with Google |
| 4 | auth/routes.py:auth_callback() | Receives auth code, exchanges for token |
| 5 | auth/jwt.py:create_access_token() | Creates JWT with user info |
| 6 | Frontend | Stores JWT in localStorage |
| 7 | Every Request | JWT sent in Authorization header |
OAuth Scope (auth/routes.py)
client_kwargs={'scope': 'openid email profile'}
Only accesses: Name, Email, Profile Picture. No Gmail inbox access.
JWT Token Structure
{
"email": "user@gmail.com",
"name": "User Name",
"picture": "https://...",
"exp": 1738123456
}
RAG Pipeline Deep Dive
Core File: backend/app/rag/pipeline.py (472 lines)
1. Document Ingestion Flow
Key Functions:
| Function | Lines | Purpose |
|---|---|---|
| `get_embeddings()` | 33-43 | Lazy-load Google Generative AI Embeddings |
| `rebuild_vector_db()` | 153-203 | Full re-index from /data folder |
| `add_documents_incremental()` | 206-252 | Add user uploads without touching core index |
| `clear_temporary_knowledge()` | 255-274 | Surgically delete only temp docs |
2. PII Masking (Microsoft Presidio)
Masking Implementation
# Lines 80-104
def mask_pii(text: str) -> tuple[str, bool, list]:
    """Return (masked_text, pii_masked, pii_entities)."""
    # Uses spaCy en_core_web_sm model
    # Detects: PHONE_NUMBER, EMAIL_ADDRESS, PERSON, LOCATION
    # Custom Indian phone regex: (\+91[\-\s]?)?[6-9]\d{9}
    ...
Custom Recognizer (Lines 65-74)
from presidio_analyzer import Pattern

phone_pattern = Pattern(
    name="phone_number_regex",
    regex=r"(\+91[\-\s]?)?[6-9]\d{9}",  # Indian mobile numbers, optional +91 prefix
    score=0.5
)
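To see what the custom phone pattern actually matches, the regex can be exercised on its own. The sketch below is a regex-only stand-in, not the Presidio pipeline: `mask_phone_numbers` is a hypothetical helper, and it skips the spaCy-based detection of names, emails, and locations that `mask_pii()` is described as doing.

```python
import re

# Same pattern as the custom recognizer: optional +91 prefix, then a
# 10-digit number starting with 6-9.
PHONE_RE = re.compile(r"(\+91[\-\s]?)?[6-9]\d{9}")


def mask_phone_numbers(text: str) -> tuple[str, bool, list]:
    """Replace phone numbers with a placeholder and report what was found."""
    entities = [
        {"type": "PHONE_NUMBER", "start": m.start(), "end": m.end()}
        for m in PHONE_RE.finditer(text)
    ]
    masked = PHONE_RE.sub("<PHONE_NUMBER>", text)
    return masked, bool(entities), entities


if __name__ == "__main__":
    print(mask_phone_numbers("My number is 9876543210, help me"))
```

The pattern has no word boundaries, so it also matches a qualifying 10-digit run inside a longer digit string (an account number, for example). That may be why the recognizer assigns it a modest score of 0.5.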
3. Search & Response Flow
4. LLM Configuration (Lines 323-334)
llm = ChatOpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=settings.OPENROUTER_API_KEY,
model="meta-llama/llama-3.3-70b-instruct:free",
temperature=0.3,
max_tokens=3000
)
5. Circuit Breaker Pattern (Lines 23-24, 357-364)
llm_breaker = CircuitBreaker(fail_max=5, reset_timeout=30)
# Usage:
response = llm_breaker.call(chain.invoke, {...})
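Purpose: If the LLM call fails 5 times in a row, the circuit opens and further calls fail fast for 30 seconds. After that, one trial call is let through, and the circuit closes again if it succeeds.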
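The state machine behind this behaviour is small enough to sketch without the library. The class below is an illustrative stand-in, not pybreaker's implementation: `SimpleCircuitBreaker`, `CircuitOpenError`, and the injectable `clock` parameter are hypothetical names introduced here, and details such as which exception is raised on the tripping call may differ from the real library.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class SimpleCircuitBreaker:
    def __init__(self, fail_max: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0                        # consecutive failures so far
        self._opened_at: Optional[float] = None   # None means the circuit is closed

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so this one call is allowed through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching fail_max, or re-open if the half-open trial failed.
            if self._opened_at is not None or self._failures >= self.fail_max:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Failing fast matters on a small instance: without it, every request during an outage would wait for the upstream LLM call to time out.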
Database Layer
MongoDB Collections
| Collection | Purpose | TTL |
|---|---|---|
| `chat_history` | User conversations | 30 days (GDPR) |
| `feedback` | 👍/👎 ratings | None |
GDPR Compliance
TTL index (database.py):
_db["chat_history"].create_index("last_activity", expireAfterSeconds=2592000) # 30 days
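The `expireAfterSeconds` value is just 30 days expressed in seconds. MongoDB's TTL monitor deletes a document once its indexed date field plus that offset is in the past, so refreshing `last_activity` pushes the deletion back. The arithmetic can be checked with a small sketch; `expires_at` and `is_expired` are hypothetical helpers for illustration and are not part of database.py.

```python
from datetime import datetime, timedelta, timezone

CHAT_HISTORY_TTL_SECONDS = 30 * 24 * 60 * 60  # 2,592,000 seconds = 30 days


def expires_at(last_activity: datetime) -> datetime:
    """Earliest moment at which the TTL monitor may delete the document."""
    return last_activity + timedelta(seconds=CHAT_HISTORY_TTL_SECONDS)


def is_expired(last_activity: datetime, now: datetime) -> bool:
    """True once the document is eligible for deletion."""
    return now >= expires_at(last_activity)


if __name__ == "__main__":
    seen = datetime(2026, 1, 1, tzinfo=timezone.utc)
    print(expires_at(seen).isoformat())
```

Two caveats apply to the real index: the field must hold a BSON date for the TTL to take effect, and deletion runs as a periodic background task, so documents can outlive their expiry time slightly.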
Redis (Upstash) Usage
| Key | Purpose |
|---|---|
| `visitor_count` | Total website visits |
| `active_users` | Users active in last 15 mins |
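"Active in the last 15 minutes" is a sliding-window count over last-seen timestamps. The sketch below models that with an in-memory dictionary as a stand-in for Redis. How the `active_users` key is actually structured in Upstash is not stated in this document, and `ActiveUserTracker` is a hypothetical name.

```python
import time
from typing import Dict, Optional

ACTIVE_WINDOW_SECONDS = 15 * 60  # 15-minute activity window


class ActiveUserTracker:
    """In-memory stand-in for a Redis-backed 'recently active' counter."""

    def __init__(self, window: float = ACTIVE_WINDOW_SECONDS):
        self.window = window
        self._last_seen: Dict[str, float] = {}  # user id -> last activity time

    def touch(self, user_id: str, now: Optional[float] = None) -> None:
        """Record activity for a user."""
        self._last_seen[user_id] = time.time() if now is None else now

    def active_count(self, now: Optional[float] = None) -> int:
        """Drop users outside the window, then count the rest."""
        current = time.time() if now is None else now
        cutoff = current - self.window
        self._last_seen = {
            user: seen for user, seen in self._last_seen.items() if seen > cutoff
        }
        return len(self._last_seen)
```

Pruning on read keeps the structure from growing without bound, which matters on a memory-constrained host.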
API Endpoints
Base URL: https://citizen-safety-backend-mkbn.onrender.com
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | / | โ | Welcome message |
| GET | /health | โ | Health check |
| GET | /docs | โ | Swagger UI |
| GET | /auth/login | โ | Start OAuth flow |
| GET | /auth/callback | โ | OAuth callback |
| POST | /api/chat | โ | Send message (rate limited: 10/min) |
| GET | /api/history | โ | Get chat history |
| POST | /api/clear | โ | Clear history |
| POST | /api/upload | โ | Upload PDFs |
| POST | /api/rebuild-kb | โ | Rebuild/reset knowledge base |
| GET | /api/kb-status | โ | Check KB status |
| GET | /api/stats | โ | Get visitor stats |
| POST | /api/stats/increment | โ | Increment visitor count |
| GET | /api/stats/active | โ | Get active users |
| POST | /api/feedback | โ | Submit feedback |
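The 10/min limit on /api/chat is enforced by SlowAPI in this project. To illustrate the idea only, here is a per-user sliding-window limiter written from scratch. `SlidingWindowLimiter` is a hypothetical class, and SlowAPI's own windowing strategy and storage may well differ.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per user within any `window` seconds."""

    def __init__(self, limit: int = 10, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, now: float) -> bool:
        hits = self._hits[user_id]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would respond with HTTP 429
        hits.append(now)
        return True
```

Keying on the user rather than the IP address means several people behind one network do not share a quota, at the cost of requiring authentication before the limit can apply.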
Response Schema
{
"response": "### Digital Arrest Scam\n\nDigital arrest is...",
"sources": [
{
"source_id": 1,
"file": "DigitalArrest",
"page": 3,
"preview": "Digital arrest scam involves..."
}
],
"confidence": 87.5,
"latency": 2.34,
"pii_masked": true,
"pii_entities": [
{"type": "PHONE_NUMBER", "score": 0.85, "start": 10, "end": 20}
],
"masked_question": "My number is <PHONE_NUMBER>, help me"
}
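A client consuming this payload can model it with typed containers. The sketch below uses standard-library dataclasses as a stand-in for the Pydantic models the backend is said to use. `Source`, `ChatResponse`, `parse_chat_response`, and `format_citations` are hypothetical names, and treating `confidence` as a percentage and `latency` as seconds is an assumption based on the sample values.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    source_id: int
    file: str
    page: int
    preview: str


@dataclass
class ChatResponse:
    response: str
    sources: List[Source]
    confidence: float   # assumed to be a percentage
    latency: float      # assumed to be seconds
    pii_masked: bool
    pii_entities: list = field(default_factory=list)
    masked_question: str = ""


def parse_chat_response(raw: str) -> ChatResponse:
    """Parse the JSON body into typed objects; unknown keys raise TypeError."""
    data = json.loads(raw)
    data["sources"] = [Source(**item) for item in data.get("sources", [])]
    return ChatResponse(**data)


def format_citations(reply: ChatResponse) -> List[str]:
    """Render each source as a short citation string."""
    return [f"[{s.source_id}] {s.file}, p. {s.page}" for s in reply.sources]
```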
Knowledge Base (8 Core PDFs)
| # | File | Topic | Pages |
|---|---|---|---|
| 1 | ADVISORYTAU-ADV... | Digital Arrest Scam Advisory | ~10 |
| 2 | CARE AND PROTECTION... | POCSO Act | ~50 |
| 3 | FraudsterssendingFake... | Fake Job SMS Fraud | ~8 |
| 4 | Ombudsman_Scheme.pdf | Banking Ombudsman | ~20 |
| 5 | RBIOS2021_amendments.pdf | RBI Amendments | ~15 |
| 6 | RBI_English BEAWARE.pdf | RBI Fraud Awareness | ~30 |
| 7 | posh handbook.pdf | POSH (Sexual Harassment) Act | ~100 |
| 8 | Ambuj_Kumar_Resume.pdf | Developer Resume | 2 |
Frontend Features
Dashboard.jsx (787 lines)
| Feature | Lines | Description |
|---|---|---|
| Toast System | 32-37 | Custom notifications |
| Message Display | 490-620 | Markdown rendering with ReactMarkdown |
| Source Citations | 518-564 | Expandable source badges |
| PII Audit Modal | 718-781 | View Presidio detection log |
| File Upload | 161-168 | PDF drag & drop |
| Incremental Index | 170-192 | "Sync Brain" button |
| Surgical Clear | 194-208 | "Reset Context Brain" button |
| Quick Actions | 211-219 | Pre-defined prompts |
| Tech Stack Dropdown | 221-233 | Show technologies used |
| Live Active Users | 74-81 | Real-time from Redis |
Environment Variables
Backend (.env)
| Variable | Purpose |
|---|---|
| `GOOGLE_CLIENT_ID` | OAuth Client ID |
| `GOOGLE_CLIENT_SECRET` | OAuth Secret |
| `GOOGLE_REDIRECT_URI` | OAuth callback URL |
| `FRONTEND_URL` | Vercel frontend URL |
| `SECRET_KEY` | JWT signing key |
| `OPENROUTER_API_KEY` | LLM API key |
| `GOOGLE_API_KEY` | Embeddings API key |
| `MONGO_URI` | MongoDB Atlas connection |
| `UPSTASH_REDIS_REST_URL` | Redis URL |
| `UPSTASH_REDIS_REST_TOKEN` | Redis token |
| `LANGFUSE_*` | Observability keys |
Frontend (.env)
| Variable | Purpose |
|---|---|
| `VITE_API_URL` | Backend API base URL |
Deployment Architecture
| Service | Platform | URL |
|---|---|---|
| Frontend | Vercel | https://citizen-safety-ai-assistant.vercel.app |
| Backend | Render (Free) | https://citizen-safety-backend-mkbn.onrender.com |
| Database | MongoDB Atlas | Cloud cluster |
| Cache | Upstash Redis | Serverless Redis |
| LLM | OpenRouter | Llama 3.3 70B |
| Embeddings | Google AI | embedding-001 |
| Monitoring | Langfuse | LLM observability |
Security Features
| Feature | Implementation |
|---|---|
| PII Masking | Microsoft Presidio + spaCy en_core_web_sm |
| Rate Limiting | SlowAPI: 10 requests/minute per user |
| JWT Authentication | HS256 signed tokens, 24hr expiry |
| CORS | Whitelist: localhost, Vercel URL |
| Circuit Breaker | pybreaker: 5 failures → 30s cooldown |
| GDPR Compliance | MongoDB TTL: 30-day auto-delete |
| Input Validation | Pydantic models with Field constraints |
| Abusive Language Filter | Keyword blocklist |
Memory Optimization
Optimization Results
Estimated Usage: 250-400 MB (safely within Render 512MB limit)
| Optimization | Savings |
|---|---|
| Google API Embeddings (vs HuggingFace local) | ~300MB RAM saved |
| spaCy en_core_web_sm (vs en_core_web_lg) | ~700MB RAM saved |
| Lazy imports inside functions | Faster cold start |
| Global singleton pattern | No duplicate instances |
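The last two rows of the table combine naturally: the heavy import happens inside the accessor, and the resulting object is cached in a module-level variable. The sketch below shows the pattern with placeholders. `FakeEmbeddingClient` is a hypothetical stand-in for the real embeddings client, and `hashlib` stands in for the expensive third-party import that the real `get_embeddings()` would defer.

```python
_embeddings = None  # module-level singleton, created on first use


class FakeEmbeddingClient:
    """Hypothetical stand-in for an embeddings client that is costly to create."""

    instances = 0  # counts constructions, to show the singleton holds

    def __init__(self, hash_fn):
        FakeEmbeddingClient.instances += 1
        self._hash_fn = hash_fn

    def embed(self, text: str) -> list:
        # Toy 4-dimensional "embedding" derived from a hash; illustration only.
        digest = self._hash_fn(text.encode("utf-8")).digest()
        return [byte / 255 for byte in digest[:4]]


def get_embeddings() -> FakeEmbeddingClient:
    """Create the client on first call and reuse it afterwards."""
    global _embeddings
    if _embeddings is None:
        # Lazy import: the cost is paid on first use, not at application start.
        import hashlib  # stand-in for a heavy dependency
        _embeddings = FakeEmbeddingClient(hash_fn=hashlib.sha256)
    return _embeddings
```

This version is not thread-safe: two concurrent first calls could each build a client. Guarding the creation with a `threading.Lock` would close that gap if the server handles requests on multiple threads.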
Interview Talking Points
1. "Walk me through the RAG pipeline"
- User submits question → PII masked by Presidio
- Question embedded via Google Generative AI
- ChromaDB similarity search (k=3) → returns relevant chunks
- Context + question sent to Llama 3.3 70B via OpenRouter
- Response includes sources with page numbers
- Confidence score calculated from vector distance
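The retrieval and scoring steps in this list can be sketched in plain Python. Everything here is illustrative: the real pipeline delegates search to ChromaDB, and the exact confidence formula is not given in this document, so mapping cosine distance to a 0-100 score via `1 - distance` is an assumption. The function names and the chunk dictionary layout are hypothetical, and embeddings are assumed to be non-zero vectors.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query_vec: Sequence[float], chunks: List[dict], k: int = 3) -> List[Tuple[float, dict]]:
    """Return the k chunks closest to the query, nearest first."""
    scored = [(cosine_distance(query_vec, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


def confidence_from_distance(distance: float) -> float:
    """Assumed mapping: similarity clamped to [0, 1], shown as a percentage."""
    return round(max(0.0, min(1.0, 1.0 - distance)) * 100, 1)


def build_prompt(question: str, hits: List[Tuple[float, dict]]) -> str:
    """Number each retrieved chunk so the answer can cite file and page."""
    context = "\n\n".join(
        f"[{i}] ({chunk['file']}, page {chunk['page']}) {chunk['text']}"
        for i, (_, chunk) in enumerate(hits, start=1)
    )
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Numbering the chunks in the prompt is what lets the response carry `source_id` values that map back to a file and page.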
2. "How do you handle security?"
- Authentication: Google OAuth 2.0 → JWT tokens
- PII Protection: Microsoft Presidio real-time masking
- Rate Limiting: SlowAPI prevents abuse
- GDPR: MongoDB TTL auto-deletes after 30 days
- Circuit Breaker: Prevents cascading LLM failures
3. "Explain incremental indexing"
- Core 8 PDFs = permanent index (never deleted)
- User uploads tagged with `is_temporary: True` metadata
- "Reset Brain" calls `clear_temporary_knowledge()`
- Uses ChromaDB `delete(where={"is_temporary": True})`
- No re-indexing of core docs needed
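The tagging scheme can be demonstrated without a vector database. In the sketch below, `InMemoryIndex` is a hypothetical stand-in for the ChromaDB collection, and the function signatures are simplified for illustration (the real functions in pipeline.py may take different arguments). Only the metadata filter is modelled, not embeddings or search.

```python
from typing import List


class InMemoryIndex:
    """Hypothetical stand-in for a vector store with metadata filtering."""

    def __init__(self) -> None:
        self._docs: List[tuple] = []  # (text, metadata) pairs

    def add(self, texts: List[str], metadatas: List[dict]) -> None:
        self._docs.extend(zip(texts, metadatas))

    def delete(self, where: dict) -> int:
        """Remove documents whose metadata matches every key in `where`."""
        keep = [
            (text, meta) for text, meta in self._docs
            if not all(meta.get(key) == value for key, value in where.items())
        ]
        removed = len(self._docs) - len(keep)
        self._docs = keep
        return removed

    def count(self) -> int:
        return len(self._docs)


def index_core_documents(index: InMemoryIndex, chunks: List[str], source: str) -> None:
    """Core knowledge: tagged permanent, so a reset never touches it."""
    index.add(chunks, [{"source": source, "is_temporary": False} for _ in chunks])


def add_documents_incremental(index: InMemoryIndex, chunks: List[str], source: str) -> None:
    """User uploads: appended to the same index, tagged temporary."""
    index.add(chunks, [{"source": source, "is_temporary": True} for _ in chunks])


def clear_temporary_knowledge(index: InMemoryIndex) -> int:
    """Delete only the temporary chunks; returns how many were removed."""
    return index.delete(where={"is_temporary": True})
```

Because core and temporary chunks share one index, a reset costs a single filtered delete instead of re-embedding the core PDFs.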
4. "Why these tech choices?"
- FastAPI: Async, automatic OpenAPI docs, Pydantic validation
- ChromaDB: Simple, no external server, persistent
- Google Embeddings: Free tier generous, low latency
- OpenRouter: Access to multiple LLMs, free tier
- Render: Easy Python deployment, auto-scaling
Final Deployment Checklist
- `GOOGLE_API_KEY` added to config.py and Render
- `Chroma` import added at top-level (line 16)
- `PyMuPDFLoader` & `RecursiveCharacterTextSplitter` imports in `add_documents_incremental()`
- All lazy imports verified inside functions
- Frontend VITE_API_URL points to Render URL
- Vercel rewrites configured for SPA routing
- CORS includes Vercel domain
Document Created: January 29, 2026 | Version: 1.0 | Status: READY FOR INTERVIEW