Chapter 07 — Deployment & Infrastructure
7.1 Zero-Cost Free-Tier Infrastructure
AFP runs almost entirely on free-tier services: every component from vector storage to observability operates at $0 monthly cost, and LLM inference through OpenRouter is pay-per-token at near-zero cost for typical usage. Three deliberate architectural choices make this possible: MRL embeddings to minimize vector storage, Redis caching to eliminate redundant LLM calls, and a local PyMuPDF fallback to protect API quotas.
| Service Category | Provider (Free Tier) | Constraint / Limit |
|---|---|---|
| Backend Hosting | Render | 512MB RAM, shared CPU, cold starts on free tier |
| Vector Database | Pinecone Serverless | Free tier storage and query volume limits |
| Document Storage | Supabase | Free tier storage bucket — 1GB |
| Primary Database | MongoDB Atlas | Free cluster — 512MB storage |
| Cache / Rate Limit | Upstash Redis | Free tier — 10,000 requests/day |
| Observability | Langfuse Cloud | Free tier — trace volume limit |
| LLM Inference | OpenRouter (Qwen 2.5 72B) | Pay-per-token — near-zero for typical usage |
| Embedding API | Jina v3 | Free tier monthly token budget |
| Web Search | Tavily API | Free tier — 1,000 searches/month |
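To make the two numeric request limits in the table easier to reason about, the short sketch below converts them into sustained average rates. The limits are copied from the table; the 30-day month and the helper names `average_per_minute` and `average_per_day` are assumptions made for illustration only, and real traffic is bursty rather than evenly spread.

```python
# Limits copied from the table above.
UPSTASH_REQUESTS_PER_DAY = 10_000
TAVILY_SEARCHES_PER_MONTH = 1_000

# Assumption: a 30-day month, good enough for a rough average.
DAYS_PER_MONTH = 30


def average_per_minute(requests_per_day: int) -> float:
    """Spread a daily quota evenly over the 1,440 minutes in a day."""
    return requests_per_day / (24 * 60)


def average_per_day(requests_per_month: int, days: int = DAYS_PER_MONTH) -> float:
    """Spread a monthly quota evenly over the days in a month."""
    return requests_per_month / days


redis_rate = average_per_minute(UPSTASH_REQUESTS_PER_DAY)
search_rate = average_per_day(TAVILY_SEARCHES_PER_MONTH)

print(f"Upstash Redis: about {redis_rate:.1f} requests/minute sustained")
print(f"Tavily: about {search_rate:.1f} searches/day sustained")
```

Under these assumptions the Redis quota allows roughly seven requests per minute around the clock, and the search quota allows roughly 33 searches per day, which is why repeated queries are worth serving from cache.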
7.2 512MB RAM Optimisation Strategies
- Streaming Uploads (10MB cap): Files are streamed, not fully buffered — prevents large file memory spikes at the upload boundary.
- Jina v3 MRL 256d: Embeddings are truncated to 256 dimensions, 75% smaller than Jina v3's default 1024, which reduces in-memory computation and Pinecone payload sizes.
- Async MongoDB (Motor): Non-blocking async I/O prevents worker thread starvation during database read/write operations.
- Upstash Redis Caching: An exact-match query cache bypasses all LLM calls for repeated queries. Because the cache is hosted remotely, it adds no RAM overhead on the backend instance.
- PyMuPDF for Temp Files: User temporary uploads are parsed locally and in-process with PyMuPDF, the Python bindings for the MuPDF C library, so there is no subprocess spawning and no external API call.
- LlamaParse Tier Selection: Dynamic tier assignment avoids over-provisioning compute for simple documents.
- pybreaker Circuit Breakers: Prevent cascading API failures from exhausting all available worker threads at once.
- LangGraph StateGraph: Explicit state passing eliminates redundant recomputation — each node receives only required state slices.
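Two of the strategies above, the capped streaming upload and the exact-match query cache, can be sketched with the standard library alone. This is a minimal illustration rather than AFP's actual code: the 10MB cap comes from the list, while `CHUNK_BYTES`, `UploadTooLarge`, `stream_capped`, `cache_key`, `answer`, and the `call_llm` callable are hypothetical names, and a plain dict stands in for Upstash Redis.

```python
import hashlib
import io
from typing import BinaryIO, Callable, Dict, Iterator, List

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10MB cap from the list above
CHUNK_BYTES = 64 * 1024              # hypothetical chunk size


class UploadTooLarge(Exception):
    """Raised as soon as an upload exceeds the cap."""


def stream_capped(source: BinaryIO, limit: int = MAX_UPLOAD_BYTES,
                  chunk_size: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield the upload chunk by chunk, so only one chunk is held at a time.

    Aborts once more than `limit` bytes have been read, instead of
    buffering the whole file and checking its size afterwards.
    """
    seen = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return
        seen += len(chunk)
        if seen > limit:
            raise UploadTooLarge(f"upload exceeds {limit} bytes")
        yield chunk


def cache_key(query: str) -> str:
    """Exact-match key: a hash of the query text exactly as given."""
    return "q:" + hashlib.sha256(query.encode("utf-8")).hexdigest()


def answer(query: str, cache: Dict[str, str],
           call_llm: Callable[[str], str]) -> str:
    """Return the cached answer if present; otherwise call the LLM once."""
    key = cache_key(query)
    if key in cache:
        return cache[key]
    result = call_llm(query)
    cache[key] = result
    return result


# Usage: a fake LLM records how often it is actually invoked.
llm_calls: List[str] = []


def fake_llm(query: str) -> str:
    llm_calls.append(query)
    return query.upper()


cache: Dict[str, str] = {}
first = answer("net revenue 2023", cache, fake_llm)
second = answer("net revenue 2023", cache, fake_llm)
total = sum(len(c) for c in stream_capped(io.BytesIO(b"x" * 200_000)))
print(len(llm_calls), first == second, total)
```

Because the key hashes the query verbatim, two queries that differ only in casing or whitespace miss the cache. That is the trade-off of exact matching: no false hits, at the price of fewer hits.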
Agentic Financial Parser v2.0, Technical Documentation