Chapter 07 — Deployment & Infrastructure

7.1 Zero-Cost Free-Tier Infrastructure

AFP is designed to run on free-tier services: every component except LLM inference operates at $0 monthly cost, and inference is billed per token at near-zero cost for typical usage. This was achieved through deliberate architectural choices: MRL embeddings to minimize vector storage, Redis caching to eliminate redundant LLM calls, and a PyMuPDF local fallback to protect paid API quotas.
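
To make the vector-storage saving concrete, the sketch below works through the arithmetic of MRL truncation. It assumes float32 storage, a 1024-dimension full-size vector, and the 256-dimension setting listed in Section 7.2. The names `truncate_mrl`, `bytes_per_vector`, and `saving_ratio` are hypothetical helpers for illustration, not AFP functions, and an embedding API that supports MRL may return the shorter vector directly.

```python
import math

FULL_DIM = 1024        # assumed full-size embedding dimension
MRL_DIM = 256          # reduced dimension listed in Section 7.2
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def truncate_mrl(vector, dim):
    """Keep the first `dim` components and re-normalise to unit length."""
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to normalise
    return [x / norm for x in head]


def bytes_per_vector(dim):
    """Raw payload size of one float32 vector, excluding metadata."""
    return dim * BYTES_PER_FLOAT32


def saving_ratio(full_dim, reduced_dim):
    """Fraction of storage saved by keeping only `reduced_dim` components."""
    return 1 - reduced_dim / full_dim


# Toy unit vector standing in for a real embedding.
full = [1.0 / math.sqrt(FULL_DIM)] * FULL_DIM
short = truncate_mrl(full, MRL_DIM)

print(len(short))                       # number of components kept
print(bytes_per_vector(FULL_DIM))       # bytes for the full vector
print(bytes_per_vector(MRL_DIM))        # bytes for the truncated vector
print(saving_ratio(FULL_DIM, MRL_DIM))  # fraction saved
```

Under these assumptions each vector shrinks from 4,096 bytes to 1,024 bytes, which is where the 75% figure comes from. Re-normalising after truncation keeps cosine similarity well defined on the shorter vectors.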

| Service Category | Provider (Free Tier) | Constraint / Limit |
| --- | --- | --- |
| Backend Hosting | Render | 512MB RAM, shared CPU, cold starts on free tier |
| Vector Database | Pinecone Serverless | Free tier storage and query volume limits |
| Document Storage | Supabase | Free tier storage bucket: 1GB |
| Primary Database | MongoDB Atlas | Free cluster: 512MB storage |
| Cache / Rate Limit | Upstash Redis | Free tier: 10,000 requests/day |
| Observability | Langfuse Cloud | Free tier: trace volume limit |
| LLM Inference | OpenRouter (Qwen 2.5 72B) | Pay-per-token, near-zero for typical usage |
| Embedding API | Jina v3 | Free tier monthly token budget |
| Web Search | Tavily API | Free tier: 1,000 searches/month |

7.2 512MB RAM Optimisation Strategies

  • Streaming Uploads (10MB cap): Files are streamed in chunks rather than fully buffered, which prevents memory spikes from large files at the upload boundary.
  • Jina v3 MRL 256d: Embeddings are truncated to 256 dimensions, 75% smaller than the model's 1024-dimension default, which reduces in-memory computation and Pinecone payload sizes.
  • Async MongoDB (Motor): Non-blocking async I/O lets the server keep handling other requests while database reads and writes are in flight, so workers are not starved.
  • Upstash Redis Caching: An exact-match query cache bypasses all LLM calls for repeated queries. The cache is hosted by Upstash, so it adds no RAM overhead on the backend instance.
  • PyMuPDF for Temp Files: Local, in-process parsing for users' temporary uploads, with no subprocess spawning and no external API memory overhead.
  • LlamaParse Tier Selection: Dynamic tier assignment avoids over-provisioning compute for simple documents.
  • pybreaker Circuit Breakers: Circuit breakers stop calls to a failing API early, so cascading failures cannot exhaust all available worker threads at once.
  • LangGraph StateGraph: Explicit state passing eliminates redundant recomputation: each node receives only the state slices it requires.
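
The streaming-upload strategy at the top of this list can be sketched as follows. This is a minimal, framework-agnostic illustration, not AFP's actual upload handler, which the documentation does not show. The names `stream_to_disk` and `UploadTooLarge` and the 64KB chunk size are assumptions; only the 10MB cap comes from the list. In an async web framework the `read` call would typically be awaited, but the size-checking logic would stay the same.

```python
import io
import tempfile

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10MB cap from the list above
CHUNK_SIZE = 64 * 1024               # assumed chunk size (hypothetical)


class UploadTooLarge(Exception):
    """Raised when an upload exceeds the configured cap."""


def stream_to_disk(source, dest, max_bytes=MAX_UPLOAD_BYTES,
                   chunk_size=CHUNK_SIZE):
    """Copy `source` to `dest` one chunk at a time.

    Only one chunk is held in memory at any moment. The copy aborts as
    soon as the running total exceeds `max_bytes`, before the offending
    chunk is written. Returns the number of bytes written.
    """
    written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return written  # end of stream
        if written + len(chunk) > max_bytes:
            raise UploadTooLarge(f"upload exceeds {max_bytes} bytes")
        dest.write(chunk)
        written += len(chunk)


# Usage: an in-memory stream stands in for the incoming request body.
payload = io.BytesIO(b"x" * 200_000)
with tempfile.TemporaryFile() as tmp:
    total = stream_to_disk(payload, tmp)
    print(total)
```

Because the size check runs per chunk, an oversized file is rejected after at most one chunk beyond the cap has been read, so peak memory stays near the chunk size regardless of how large the upload is.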

Agentic Financial Parser v2.0: Technical Documentation