$96 a month. 7 sites. 20 agents.
People assume “I run everything on AI” means $500–$2,000/month in API fees. It doesn’t. My entire autonomous operation costs $96/month — because I built the architecture differently. This guide is the exact stack: every tool, every cost, every decision about what goes where.
- ✓The one decision — judgment vs. volume — that cut my AI spend 85%
- ✓All 7 layers, the real monthly cost of each, and the SaaS each one replaced
- ✓A no-GPU version you can stand up this week, plus the full local-GPU build
Judgment vs. volume.
Every dollar I save comes from one line: Claude (expensive per token) handles judgment — strategy, nuance, edge cases. Qwen3 (self-hosted, zero per token) handles volume — bulk generation, classification, summarization at scale. Draw that line and the bill collapses. Here’s where the work actually goes.
The Brain
Claude Sonnet (~$60/mo). Agent decisions, code architecture, publish yes/no. Never bulk jobs.
The Workhorse
Qwen3 on a local RTX 5090, $0 per token. Bulk content, keyword classification, manuscript drafts.
The Orchestrator
Self-hosted n8n (~$20/mo). 17 workflows route every job to the right model — or to no AI at all.
Browser + Research
Browserless for JS scraping, NotebookLM for overnight research. Both $0. Killed Apify and more.
Images + Agents
FLUX on the 5090 for every image, a custom daemon running 20 agents on schedule. $0 ongoing.
From ~$380 to ~$96 a month.
Before I rebuilt, I was paying ~$380/month across Zapier, Midjourney, OpenAI, Brightdata, stock photos, and scattered SaaS. The guide breaks down every layer line by line — what it costs, what it replaced, and the config details (like the Qwen3 /no_think trick) that keep the volume work effectively free.
The honest answers.
Do I need a $2,000 GPU to do this?
No. The guide includes a minimum viable version with no GPU at all — Claude API, self-hosted n8n on a cheap VPS, free NotebookLM, and a simple cron loop for agents. The local RTX 5090 is the full build that drives the volume work to $0 per token; it pays back in 6–12 months but it’s optional.
How is $96 even possible?
Because most AI bills explode on tasks that don’t need AI. Pure data processing runs as plain code (zero tokens). Bulk work runs on self-hosted Qwen3 (zero per token). Only judgment goes to Claude. That one split moved 85% of my workload off the expensive model.
Is this the literal stack you use?
Yes — every tool and cost in the guide is what’s running right now: Claude Sonnet, Qwen3 on an RTX 5090, n8n on Proxmox, Browserless, NotebookLM, ComfyUI + FLUX, and a custom agent-runner daemon. You don’t need my exact setup; you need the principle behind it.
Will it work for my business, not just sites?
The architecture is the point, not the niche. The same judgment-vs-volume split applies whether you run content sites, a store, a publishing pipeline, or all three — that’s exactly what I run on it. The guide shows you how to map your own work onto the layers.
Will you spam me?
No. One-click unsubscribe in every email, no upsell sequence, no “limited time” pitches. Reply to anything I send and I read it.
Stop renting your AI stack.
Own it for $96.
The full breakdown — every layer, every cost, the no-GPU version and the full build.