Architecture
RWKV-7: Infinite Context, Zero KV Cache — The Local-First Architecture
RWKV-7 uses O(1) memory per token: context length doesn't increase VRAM at all, and it still hits 16 tok/s on a Raspberry Pi. Here's why that matters for local AI and how to run it.
Model Routing for Local AI — Stop Using One Model for Everything
You're running one model for every task. That wastes VRAM, burns electricity, and gives worse results. Model routing sends each task to the right model at the right cost. Here's how to set it up.
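The routing idea in that teaser can be sketched as a lookup from task type to model. This is a minimal illustration, not any particular router's API; the model names, the `ROUTES` table, and the `route` helper are all hypothetical examples.

```python
# Minimal sketch of model routing: send each task type to a model
# sized for it. Model names below are hypothetical placeholders.
ROUTES = {
    "chat":      "small-chat-3b",     # cheap and fast for casual turns
    "code":      "coder-7b",          # mid-size model tuned for code
    "reasoning": "reasoner-14b",      # slow, reserved for hard problems
}

def route(task_type: str) -> str:
    """Return the model assigned to a task type, defaulting to the cheapest."""
    return ROUTES.get(task_type, ROUTES["chat"])

print(route("code"))       # coder-7b
print(route("summarize"))  # unknown task types fall back to small-chat-3b
```

In practice the lookup key would come from a lightweight classifier or keyword check on the request, but the core idea is the same: the expensive model only loads when the task actually needs it.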
MoE Models Explained: Why Mixtral Uses 46B Parameters But Runs Like 13B
Mixture of Experts explained for local AI — why MoE models run fast but still need full VRAM. Mixtral, DeepSeek V3, DBRX compared with dense model alternatives.
Beyond Transformers: 5 Architectures for Your $50 Mini PC
We benchmarked RWKV-7 vs gemma3 on a $50 mini PC. The transformer crashed at turn 6. Here are 5 alternative architectures that run better on budget hardware.
How OpenClaw Actually Works: Architecture Guide
5 input types explain the 'alive' behavior: messages, heartbeats, crons, hooks, and webhooks feed a single agent loop. The 3am phone call was just a timer event.
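The five input types in that last entry can be sketched as events draining into one loop. The event names mirror the teaser, but the `Event` class and `agent_loop` function here are a hypothetical illustration, not OpenClaw's actual code.

```python
from dataclasses import dataclass, field
from queue import Queue

# The five input types named in the teaser, all funneled into one loop.
INPUT_TYPES = {"message", "heartbeat", "cron", "hook", "webhook"}

@dataclass
class Event:
    kind: str                      # one of INPUT_TYPES
    payload: dict = field(default_factory=dict)

def agent_loop(inbox: Queue) -> list[str]:
    """Drain the inbox, handling every input type in a single loop."""
    handled = []
    while not inbox.empty():
        event = inbox.get()
        if event.kind not in INPUT_TYPES:
            continue  # unknown inputs are dropped, not dispatched
        # A timer firing at 3am is just another event in the same queue.
        handled.append(f"{event.kind}: {event.payload.get('summary', '')}")
    return handled

inbox = Queue()
inbox.put(Event("cron", {"summary": "3am wake-up timer"}))
inbox.put(Event("message", {"summary": "user asks a question"}))
print(agent_loop(inbox))
```

The point of the single loop is that nothing special distinguishes a user message from a timer: both are events, so "alive" behavior falls out of ordinary queue processing.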