Local AI Planning Tool
Figure out what you need to run AI locally, whether you're starting with hardware, a model, or a problem to solve.
Select Your Hardware
Model
Quantization
What do you need AI to do?
Why This Tool Exists
Most VRAM calculators ask you to pick a model and show you a number. But most people don't start with a model; they start with hardware they already own, or a problem they need to solve. This tool works all three ways.
"I have hardware" shows every model that fits your device, grouped by what you can build with it. "I want a model" gives precise VRAM calculations using real architecture specs. "I need to solve a problem" recommends the right model and stack for your use case, with exact hardware requirements.
The formula is empirically validated against real llama.cpp measurements: VRAM (GB) = (P × bw) + (0.55 + 0.08 × P) + KV cache, where P is the parameter count in billions, bw is bytes per weight at the chosen quantization, the middle term is runtime overhead, and the KV cache term uses each model's actual layer count, KV head count, and head dimension.
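The formula above can be sketched in a few lines of Python. The function names are illustrative, and the Llama 3 8B architecture values used in the example (32 layers, 8 KV heads, head dimension 128) and the ~0.6 bytes/weight figure for Q4 are assumptions for demonstration, not outputs of this tool:

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB: K and V each store
    layers * kv_heads * head_dim elements per token."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem / 1e9

def vram_gb(params_b: float, bytes_per_weight: float, kv_gb: float) -> float:
    """Total VRAM: weights + (0.55 + 0.08 * P) GB overhead + KV cache."""
    return params_b * bytes_per_weight + (0.55 + 0.08 * params_b) + kv_gb

# Illustrative example: an 8B model at Q4 (~0.6 bytes/weight),
# 8K context, FP16 KV cache, Llama-3-8B-like architecture.
kv = kv_cache_gb(layers=32, kv_heads=8, head_dim=128, context_len=8192)
total = vram_gb(params_b=8, bytes_per_weight=0.6, kv_gb=kv)
print(f"KV cache: {kv:.2f} GB, total: {total:.2f} GB")  # roughly 1 GB and 7 GB
```

Note that the KV cache term dominates at long contexts, which is why the same model can fit comfortably at 8K context yet overflow the same GPU at 128K.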
Common Questions
Can I run LLMs on a Raspberry Pi? Yes: Gemma 3 1B and Qwen 0.5B run on a Pi 5 with 8GB RAM. Expect 5-15 tokens/sec on CPU. Best for simple tasks, classification, or edge voice assistants.
What's the best GPU for local AI on a budget? A used RTX 3090 (24GB, ~$600-700) is the sweet spot. Runs 32B models at Q4 and 70B at Q2-Q3.
Do MoE models use less VRAM? No. All expert parameters must reside in VRAM, even though only a subset is active per token. MoE models decode faster per token but use the same memory as a dense model of equal total parameter count.
How does context length affect VRAM? KV cache grows linearly with context. At 128K tokens on Llama 8B, KV cache alone is ~8 GB, more than the model weights. Always check VRAM at your target context length.
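The linear growth is easy to see from the per-token KV footprint. A small sketch, assuming Llama-3-8B-like architecture values (32 layers, 8 KV heads, head dimension 128) and an 8-bit KV cache (1 byte per element), which is roughly what the ~8 GB figure at 128K implies:

```python
# Per-token KV footprint: K and V each hold layers * kv_heads * head_dim elements.
layers, kv_heads, head_dim = 32, 8, 128   # assumed Llama-3-8B-like architecture
bytes_per_elem = 1                        # assumed 8-bit (quantized) KV cache

per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem  # 65,536 B

# KV cache scales linearly with context length.
for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {per_token_bytes * ctx / 1e9:.1f} GB")
```

Doubling bytes_per_elem to 2 (FP16 KV cache) doubles every figure, so the cache format matters as much as the context length itself.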
What about phones and edge devices? Gemma 3n was designed for mobile, running with as little as 2-3 GB. Combined with Whisper for speech-to-text, you can build a fully offline voice assistant on a modern phone.
VRAM Requirements Guide · GPU Buying Guide · Quantization Explained · What Runs on 24GB? · 16GB? · 8GB? · Coding Models · Voice Chat · RAG Guide