Running 70B Models Locally — Exact VRAM by Quantization
Llama 3.3 70B needs 43GB at Q4, 75GB at Q8, 141GB at FP16. Here's every quant level, which GPUs fit, real speeds, and when 32B is the smarter choice.
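Those figures follow the standard back-of-the-envelope arithmetic: weight memory is roughly parameters (in billions) × bits-per-weight ÷ 8 gigabytes, plus KV cache and runtime buffers. The sketch below is a minimal illustration of that math, not the article's tooling; the bits-per-weight averages, the estimate_vram_gb helper, and the KV-cache and overhead allowances are illustrative assumptions.

```python
# Back-of-the-envelope VRAM estimate for a dense transformer:
# weights ~= params (billions) * bits-per-weight / 8 (GB),
# plus an allowance for KV cache and runtime buffers.
# Bits-per-weight values are rough GGUF averages (assumptions):
# Q4_K_M keeps some tensors at higher precision, so it averages ~4.85 bits.

QUANT_BITS = {
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.85,
    "Q8_0": 8.5,
    "FP16": 16.0,
}

def estimate_vram_gb(params_b: float, quant: str,
                     kv_cache_gb: float = 2.0, overhead_gb: float = 1.0) -> float:
    """Weight memory plus assumed KV-cache and runtime-overhead allowances, in GB."""
    weights_gb = params_b * QUANT_BITS[quant] / 8
    return weights_gb + kv_cache_gb + overhead_gb

for quant in ("Q4_K_M", "Q8_0", "FP16"):
    print(f"Llama 3.3 70B @ {quant}: ~{estimate_vram_gb(70, quant):.0f} GB")
# Weights alone come out to roughly 42, 74, and 140 GB, in line with the
# 43 / 75 / 141 GB figures quoted above.
```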
What Can You Actually Run on 24GB VRAM?
32B models at 25-38 tok/s, 70B at Q3 with limited context, Flux at full FP16, and LoRA fine-tuning. RTX 3090 at $700 vs 4090 at $1,800 — every model that fits and which GPU to buy.