MoE
Mixtral VRAM Requirements: 8x7B and 8x22B at Every Quantization Level
Mixtral 8x7B has 46.7B parameters, but only 12.9B are active per token. You still need VRAM for all 46.7B. Exact figures for every quant from Q2 to FP16, plus GPU recommendations and the KV cache impact.
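The teaser's point is easy to verify with arithmetic: every expert's weights must be resident even though only a couple are routed per token, so the total parameter count sets the memory floor. The sketch below is a rough weights-only estimate; the bits-per-weight values and the flat 2 GB allowance for KV cache and runtime buffers are illustrative assumptions, not measured file sizes.

```python
# Rough, weights-only VRAM estimate for a MoE model. Every expert must sit
# in memory, so the TOTAL parameter count (46.7B for Mixtral 8x7B) drives
# the figure, not the 12.9B active per token.
# Bits-per-weight values are rough GGUF averages (assumption), not exact sizes.
QUANT_BITS = {"Q2_K": 2.6, "Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5, "FP16": 16.0}

def vram_estimate_gb(total_params_b: float, quant: str, overhead_gb: float = 2.0) -> float:
    """Weights plus a flat allowance for KV cache and runtime buffers, in GB."""
    weights_gb = total_params_b * QUANT_BITS[quant] / 8  # params in billions -> GB
    return weights_gb + overhead_gb

for q in QUANT_BITS:
    print(f"Mixtral 8x7B @ {q}: ~{vram_estimate_gb(46.7, q):.0f} GB")
```

At Q4_K_M this lands around 30 GB, consistent with the 26-32GB range quoted for Mixtral 8x7B further down this list.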
Qwen3 Complete Guide: Every Model from 0.6B to 235B
Qwen3 is the best open model family for budget local AI. Dense models from 0.6B to 32B, MoE models that punch above their weight, and a /think toggle no one else has.
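As a quick illustration of that toggle, here is a minimal sketch using the Hugging Face transformers chat template: Qwen3's template exposes a hard switch via the enable_thinking argument, and a soft switch by appending /think or /no_think to a user message. The 0.6B checkpoint is used only because it is the smallest to download.

```python
# Minimal sketch of Qwen3's thinking toggle via the Hugging Face chat template.
# Assumes `transformers` is installed and the Qwen/Qwen3-0.6B checkpoint is available.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
messages = [{"role": "user", "content": "Explain KV cache in one paragraph."}]

# Hard switch: enable_thinking controls whether the template opens a <think> block.
with_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
without_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
print(with_thinking[-200:])
print(without_thinking[-200:])
```

With enable_thinking=False the template closes the think block up front, so the model skips its reasoning trace and answers directly.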
Llama 4 Guide: Running Scout and Maverick Locally
Complete Llama 4 guide for local AI — Scout (109B MoE, 17B active) and Maverick (400B). VRAM requirements, Ollama setup, benchmarks, and honest hardware reality check.
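For the Ollama route, a minimal sketch using the ollama Python client is below. It assumes a running Ollama server and that the llama4:scout tag has already been pulled; a roughly 4-bit quant of a 109B-parameter model is still on the order of 65 GB, so expect heavy CPU offload on consumer GPUs.

```python
# Hedged sketch: querying Llama 4 Scout through Ollama's Python client.
# Assumes `pip install ollama`, a local Ollama server, and that the
# llama4:scout tag has already been pulled.
import ollama

response = ollama.chat(
    model="llama4:scout",
    messages=[{"role": "user", "content": "Summarize what an MoE router does."}],
)
print(response["message"]["content"])
```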
GPT-OSS Guide: OpenAI's First Open Model for Local AI
GPT-OSS 20B is OpenAI's first open-weight model since GPT-2: a MoE with 3.6B active params, MXFP4 at 13GB, 128K context, Apache 2.0. Here's how to run it.
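The 13GB figure follows from how MXFP4 works: FP4 (E2M1) values stored in blocks of 32 that share one 8-bit scale, about 4.25 bits per quantized weight, with the non-expert layers left in bf16. The sketch below checks that arithmetic; the split between expert and non-expert parameters is an illustrative assumption, not an official breakdown.

```python
# Back-of-the-envelope check of the ~13GB MXFP4 figure for GPT-OSS 20B.
# MXFP4 packs FP4 (E2M1) values in blocks of 32 sharing one 8-bit scale,
# so quantized tensors cost 4 + 8/32 = 4.25 bits per weight.
MXFP4_BITS = 4 + 8 / 32   # 4.25 bits per MoE expert weight
BF16_BITS = 16            # attention/embedding weights assumed left in bf16

expert_params = 19.0e9    # assumed: bulk of the ~21B total sits in the experts
other_params = 1.9e9      # assumed: remainder stays unquantized

size_gb = (expert_params * MXFP4_BITS + other_params * BF16_BITS) / 8 / 1e9
print(f"Estimated checkpoint size: ~{size_gb:.1f} GB")  # lands near the quoted 13GB
```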
DeepSeek V3.2 Guide: What Changed and How to Run It Locally
DeepSeek V3.2 competes with GPT-5 on benchmarks. The full model needs 350GB+ VRAM. But the R1 distills run on a $200 used GPU — and they're shockingly good.
Mistral & Mixtral Guide: Every Model Worth Running Locally
Nemo 12B with 128K context fits in 8GB VRAM at Q4. Mistral 7B runs on 4GB. Mixtral 8x7B needs 26-32GB and is now outpaced by Qwen3. What's still worth running and what's been superseded.