Gemma 4 Just Dropped: What Local AI Builders Need to Know
Google's Gemma 4 is here: dense and MoE variants, Apache 2.0, multimodal with vision and audio. VRAM requirements, benchmarks, and how it compares to Qwen 3.5.
DeepSeek V4: Everything We Know Before It Drops
DeepSeek V4 launches next week with native image and video generation, 1M context, and a rumored 1T-param MoE with only 32B active. Here's what local AI builders need to know and how to prepare.
Best Qwen 3.5 Models Ranked: Every Size, Every GPU, Every Quant
Complete ranking of all Qwen 3.5 models from 0.8B to 397B. VRAM requirements, speed benchmarks, and which model to pick for your hardware.
Qwen 3.5 Locally — 27B vs 35B-A3B vs 122B, Which Model Fits Your GPU
Qwen 3.5 27B dense vs 35B-A3B MoE vs 122B-A10B compared for local inference. VRAM tables, tok/s benchmarks on RTX 3090 and Mac, thinking mode setup, and which to pick for your hardware.
LiquidAI LFM2: The First Hybrid Model Built for Your Hardware
LFM2-24B-A2B runs at 112 tok/s on CPU with only 2.3B active params. Not a transformer. GGUF files from 13.5GB, Ollama and llama.cpp setup, and where it beats Qwen.
Best Way to Run Qwen 3.5 on Mac: MLX vs Ollama Speed Test
MLX runs Qwen 3.5 up to 2x faster than Ollama on Apple Silicon. Head-to-head benchmarks on M1 through M4, with setup instructions for both.
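A minimal sketch of that head-to-head, assuming the `mlx-lm` and `ollama` Python packages; both model identifiers below are hypothetical placeholders for whatever Qwen 3.5 builds you've actually pulled:

```python
import time

import ollama                      # pip install ollama (talks to a local Ollama server)
from mlx_lm import load, generate  # pip install mlx-lm (Apple Silicon only)

prompt = "Write a haiku about VRAM."

# MLX path -- repo name is a placeholder, not a confirmed mlx-community release
model, tokenizer = load("mlx-community/Qwen3.5-27B-4bit")
t0 = time.time()
generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(f"MLX:    {time.time() - t0:.1f}s")

# Ollama path -- tag is likewise a placeholder
t0 = time.time()
ollama.generate(model="qwen3.5:27b", prompt=prompt)
print(f"Ollama: {time.time() - t0:.1f}s")
```

Wall-clock time is the crude version; divide tokens generated by seconds elapsed to get the tok/s figures the benchmarks report.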
Best Qwen 3.5 Setup: Which Model Fits Your GPU (Complete Cheat Sheet)
Pick the right Qwen 3.5 model for your hardware. Covers 0.8B through 397B with VRAM requirements, quant recommendations, and benchmarks for every GPU tier.
MoE Models Explained: Why Mixtral Uses 46B Parameters But Runs Like 13B
Mixture of Experts explained for local AI — why MoE models run fast but still need full VRAM. Mixtral, DeepSeek V3, DBRX compared with dense model alternatives.
Mixtral VRAM Requirements: 8x7B and 8x22B at Every Quantization Level
Mixtral 8x7B has 46.7B params but only 12.9B activate per token. You still need VRAM for all 46.7B. Exact VRAM for every quant from Q2 to FP16.
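The arithmetic behind that warning fits in a few lines; the ~10% overhead factor below is an assumption for KV cache and runtime buffers, not a measured number:

```python
# Memory tracks TOTAL params (every expert stays resident); speed tracks ACTIVE params.
def vram_gb(total_params_b: float, bits_per_weight: int, overhead: float = 1.10) -> float:
    # billions of params * bits / 8 = GB of weights; overhead is an assumed ~10% fudge
    return total_params_b * bits_per_weight / 8 * overhead

for bits, quant in [(2, "Q2"), (4, "Q4"), (8, "Q8"), (16, "FP16")]:
    print(f"{quant:>4}: ~{vram_gb(46.7, bits):5.1f} GB")  # Mixtral 8x7B, 46.7B total
```

Q4 lands around 26 GB, which is why 8x7B outgrows a single 24GB card even though it generates like a 13B.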
Qwen3 Complete Guide: Every Model from 0.6B to 235B
Qwen3 is the best open model family for budget local AI. Dense models from 0.6B to 32B, MoE models that punch above their weight, and a /think toggle no one else has.
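The toggle in a minimal sketch via the `ollama` Python client; the `qwen3:8b` tag is an assumption (use whichever size you've pulled):

```python
import ollama  # pip install ollama; assumes a local Ollama server with a Qwen3 model pulled

# Append /think or /no_think to the user turn to toggle Qwen3's reasoning trace
resp = ollama.chat(
    model="qwen3:8b",  # assumption: swap in whichever Qwen3 size you run
    messages=[{"role": "user", "content": "What is 17 * 23? /no_think"}],
)
print(resp["message"]["content"])
```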
Llama 4 Guide: Running Scout and Maverick Locally
Complete Llama 4 guide for local AI — Scout (109B MoE, 17B active) and Maverick (400B). VRAM requirements, Ollama setup, benchmarks, and honest hardware reality check.
GPT-OSS Guide: OpenAI's First Open Model for Local AI
GPT-OSS 20B is OpenAI's first open-weight model. MoE with 3.6B active params, MXFP4 at 13GB, 128K context, Apache 2.0. Here's how to run it.
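The short version, as a sketch with the `ollama` Python client, assuming a local Ollama install with the `gpt-oss:20b` tag pulled:

```python
import ollama  # pip install ollama; assumes a running local Ollama server

# The 20B MoE activates only ~3.6B params per token, so generation stays fast
# even though the MXFP4 weights occupy ~13GB of memory.
resp = ollama.generate(model="gpt-oss:20b", prompt="Summarize MoE routing in two sentences.")
print(resp["response"])
```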
DeepSeek V3.2 Guide: What Changed and How to Run It Locally
DeepSeek V3.2 competes with GPT-5 on benchmarks. The full model needs 350GB+ VRAM. But the R1 distills run on a $200 used GPU — and they're shockingly good.
Are Mistral Models Still Worth Running? Only Nemo 12B (Here's Why)
Mistral led local AI in 2024; by 2026, Qwen 3 and Llama 3 have passed it on most benchmarks. The exception: Mistral Nemo 12B, whose 128K context still earns its slot. What's worth running, what's been replaced, and when to pick Mistral over the competition.