Models
OpenClaw Model Combinations: What to Pair for Each Task
Stop running one model for everything in OpenClaw. Pair Qwen 2.5 Coder 32B for autocomplete, Qwen 3.5 27B for planning, and Qwen3-Coder-Next for agentic coding. Combos by VRAM tier.
The Benchmarks Lie: Why LLM Scores Don't Predict Real-World Performance
MMLU scores drop 14-17 points when contamination is removed. HumanEval is saturated at 94%. Models were trained on the test set. Here's what to measure instead.
RWKV-7: Infinite Context, Zero KV Cache — The Local-First Architecture
RWKV-7 uses O(1) memory per token. Context length doesn't increase VRAM. At all. 16 tok/s on a Raspberry Pi. Here's why it matters for local AI and how to run it.
Model Routing for Local AI — Stop Using One Model for Everything
You're running one model for every task. That wastes VRAM, burns electricity, and gives worse results. Model routing sends each task to the right model at the right cost. Here's how to set it up.
Distilled vs Frontier Models for Local AI — What You're Actually Getting
That local model you love was probably trained on stolen outputs from Claude or GPT. Here's what distillation actually does to a model's reasoning, where it breaks, and why it matters most for agentic work.
Best Qwen 3.5 Setup: Which Model Fits Your GPU (Complete Cheat Sheet)
Pick the right Qwen 3.5 model for your hardware. Covers 0.8B through 397B with VRAM requirements, quant recommendations, and benchmarks for every GPU tier.
MoE Models Explained: Why Mixtral Uses 46B Parameters But Runs Like 13B
Mixture of Experts explained for local AI — why MoE models run fast but still need full VRAM. Mixtral, DeepSeek V3, DBRX compared with dense model alternatives.
Model Outputs Garbage: Debugging Bad Generations
Local LLM outputs repetitive loops, gibberish, or wrong answers? Seven causes with exact fixes — from corrupted downloads to wrong chat templates.
GGUF File Won't Load: Format and Compatibility Fixes
GGUF model won't load? Version mismatch, corrupted download, wrong format, split files, or memory issues. Find your error and fix it in under a minute.
Qwen3 Complete Guide: Every Model from 0.6B to 235B
Qwen3 is the best open model family for budget local AI. Dense models from 0.6B to 32B, MoE models that punch above their weight, and a /think toggle no one else has.
Llama 4 vs Qwen3 vs DeepSeek V3.2: Which to Run Locally in 2026
Llama 4 needs 55GB. DeepSeek V3.2 needs 350GB. Qwen3 runs on 8GB. Here's who wins at each VRAM tier and use case for local AI in 2026.
Llama 4 Guide: Running Scout and Maverick Locally
Complete Llama 4 guide for local AI — Scout (109B MoE, 17B active) and Maverick (400B). VRAM requirements, Ollama setup, benchmarks, and honest hardware reality check.
GPT-OSS Guide: OpenAI's First Open Model for Local AI
GPT-OSS 20B is OpenAI's first open-weight model. MoE with 3.6B active params, MXFP4 at 13GB, 128K context, Apache 2.0. Here's how to run it.
DeepSeek V3.2 Guide: What Changed and How to Run It Locally
DeepSeek V3.2 competes with GPT-5 on benchmarks. The full model needs 350GB+ VRAM. But the R1 distills run on a $200 used GPU — and they're shockingly good.
How to Update Models in Ollama — Keep Your Local LLMs Current
Ollama doesn't auto-update models. Run ollama pull model:tag to grab the latest version — only changed layers download. Use ollama show to check what you have, and a simple loop to update everything at once.
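The "simple loop to update everything" can be sketched in shell — a minimal example, assuming `ollama` is on your PATH and that `ollama list` prints a header row followed by one model per line with the NAME:TAG identifier in the first column (check your version's output before relying on this):

```shell
# Extract model names from `ollama list` output:
# skip the header row, keep the first column (NAME:TAG).
model_names() {
  awk 'NR > 1 { print $1 }'
}

# Re-pull every installed model. Only changed layers are
# re-downloaded, so up-to-date models finish almost instantly.
update_all() {
  ollama list | model_names | while read -r model; do
    echo "Updating $model"
    ollama pull "$model"
  done
}
```

Run `update_all` whenever you want everything current; `model_names` is split out only so the parsing step is easy to adjust if your `ollama list` layout differs.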
Best Uncensored Local LLMs (And Why You Might Want Them)
Dolphin 3.0, abliterated Llama 3.3, uncensored Qwen — the best unrestricted local models for fiction, research, and creative work. What uncensored actually means, which models to run, and the quality tradeoffs.
Best Local LLMs for Summarization
Qwen 2.5 14B is the summarization sweet spot — strong instruction following, 128K context for 200-page docs, fits on 16GB VRAM. Model picks by use case, quality ratings, chunking strategies, and prompting tips.
Phi Models Guide: Microsoft's Small but Mighty LLMs
Phi-4 14B scores 84.8% on MMLU — matching models 5x its size — and fits on a 12GB GPU at Q4. The full Phi lineup from 3.8B to 14B with VRAM needs, benchmarks, and honest weaknesses.
Managing Multiple Models in Ollama: Disk Space, Switching, and Organization
Five 7B models eat 20GB before you notice. Check what's using space with ollama list, clean up with ollama rm, and set OLLAMA_KEEP_ALIVE to control memory. A practical cleanup and organization guide.
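The cleanup workflow above can be sketched in shell. This is illustrative only: the column positions are assumed from a typical `ollama list` layout (NAME, ID, SIZE, MODIFIED, with sizes printed as "<number> GB"), the `llama2:13b` name is a placeholder, and `ollama rm` permanently deletes a model's layers:

```shell
# Rough total of the SIZE column from `ollama list` output.
# Assumes the size number sits in column 3; not robust parsing.
total_gb() {
  awk 'NR > 1 { total += $3 } END { printf "%.1f GB\n", total }'
}
# usage: ollama list | total_gb

# Remove a model you no longer use (placeholder name):
# ollama rm llama2:13b

# Keep a loaded model resident for 24h instead of the default
# few minutes, so switching back doesn't reload it from disk.
export OLLAMA_KEEP_ALIVE=24h
```

Set `OLLAMA_KEEP_ALIVE` in your shell profile if you want it to persist across sessions.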
Gemma Models Guide: Google's Lightweight Local LLMs
Gemma 3 27B beats Gemini 1.5 Pro on benchmarks and runs on a single GPU. The 4B outperforms Gemma 2 27B. Full lineup from 1B to 27B with VRAM needs, speeds, and honest comparisons.
Embedding Models for RAG: Which to Run Locally
nomic-embed-text is still the default for most local RAG setups — 274MB, 8K context, runs on CPU. But Qwen3-Embedding 0.6B just changed the game. Model picks, VRAM needs, speed numbers, and the chunking mistakes that break retrieval.
Best Local LLMs for Translation: What Actually Works
NLLB handles 200 languages on 3GB VRAM. Qwen 2.5 matches DeepL for European pairs. Opus-MT weighs just 300MB per language direction. Which local translation model fits your hardware and language needs.
Best Local LLMs for Data Analysis (2026)
Which local models write the best pandas and SQL code on your own hardware. Tested Qwen 2.5 Coder, DeepSeek, and Llama on real datasets with accuracy scores.
Best Local LLMs for Mac in 2026 — M1, M2, M3, M4 Tested
The best models to run on every Mac tier. Specific picks for 8GB M1 through 128GB M4 Max, with real tok/s numbers. MLX vs Ollama vs LM Studio compared.
Best Local Models for OpenClaw Agent Tasks
Qwen 3.5 27B on 24GB VRAM is the sweet spot for local agents — SWE-bench 72.4, 262K context, tool calling fixed in Ollama v0.17.6+. Model picks by VRAM tier and the 'society of minds' setup power users run.
Are Mistral Models Still Worth Running? Only Nemo 12B (Here's Why)
Mistral led local AI in 2024. In 2026, Qwen 3 and Llama 3 have passed them on most benchmarks. The exception: Mistral Nemo 12B with 128K context still earns its slot. What's worth running, what's been replaced, and when to pick Mistral over the competition.
Llama 3 Guide: Every Size from 1B to 405B
Complete Llama 3 guide covering every model from 1B to 405B. VRAM requirements, Ollama setup, benchmarks vs Qwen 3, and which size fits your hardware.
DeepSeek Models Guide: R1, V3, and Coder
Complete DeepSeek models guide covering R1, V3, and Coder locally. Which distilled R1 to pick for your GPU, VRAM requirements, and benchmarks vs Qwen 3.
Best Qwen Models Ranked: Which to Run Locally
Complete Qwen models guide covering Qwen 3.5, Qwen 3, Qwen 2.5 Coder, and Qwen-VL. VRAM requirements, Ollama setup, Gated DeltaNet architecture, and benchmarks vs Llama and DeepSeek.
Best Local LLMs for Math & Reasoning: What Actually Works
The best local LLMs for math and reasoning tasks, ranked by VRAM tier. AIME and MATH benchmarks for DeepSeek R1, Qwen 3 thinking, and Phi-4-reasoning.
Best Local LLMs for Chat & Conversation
The best local LLMs for chat and conversation in 2026. Picks for every VRAM tier from 8GB to 24GB, with Ollama commands to start chatting immediately.
Best Local LLMs for Writing & Creative Work
Qwen 2.5 32B on 24GB VRAM is the sweet spot for fiction and long-form. On 8GB, Nous Hermes 3 8B punches above its weight. Model picks for every tier and writing task.