Troubleshooting
LLM Running Slow? Two Different Problems, Two Different Fixes
Slow local LLM? Separate time-to-first-token from generation speed. Fix prompt processing with batch size and Flash Attention; fix generation tok/s with GPU layers, quantization, and context length.
Why Your Local LLM Is Slow: The num_ctx VRAM Overflow Nobody Warns You About
DeepSeek-R1 14B went from 35 tok/s to 4.8 tok/s on the same GPU. The fix was one parameter. How num_ctx silently overflows VRAM and kills inference speed.
Qwen2.5-VL Not Loading in LM Studio? Fix mmproj and Vision Errors
Fix every Qwen2.5-VL error in LM Studio: missing mmproj, 'model type not supported', no eye icon, vision crashes. Exact fixes with file paths.
Open WebUI Not Connecting to Ollama? Every Fix
Docker networking, wrong OLLAMA_BASE_URL, localhost confusion, WSL2 isolation, missing models, random disconnects. Every Open WebUI + Ollama connection problem with the exact fix.
Ollama on Mac Not Working? Fix Metal, Memory Pressure, and Slow Performance
ollama ps says CPU? Generation crawling at 2 tok/s? macOS killed your model mid-sentence? Every Mac-specific Ollama problem diagnosed and fixed with exact commands.
Why Is My Local LLM So Slow? A Diagnostic Guide
Local LLM running slow? Check GPU vs CPU inference, VRAM offloading, quantization, context length, backend choice, and thermals. Find your fix in 60 seconds.
ROCm Not Detecting GPU: AMD Troubleshooting Guide
AMD GPU not detected in ROCm? Check supported GPUs, fix rocminfo errors, use the HSA_OVERRIDE_GFX_VERSION workaround for unsupported cards, and fix Ollama/llama.cpp ROCm builds.
Ollama Not Using GPU: Complete Fix Guide
Ollama running on CPU instead of GPU? Diagnose with ollama ps and nvidia-smi, then fix CUDA drivers, ROCm setup, VRAM limits, and Docker GPU passthrough.
Ollama API Connection Refused: Quick Fixes
Ollama API returning connection refused? Check if it's running, fix the port, open it to the network, and solve Docker and WSL2 connectivity issues.
Model Outputs Garbage: Debugging Bad Generations
Local LLM outputs repetitive loops, gibberish, or wrong answers? Seven causes with exact fixes, from corrupted downloads to wrong chat templates.
Memory Leak in Long Conversations: Causes and Fixes
VRAM climbs with every message until your model crashes? It's probably KV cache growth, not a leak. How to diagnose, monitor, and fix memory issues in local LLMs.
llama.cpp Build Errors: Common Fixes for Every Platform
llama.cpp won't build? CMake too old, CUDA not found, Metal not enabling, Visual Studio missing. Exact error messages and one-liner fixes for every platform.
GGUF File Won't Load: Format and Compatibility Fixes
GGUF model won't load? Version mismatch, corrupted download, wrong format, split files, or memory issues. Find your error and fix it in under a minute.
CUDA Out of Memory: What It Means and How to Fix It
CUDA out of memory means your model doesn't fit in VRAM. Seven fixes ranked by effort (context length, KV cache quantization, model quantization, CPU offload) with tool-specific commands for Ollama, llama.cpp, and LM Studio.
Context Length Exceeded: What To Do When Your Model Runs Out of Space
Model forgetting earlier messages or throwing context errors? How context length works, what happens when it fills, and practical fixes for chat, RAG, and coding.
OpenClaw Tool Call Failures: Why Models Break and How to Fix Them
Your OpenClaw agent silently fails, loops forever, or corrupts its session. Here's why tool calls break and how to fix each failure mode.
OpenClaw Memory Problems: Context Rot and the Forgetting Fix
Your OpenClaw agent forgets instructions, repeats questions, and contradicts itself in long sessions. Here's how its memory works and how to fix it.
Local AI Troubleshooting Guide: Every Common Problem and Fix
Model running 30x slower than expected? Probably on CPU instead of GPU. Fixes for won't-load errors, CUDA crashes, garbled output, and OOM across Ollama and LM Studio.
Ollama Troubleshooting Guide: Every Common Problem and Fix
GPU not detected? Running at 1/30th speed on CPU? OOM crashes mid-generation? Every common Ollama error with exact diagnostic commands and fixes for Mac, Windows, and Linux. Updated for v0.17.7 and Qwen 3.5.