CUDA
ROCm vs CUDA for Local AI in 2026: The Software Gap Nobody Talks About
AMD GPUs have the bandwidth. They have the VRAM. They still lose by 2x on inference speed. Here's why, what actually works on ROCm 7.2, and whether RDNA 4 fixes anything.
Ubuntu 26.04 Is Built for Local AI — What Actually Changes
Ubuntu 26.04 LTS packages NVIDIA CUDA and AMD ROCm in official repos. No more external downloads or dependency nightmares. What's confirmed and what it means for local AI.
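If the packaging lands as described, installing either stack becomes a plain apt transaction. A minimal sketch, assuming the package names Ubuntu already ships for CUDA and ROCm components carry forward into 26.04 (that carry-over is an assumption, not something confirmed here):

```bash
# CUDA toolkit straight from the Ubuntu archive
# (package name as in current releases; assumed to persist in 26.04)
sudo apt update
sudo apt install nvidia-cuda-toolkit

# ROCm userspace tools that Debian/Ubuntu already package (assumed names)
sudo apt install rocminfo rocm-smi

# Quick sanity checks
nvcc --version     # CUDA compiler present?
rocminfo | head    # ROCm sees the GPU?
```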
Ollama Not Using GPU: Complete Fix Guide
Ollama running on CPU instead of GPU? Diagnose with ollama ps and nvidia-smi, then fix CUDA drivers, ROCm setup, VRAM limits, and Docker GPU passthrough.
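For a quick first check before reading the full guide, the two commands in the teaser are enough to tell whether Ollama is really on the GPU. A minimal sketch, assuming an NVIDIA card; on AMD the ROCm equivalent is shown last:

```bash
# Is the loaded model on the GPU? The PROCESSOR column should read "100% GPU",
# not "100% CPU" or a CPU/GPU split.
ollama ps

# Does the driver see the card, and is VRAM actually in use while a model is loaded?
nvidia-smi

# AMD / ROCm equivalent of nvidia-smi
rocm-smi
```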
llama.cpp Build Errors: Common Fixes for Every Platform
llama.cpp won't build? CMake too old, CUDA not found, Metal not enabled, Visual Studio missing. Exact error messages and one-liner fixes for every platform.
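As a reference point, the standard CUDA build looks like this. A sketch, assuming a recent llama.cpp checkout where the CUDA switch is GGML_CUDA (older trees used LLAMA_CUBLAS/LLAMA_CUDA) and the CUDA toolkit is on PATH:

```bash
# Clone and configure with CUDA enabled; the flag name varies by llama.cpp version
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# On Apple Silicon, Metal is enabled by default; a plain build is just:
# cmake -B build && cmake --build build --config Release -j
```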
CUDA Out of Memory: What It Means and How to Fix It
CUDA out of memory means your model doesn't fit in VRAM. Seven fixes ranked by effort — context length, KV cache quantization, model quant, CPU offload — with tool-specific commands for Ollama, llama.cpp, and LM Studio.
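To make the "ranked by effort" idea concrete, here is roughly what the cheaper fixes look like as llama.cpp flags. A sketch, assuming llama-server from a recent build; Ollama and LM Studio expose the same knobs under different names, and model.gguf is a placeholder path:

```bash
# 1. Shrink the context window (cheapest win; the KV cache scales with it)
./build/bin/llama-server -m model.gguf -c 4096

# 2. Quantize the KV cache (quantizing the V cache may require flash attention,
#    which recent builds enable by default)
./build/bin/llama-server -m model.gguf -c 8192 \
    --cache-type-k q8_0 --cache-type-v q8_0

# 3. Offload only part of the model to the GPU and keep the rest in system RAM
./build/bin/llama-server -m model.gguf -ngl 24
```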
Local AI Troubleshooting Guide: Every Common Problem and Fix
Model running 30x slower than expected? Probably on CPU instead of GPU. Fixes for won't-load errors, CUDA crashes, garbled output, and OOM across Ollama and LM Studio.
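One cheap way to confirm the "30x slower" symptom is to look at the measured generation speed instead of guessing. A sketch, assuming the stock ollama CLI and any small model you already have pulled (llama3.2 is just a placeholder):

```bash
# --verbose prints timing stats after the reply, including "eval rate" in tokens/s.
# Single-digit tokens/s on a 7B-8B model with a discrete GPU usually means
# inference fell back to the CPU.
ollama run llama3.2 --verbose "Say hi in one sentence."
```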
Mac vs PC for Local AI: Which Should You Choose?
RTX 3090 runs 7B-14B models 2-3x faster than M4 Pro. M4 Max with 128GB loads 70B models a PC can't touch. Real benchmarks, prices, and which platform fits your use case.