Software

Text Generation WebUI Setup Guide (2026)
Install oobabooga's TextGen (formerly text-generation-webui), load GGUF/EXL2/EXL3 models, and configure GPU offloading. Now with vision, tool-calling, and an Anthropic-compatible API. Covers the settings most guides skip.
Jul 10, 2026
How to Fix Slow Qwen 3.6 27B on RTX 3090 (10-80 tok/s)
Qwen 3.6-27B at 12 tok/s on a 3090 when others report 35? The 8-step diagnostic checklist for offload, quants, templates, power limits, and backend choice.
May 1, 2026
How to Get 2.5x Faster Qwen on RTX 3090 (Free)
I built DFlash on my RTX 3090 and ran the full bench. Real 2.5x speedup on Qwen 3.5 and 3.6 — below the 3.43x README claim, still huge. Here's how.
Apr 30, 2026
Best Way to Get 2x Token Output on RTX 3090: Qwen 3.6 + DFlash
Luce DFlash + DDTree pushes Qwen 3.6-27B Q4_K_M from 35 tok/s to 69 tok/s on a single RTX 3090. Real benchmarks, setup, and honest limits.
Apr 27, 2026
FP4 Just Landed in llama.cpp: NVFP4 vs MXFP4 Explained (2026)
NVFP4 in llama.cpp, MXFP4 in ik_llama.cpp. The first practical FP4 quantization for the GGUF ecosystem — what works, what doesn't, and what to test.
Apr 25, 2026
Home Assistant Voice Control, No Cloud or Alexa (2026)
Fully local voice control with Home Assistant, Ollama, Whisper, and Piper — no Alexa, no cloud, no subscriptions. Wyoming pipeline, model picks, hardware.
Mar 6, 2026
LM Studio vs llama.cpp: Why Your Model Runs Slower in the GUI
LM Studio uses llama.cpp under the hood but often runs 30-50% slower. Bundled runtime lag, UI overhead, and default settings explain the gap. How to benchmark it yourself and when the convenience is worth it.
Mar 5, 2026
Best Docker Setup for Local AI: Ollama + Open WebUI (2026)
Five copy-paste Docker Compose recipes for Ollama 0.24 + Open WebUI: NVIDIA Blackwell, AMD ROCm, multi-GPU, and CPU. Plus the Apple Silicon catch most guides skip.
Mar 4, 2026
WSL2 + Ollama on Windows: Complete Setup Guide (GPU Passthrough Included)
Install Ollama in WSL2 with full GPU acceleration in 20 minutes. GPU passthrough, Open WebUI, Docker Compose, VPN fixes, and the gotchas that will waste your afternoon.
Mar 1, 2026
Ollama on Mac: Setup and Optimization Guide (2026)
Install Ollama on Apple Silicon, verify Metal GPU is active, and tune it for your Mac's RAM. Config for M1 through M4 Ultra with model picks per memory tier.
Feb 26, 2026
LM Studio vs Ollama on Mac: Which Should You Use?
LM Studio's MLX backend is 20-30% faster and uses half the memory. Ollama is lighter, always-on, and better for APIs. Mac-specific benchmarks and when to use each.
Feb 26, 2026
Local LLMs vs ChatGPT: An Honest Comparison
ChatGPT has web search, voice mode, and GPT-5.2. Local LLMs have privacy, no subscriptions, and no rate limits. Here's when each one wins, what the cost math actually looks like, and why most power users run both.
Feb 24, 2026
WSL2 Local AI on Windows: GPU Passthrough, Fixed (2026)
Install WSL2, configure GPU passthrough, set up Ollama and llama.cpp with CUDA, and optimize memory for LLM inference. Step-by-step for Windows 11.
Feb 23, 2026
Best New Ollama 0.17 Features: ollama launch, MLX, and OpenClaw Support
Everything new in Ollama 0.16 through 0.17.7: ollama launch for coding tools, native MLX on Apple Silicon, OpenClaw integration, web search API, and image generation. Updated March 2026.
Feb 23, 2026
llama.cpp Just Got a New Home: What the Hugging Face Acquisition Means for Local AI
ggml.ai — the team behind llama.cpp — is joining Hugging Face. Open source stays open, Georgi keeps the wheel. What changed, what didn't, and what to watch.
Feb 20, 2026
Run Qwen2.5-VL Vision in LM Studio (Setup)
Get Qwen2.5-VL running in LM Studio in 5 minutes. Covers the mmproj file most people miss, correct download links, and how to analyze images and PDFs locally.
Feb 14, 2026
How to Update Models in Ollama — Keep Your Local LLMs Current
Ollama doesn't auto-update models. Run ollama pull model:tag to grab the latest version — only changed layers download. Use ollama show to check what you have, and a simple loop to update everything at once.
Feb 14, 2026
Best LLM Speed Trick: ExLlamaV2 vs llama.cpp Benchmarks (50-85% Faster)
Head-to-head speed benchmarks on RTX 3090 and 4090. ExLlamaV2 generates tokens 50-85% faster than llama.cpp on NVIDIA GPUs. Full comparison with setup guides for both.
Feb 14, 2026
Best Ways to Manage Multiple Ollama Models: 2026 Workflows
Manage multiple Ollama models in 2026: disk cleanup, switching, tagging. Qwen 3.6, Gemma 4, DeepSeek V4 (cloud-only) — practical workflows.
Feb 8, 2026
AnythingLLM Setup Guide: Chat With Your Documents Locally
Upload PDFs, paste URLs, and chat with your files, with no coding and no cloud. AnythingLLM connects to Ollama in 5 minutes and gives you point-and-click RAG. 54K+ GitHub stars.
Feb 8, 2026
Local LLMs vs Claude: When Each Actually Wins
Qwen 3 32B matches Claude on daily tasks at zero marginal cost. Claude still wins on 200K-token documents and multi-step debugging. Benchmarks, pricing, and when to use each.
Feb 3, 2026
llama.cpp vs Ollama vs vLLM: One User vs Many (2026)
Single-user, the three are closer than benchmark posts admit. Concurrent, vLLM pulls 10-20x ahead. Decision tree, the vLLM VRAM gotcha, mid-2026 versions.
Feb 3, 2026
Open WebUI Setup Guide: ChatGPT UI for Local AI
One Docker command gives you a ChatGPT-like interface for any Ollama model. 120K+ GitHub stars, built-in RAG, voice chat, and multi-model switching, all running locally.
Feb 2, 2026
LM Studio Tips & Tricks: Faster + Features You Miss (2026)
Run local LLMs faster in LM Studio with speculative decoding and MLX, plus the API server, GPU offload, and power-user features most people never touch.
Feb 1, 2026
Ollama vs LM Studio: Speed, Setup, and Verdict
Ollama gives you a CLI with 100+ models and an OpenAI-compatible API. LM Studio gives you a visual GUI with one-click downloads. Most power users run both—here's when to use each.
Jan 27, 2026
Run Your First Local LLM in 15 Minutes
Install Ollama, pull a model, and chat with AI offline—all in 15 minutes. Works on any Mac, Windows, or Linux machine with 8GB RAM. No accounts, no API keys, no fees.
Jan 27, 2026