Local LLM

How to Run GLM 5.2 Locally: GPU, VRAM & Quant Guide
GLM 5.2 is 753B params and 1.51TB at full precision. Run it locally: the live Unsloth quant ladder, every GPU and RAM path, and the quant to actually target.
Jun 21, 2026
How to Fix Slow Qwen 3.6 27B on RTX 3090 (10-80 tok/s)
Qwen 3.6-27B at 12 tok/s on a 3090 when others report 35? The 8-step diagnostic checklist for offload, quants, templates, power limits, and backend choice.
May 1, 2026
Best Way to Run Qwen 3.6 35B MoE Locally: VRAM, Speed, Setup
Qwen 3.6-35B-A3B has 35B total params but only 3B active per token. Real tok/s on RTX 3090, 4090, 5070 Ti, dual 5060 Ti, and M3 Ultra. Quants and setup.
Apr 28, 2026
Best Local LLMs for Summarization
Qwen 2.5 14B is the summarization sweet spot: strong instruction following, 128K context for 200-page docs, and a fit for 16GB of VRAM. Model picks by use case, quality ratings, chunking strategies, and prompting tips.
Feb 10, 2026
Best Local LLMs for RAG in 2026
The best local models for retrieval-augmented generation by VRAM tier. Qwen 3, Command R 35B, embedding models, and RAG stacks with real failure modes.
Feb 6, 2026
Run Your First Local LLM in 15 Minutes
Install Ollama, pull a model, and chat with AI offline—all in 15 minutes. Works on any Mac, Windows, or Linux machine with 8GB RAM. No accounts, no API keys, no fees.
Jan 27, 2026