Qwen 3.6
Ollama 0.30.0: What's New, What's Faster, What Breaks on Upgrade
Ollama 0.30.0: llama.cpp integration, flash-attention default for Qwen/Gemma, broader model support. Firsthand upgrade notes, known issues to watch.
MiniMax M3's asterisk, the Windows shift, and World's Fair plans
MiniMax M3 ships with frontier benchmarks but no downloadable weights yet. The Windows unified-memory hardware shift is coming for Apple Silicon's lead. And a personal note about who I'd like to see at AI Engineer World's Fair.
Qwen 3.6: Why Q4 Quant Breaks Local Coding Agents (And the Fix)
A viral thread says Q4-to-Q6 fixes Qwen 3.6 coding, but the test was confounded. What four independent reports show about the quant tax on coding agents.
Backend wars, Mac math, and the back-catalog refresh
Three speculative-decoding backends benched head to head on a single RTX 3090. The VRAM calculator finally caught up. And a 120-article audit found stale Qwen 2.5 recommendations.
Best 24GB Backend Shootout: ik_llama vs BeeLlama vs llama.cpp
ik_llama and BeeLlama both finish in 22-23s on the am17an 9-prompt harness vs mainline llama.cpp's 37s — 1.66x and 1.62x speedups via opposite strategies.
Wicked Fast Qwen 3.6 27B: 60 tok/s with MTP on RTX 3090 (2026)
Firsthand bench: 60 tok/s on Qwen 3.6 27B Q4_K_M with MTP on a single RTX 3090 — 1.86x wall-clock speedup over baseline. PR #22673 progress May 6 → May 19.
Power week in local AI: Mythos, MiroThinker, real Qwen 3.6 builds
Two researchers cracked Apple's flagship defense in a week. An open-source agent beat closed-source on real benchmarks. Multi-GPU stopped being theoretical.
Wicked Fast Gemma 4 vs Qwen 3.6 on RTX 3090: 3.10x Tested
Same RTX 3090, same llama.cpp build, same bench. Gemma 4 26B-A4B Q4_K_XL: 128 tok/s mean. Qwen 3.6-27B Q4_K_M: 41 tok/s. 3.10x faster, firsthand.
DFlash vs MTP on RTX 3090: I Tested Both Locally
Firsthand head-to-head bench of DFlash + DDTree against MTP (PR #22673) on a single RTX 3090, same Qwen 3.6-27B target. Real numbers, both backends.
How to Get 2.5x Faster Qwen on RTX 3090 (Free)
I built DFlash on my RTX 3090 and ran the full bench. Real 2.5x speedup on Qwen 3.5 and 3.6 — below the 3.43x README claim, still huge. Here's how.
This Week in Local AI — DeepSeek V4 Took #1 on Vibe Code
DeepSeek V4-Flash hit #1 on Vibe Code Benchmark. Qwen 3.6 shipped both variants. FP4 landed in llama.cpp. Anthropic admitted they quietly downgraded Claude Code on March 4.
Qwen 3.6 Complete Guide: 27B Dense, 35B-A3B MoE, and Which to Use
Qwen 3.6 landed in two open-weight flavors: 27B dense and 35B-A3B MoE. Benchmarks, hardware fit, and which variant to run on your GPU.
Best 8GB GPU Model: How to Set Up Qwen 3.5 9B (Step by Step)
Qwen 3.5 9B fits in 6.6GB and beats Qwen 3-class models 3x its size. Setup on Ollama/llama.cpp, quant table, where 9B still fits in the May 2026 lineup.
Best Local Models for PI Agent: Qwen 3.6, Gemma 4 (2026 Setup)
PI Agent runs any model locally via Ollama. May 2026 picks: Qwen 3.6 27B / 35B-A3B MoE, Gemma 4 26B-A4B. Setup, model comparisons, honest limits.
Qwen 3.5 Locally — 27B vs 35B-A3B vs 122B, Which Model Fits Your GPU
Qwen 3.5 and 3.6 on local hardware. 27B dense vs 35B-A3B MoE vs 122B compared. VRAM tables, community tok/s on RTX 3090, and which to pick for your card.
llama.cpp Build Errors: Common Fixes for Every Platform
llama.cpp won't build or runs wrong? CMake, CUDA, Gemma 4 thinking-mode, Qwen 3.6 kwargs, num_ctx VRAM overflow. Exact fixes for every platform.
Best Ways to Fix OpenClaw Tool Call Failures: 2026 Guide
Your OpenClaw agent silently fails, loops, or corrupts its session. Six debug paths plus May 2026 gotchas: Qwen 3.6 whitespace kwargs, Gemma 4 thinking mode.
Best Local LLMs for Function Calling: Qwen 3.6, Gemma 4
Function calling with local LLMs on Ollama and llama.cpp. Current lineup: Qwen 3.6, Gemma 4, DeepSeek V4. Common failures, agentic loop patterns. May 2026.
Best Local LLMs for Structured Output: Qwen 3.6, Gemma 4
JSON schema, grammar constraints, and Outlines compared. Current model picks: Qwen 3.6, Gemma 4, DeepSeek V4. Common failures + working code. May 2026.
Best Ways to Manage Multiple Ollama Models: 2026 Workflows
Manage multiple Ollama models in 2026: disk cleanup, switching, tagging. Qwen 3.6, Gemma 4, DeepSeek V4 (cloud-only) — practical workflows.
Best Vision Models You Can Run Locally: Every Model, Every GPU Tier
Qwen 3.6 and Gemma 4 are the new local vision SOTA picks. Full VRAM table, Ollama commands, setup for every GPU from 4GB to 48GB+. Updated May 2026.
Best Dual-GPU Local AI Setup: RTX 3090, 5060 Ti (2026)
Dual RTX 3090, 2x RTX 5060 Ti, 2x modded 2080 Ti, mixed setups: real configs for Qwen 3.6, MoE, 70B. Tensor vs pipeline parallelism, llama.cpp/vLLM.
Best Local LLMs for Mac in 2026 — M1 through M5 Tested
The best models to run on every Mac tier. Specific picks for 8GB M1 through 192GB M3 Ultra, with real tok/s numbers. Qwen 3.6, Llama 4 Scout, DeepSeek V4, MLX vs Ollama, updated May 2026.
Best Local Models for OpenClaw 2026: Qwen 3.6 + DeepSeek V4
Qwen 3.6-27B dense ties Sonnet 4.6 on agentic coding; 3.6-35B-A3B runs OpenClaw on 16GB VRAM. Plus DeepSeek V4-Flash, sampling tips, VRAM tiers.
Best Qwen Models Ranked: Which to Run Locally (May 2026)
Complete Qwen models guide covering Qwen 3.6, Qwen 3.5, Qwen 3, Qwen3-Coder-Next, and Qwen-VL. VRAM requirements, Ollama setup, hybrid Gated DeltaNet architecture, and benchmarks vs Llama 4 and DeepSeek V4.
Ollama Troubleshooting Guide: Every Common Problem and Fix
GPU not detected? Running at 1/30th speed on CPU? OOM crashes mid-generation? Every common Ollama error with exact diagnostic commands and fixes for Mac, Windows, and Linux. Updated June 2026 for v0.30.0 and Qwen 3.5 + 3.6.
Best Local Coding Models Ranked: Every VRAM Tier, Every Benchmark (2026)
The best local LLMs for coding in 2026, ranked by VRAM tier. Qwen 3.6-27B, 3.6-35B-A3B, DeepSeek V4-Flash, benchmarks, editor setup, and Claude Code alternatives.