RTX 3090
Best 24GB Backend Shootout: ik_llama vs BeeLlama vs llama.cpp
ik_llama and BeeLlama both finish in 22-23s on the am17an 9-prompt harness vs mainline llama.cpp's 37s — 1.66x and 1.62x speedups via opposite strategies.
Wicked Fast Qwen 3.6 27B: 60 tok/s with MTP on RTX 3090 (2026)
Firsthand bench: 60 tok/s on Qwen 3.6 27B Q4_K_M with MTP on a single RTX 3090, a 1.86x wall-clock speedup over baseline. Tracks PR #22673 progress from May 6 to May 19.
Wicked Fast Gemma 4 vs Qwen 3.6 on RTX 3090: 3.10x Tested
Same RTX 3090, same llama.cpp build, same bench. Gemma 4 26B-A4B Q4_K_XL: 128 tok/s mean. Qwen 3.6-27B Q4_K_M: 41 tok/s. 3.10x faster, firsthand.
DFlash vs MTP on RTX 3090: I Tested Both Locally
Firsthand head-to-head bench of DFlash + DDTree against MTP (PR #22673) on a single RTX 3090, same Qwen 3.6-27B target. Real numbers, both backends.
How to Fix Slow Qwen 3.6 27B on RTX 3090 (10-80 tok/s)
Qwen 3.6-27B at 12 tok/s on a 3090 when others report 35? The 8-step diagnostic checklist for offload, quants, templates, power limits, and backend choice.
How to Get 2.5x Faster Qwen on RTX 3090 (Free)
I built DFlash on my RTX 3090 and ran the full bench. Real 2.5x speedup on Qwen 3.5 and 3.6 — below the 3.43x README claim, still huge. Here's how.
Best Way to Run Qwen 3.6 35B MoE Locally: VRAM, Speed, Setup
Qwen 3.6-35B-A3B has 35B total params but only 3B active per token. Real tok/s on RTX 3090, 4090, 5070 Ti, dual 5060 Ti, and M3 Ultra. Quants and setup.
Best Way to Get 2x Token Output on RTX 3090: Qwen 3.6 + DFlash
Luce DFlash + DDTree pushes Qwen 3.6-27B Q4_K_M from 35 tok/s to 69 tok/s on a single RTX 3090. Real benchmarks, setup, and honest limits.
RTX 4090 vs Used RTX 3090 for Local AI: Which to Buy in 2026
Both have 24GB VRAM. One costs 2-3x more. RTX 4090 vs used RTX 3090 — real benchmarks, real prices, and who should buy which for local LLM inference and image generation.
Best Dual-GPU Local AI Setup: RTX 3090, 5060 Ti (2026)
Dual RTX 3090, 2x RTX 5060 Ti, 2x modded RTX 2080 Ti, and mixed setups: real configs for Qwen 3.6, MoE, and 70B models. Tensor vs pipeline parallelism in llama.cpp and vLLM.
RTX 3090 vs 4070 Ti Super for Local LLMs
Head-to-head comparison of the RTX 3090 and RTX 4070 Ti Super for running LLMs locally. Covers VRAM, speed, power, price, and which to buy for your use case.
Best Used GPUs for Local AI: 2026 Buying Guide
RTX 3090 at ~$1,000 for 24GB, RTX 3060 12GB at $170-220, RTX 3080 at $350-400. Tier rankings, fair prices, what to avoid (skip the 8GB 3070), and where to buy safely.
Used GPU Buying Guide for Local AI: How to Buy Smart
RTX 3060 12GB for ~$200, RTX 3090 24GB for ~$750—used GPUs offer 2-3x the VRAM per dollar vs new. Fair prices, scam red flags, and where to buy safely.
What Can You Actually Run on 24GB VRAM?
Qwen 3.5 27B at Q4 fits in 17GB with 64K+ context. 70B at Q3 with limited context. Flux at full FP16. RTX 3090 at $700 vs 4090 at $1,800—every model that fits and which GPU to buy.
Used RTX 3090 Buying Guide for Local AI
24GB VRAM for ~$1,000—still the cheapest 24GB card on the market. eBay red flags, PSU requirements (850W minimum), and how to test before your return window closes.