RTX 3090

Qwen 3.7's open weights are overdue — by the math, not vibes
Qwen's own release cadence says the 3.7 open weights should already be out, and they're not. Plus GLM-5.2 running locally: a frontier open model that takes serious hardware.
Jun 22, 2026
Best 24GB Backend Shootout: ik_llama vs BeeLlama vs llama.cpp
ik_llama and BeeLlama both finish in 22-23s on the am17an 9-prompt harness vs mainline llama.cpp's 37s — 1.66x and 1.62x speedups via opposite strategies.
May 22, 2026
Wicked Fast Qwen 3.6 27B: 60 tok/s with MTP on RTX 3090 (2026)
Firsthand bench: 60 tok/s on Qwen 3.6 27B Q4_K_M with MTP on a single RTX 3090 — 1.86x wall-clock speedup over baseline. PR #22673 progress May 6 → May 19.
May 19, 2026
Wicked Fast Gemma 4 vs Qwen 3.6 on RTX 3090: 3.10x Tested
Same RTX 3090, same llama.cpp build, same bench. Gemma 4 26B-A4B Q4_K_XL: 128 tok/s mean. Qwen 3.6-27B Q4_K_M: 41 tok/s. 3.10x faster, firsthand.
May 8, 2026
DFlash vs MTP on RTX 3090: I Tested Both Locally
Firsthand head-to-head bench of DFlash + DDTree against MTP (PR #22673) on a single RTX 3090, same Qwen 3.6-27B target. Real numbers, both backends.
May 6, 2026
How to Fix Slow Qwen 3.6 27B on RTX 3090 (10-80 tok/s)
Qwen 3.6-27B at 12 tok/s on a 3090 when others report 35? The 8-step diagnostic checklist for offload, quants, templates, power limits, and backend choice.
May 1, 2026
How to Get 2.5x Faster Qwen on RTX 3090 (Free)
I built DFlash on my RTX 3090 and ran the full bench. Real 2.5x speedup on Qwen 3.5 and 3.6 — below the 3.43x README claim, still huge. Here's how.
Apr 30, 2026
Best Way to Run Qwen 3.6 35B MoE Locally: VRAM, Speed, Setup
Qwen 3.6-35B-A3B has 35B total params but only 3B active per token. Real tok/s on RTX 3090, 4090, 5070 Ti, dual 5060 Ti, and M3 Ultra. Quants and setup.
Apr 28, 2026
Best Way to Get 2x Token Output on RTX 3090: Qwen 3.6 + DFlash
Luce DFlash + DDTree pushes Qwen 3.6-27B Q4_K_M from 35 tok/s to 69 tok/s on a single RTX 3090. Real benchmarks, setup, and honest limits.
Apr 27, 2026
RTX 5090 Benchmarks: 5090 vs 4090 vs Used 3090 (2026)
5090 community benchmarks across 4K-131K context, prompt-processing tables, 5090-vs-4090 upgrade math, and InsiderLLM's firsthand 3090 results as an honest value baseline.
Mar 25, 2026
RTX 4090 vs Used RTX 3090 for Local AI: Which to Buy in 2026
Both have 24GB VRAM. One costs 2-3x more. RTX 4090 vs used RTX 3090 — real benchmarks, real prices, and who should buy which for local LLM inference and image generation.
Feb 21, 2026
Best Dual-GPU Local AI Setup: RTX 3090, 5060 Ti (2026)
Dual RTX 3090, 2x RTX 5060 Ti, 2x modded 2080 Ti, and mixed setups: real configs for Qwen 3.6, MoE, and 70B models. Tensor vs pipeline parallelism with llama.cpp and vLLM.
Feb 6, 2026
RTX 3090 vs 4070 Ti Super for Local LLMs
Head-to-head comparison of the RTX 3090 and RTX 4070 Ti Super for running LLMs locally. Covers VRAM, speed, power, price, and which to buy for your use case.
Feb 4, 2026
Best Used GPUs for Local AI: 2026 Buying Guide
RTX 3090 at ~$1,000 for 24GB, RTX 3060 12GB at $170-220, RTX 3080 at $350-400. Tier rankings, fair prices, what to avoid (skip the 8GB 3070), and where to buy safely.
Feb 4, 2026
Used GPU Buying Guide for Local AI: How to Buy Smart
RTX 3060 12GB for ~$200, RTX 3090 24GB for ~$750—used GPUs offer 2-3x the VRAM per dollar vs new. Fair prices, scam red flags, and where to buy safely.
Jan 30, 2026
What Can You Actually Run on 24GB VRAM?
Qwen 3.5 27B at Q4 fits in 17GB with 64K+ context. 70B at Q3 with limited context. Flux at full FP16. RTX 3090 at $700 vs 4090 at $1,800—every model that fits and which GPU to buy.
Jan 29, 2026
Used RTX 3090 Buying Guide for Local AI
24GB VRAM for ~$1,000—still the cheapest 24GB card on the market. eBay red flags, PSU requirements (850W minimum), and how to test before your return window closes.
Jan 27, 2026