Benchmarks
Best 24GB Backend Shootout: ik_llama vs BeeLlama vs llama.cpp
ik_llama and BeeLlama both finish in 22-23s on the am17an 9-prompt harness vs mainline llama.cpp's 37s — 1.66x and 1.62x speedups via opposite strategies.
Wicked Fast Qwen 3.6 27B: 60 tok/s with MTP on RTX 3090 (2026)
Firsthand bench: 60 tok/s on Qwen 3.6 27B Q4_K_M with MTP on a single RTX 3090, a 1.86x wall-clock speedup over baseline. Covers PR #22673 progress from May 6 to May 19.
Wicked Fast Gemma 4 vs Qwen 3.6 on RTX 3090: 3.10x Tested
Same RTX 3090, same llama.cpp build, same bench. Gemma 4 26B-A4B Q4_K_XL: 128 tok/s mean. Qwen 3.6-27B Q4_K_M: 41 tok/s. 3.10x faster, firsthand.
DFlash vs MTP on RTX 3090: I Tested Both Locally
Firsthand head-to-head bench of DFlash + DDTree against MTP (PR #22673) on a single RTX 3090, same Qwen 3.6-27B target. Real numbers, both backends.
How to Get 2.5x Faster Qwen on RTX 3090 (Free)
I built DFlash on my RTX 3090 and ran the full bench. Real 2.5x speedup on Qwen 3.5 and 3.6 — below the 3.43x README claim, still huge. Here's how.