PR-22673
Wicked Fast Qwen 3.6 27B: 60 tok/s with MTP on RTX 3090 (2026)
Firsthand bench: 60 tok/s on Qwen 3.6 27B Q4_K_M with MTP on a single RTX 3090, a 1.86x wall-clock speedup over baseline. PR #22673 progress, May 6 to May 19.
DFlash vs MTP on RTX 3090: I Tested Both Locally
Firsthand head-to-head bench of DFlash + DDTree against MTP (PR #22673) on a single RTX 3090, same Qwen 3.6-27B target. Real numbers, both backends.
How to Get 2.5x Faster Qwen on RTX 3090 (Free)
I built DFlash on my RTX 3090 and ran the full bench. Real 2.5x speedup on Qwen 3.5 and 3.6: below the 3.43x README claim, still huge. Here's how.