Speculative Decoding
Wicked Fast Qwen 3.6 27B: 60 tok/s with MTP on RTX 3090 (2026)
Firsthand bench: 60 tok/s on Qwen 3.6 27B Q4_K_M with MTP on a single RTX 3090, a 1.86x wall-clock speedup over baseline. Tracks PR #22673 progress from May 6 to May 19.
DFlash vs MTP on RTX 3090: I Tested Both Locally
Firsthand head-to-head bench of DFlash + DDTree against MTP (PR #22673) on a single RTX 3090, with the same Qwen 3.6 27B target. Real numbers from both backends.