DFlash

Backend wars, Mac math, and the back-catalog refresh
Three speculative-decoding backends benched head to head on a single RTX 3090. The VRAM calculator finally caught up. And a 120-article audit found stale Qwen 2.5 recommendations.
May 25, 2026
Best 24GB Backend Shootout: ik_llama vs BeeLlama vs llama.cpp
ik_llama and BeeLlama both finish in 22-23s on the am17an 9-prompt harness vs mainline llama.cpp's 37s: 1.66x and 1.62x speedups, respectively, via opposite strategies.
May 22, 2026
DFlash vs MTP on RTX 3090: I Tested Both Locally
Firsthand head-to-head bench of DFlash + DDTree against MTP (PR #22673) on a single RTX 3090, same Qwen 3.6-27B target. Real numbers, both backends.
May 6, 2026
This Week in Local AI — I Built DFlash and Audited Lightning
I built DFlash from source on a real RTX 3090 and benched both Qwens. Then audited my stack after PyPI's `lightning` package shipped malware that abuses Claude Code hooks.
May 3, 2026
How to Get 2.5x Faster Qwen on RTX 3090 (Free)
I built DFlash on my RTX 3090 and ran the full bench. A real 2.5x speedup on Qwen 3.5 and 3.6: below the 3.43x README claim, but still huge. Here's how.
Apr 30, 2026