BeeLlama
Backend wars, Mac math, and the back-catalog refresh
Three speculative-decoding backends benched head to head on a single RTX 3090. The VRAM calculator finally caught up. And a 120-article audit found stale Qwen 2.5 recommendations.
Best 24GB Backend Shootout: ik_llama vs BeeLlama vs llama.cpp
ik_llama and BeeLlama both finish in 22-23s on the am17an 9-prompt harness, against 37s for mainline llama.cpp: speedups of 1.66x and 1.62x, reached via opposite strategies.
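As a quick check on how the quoted figures fit together, here is a minimal sketch that treats speedup as baseline time divided by candidate time. It assumes the 37s baseline is a rounded figure and that the 1.66x and 1.62x values are listed in the same order as the backends; the function names are illustrative and do not come from the benchmark harness.

```python
def speedup(baseline_s: float, candidate_s: float) -> float:
    """Speedup of a candidate over a baseline: baseline time / candidate time."""
    return baseline_s / candidate_s


def implied_time(baseline_s: float, speedup_x: float) -> float:
    """Wall-clock time implied by a quoted speedup against a baseline time."""
    return baseline_s / speedup_x


BASELINE_S = 37.0  # mainline llama.cpp, as quoted in the blurb (rounded)

# Assumption: the speedups are listed in the same order as the backends.
QUOTED = [("ik_llama", 1.66), ("BeeLlama", 1.62)]

for name, quoted_x in QUOTED:
    print(f"{name}: {quoted_x}x implies about {implied_time(BASELINE_S, quoted_x):.1f}s")

# The rounded endpoints of the 22-23s range bracket the quoted speedups.
print(f"22s -> {speedup(BASELINE_S, 22.0):.2f}x, 23s -> {speedup(BASELINE_S, 23.0):.2f}x")
```

The implied times land inside the 22-23s range, so the speedups and the rounded wall-clock figures are consistent with each other.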