TurboQuant Explained: How Google's KV Cache Trick Cuts Memory 6x With Zero Quality Loss
Google's TurboQuant compresses the KV cache 6x with zero accuracy loss. Here's what it actually does, how it works in llama.cpp and MLX, and what it means for running bigger models on your GPU.
Mar 30, 2026