Why Your Local LLM Is Slow: The num_ctx VRAM Overflow Nobody Warns You About
DeepSeek-R1 14B dropped from 35 tok/s to 4.8 tok/s on the same GPU, and the fix was one parameter. Here is how num_ctx silently overflows VRAM and kills inference speed.
Mar 3, 2026