What context length actually means for local LLMs, how it affects VRAM usage, the practical limits on different hardware, and when you really need 128K+ tokens.
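As a rough illustration of how context length drives VRAM, here is a minimal back-of-envelope sketch of KV-cache size. The model parameters are assumptions for the sake of the example (a Llama-3-8B-style configuration: 32 layers, 8 grouped-query KV heads, head dimension 128, fp16 cache); they are not figures from this article, and weights and activations would consume VRAM on top of this.

```python
# Back-of-envelope KV-cache VRAM estimate (illustrative only;
# the model shape below is an assumption, not from the article).

def kv_cache_bytes(context_len: int,
                   n_layers: int = 32,      # assumed: Llama-3-8B-style depth
                   n_kv_heads: int = 8,     # assumed: GQA with 8 KV heads
                   head_dim: int = 128,     # assumed head dimension
                   bytes_per_elem: int = 2  # fp16/bf16 cache entries
                   ) -> int:
    """KV-cache size for one sequence:
    2 (K and V) * layers * kv_heads * head_dim * tokens * bytes/elem
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * bytes_per_elem)

for ctx in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>7} tokens -> ~{gib:.1f} GiB KV cache")
```

Under these assumptions the cache grows linearly with context: roughly 0.5 GiB at 4K tokens, 4 GiB at 32K, and 16 GiB at 128K, which is why long contexts can crowd the model itself out of a consumer GPU.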