1B-3B models run at 18-55 tok/s. Qwen 2.5 3B at Q4 is the sweet spot for chat and simple coding. 7B models don't fit in 4 GB of VRAM. Here's what works on the GTX 1050 Ti and GTX 1650, and when to upgrade.
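
As a rough illustration of why 3B fits and 7B does not, the memory arithmetic can be sketched as follows. The figures here are assumptions for illustration, not measurements: 4 GB of VRAM (the usual configuration of these two cards), about 4.5 bits per weight for a Q4 quantization, and a flat 1 GB allowance for KV cache, runtime context, and the desktop. Actual usage varies with the quantization variant, context length, and runtime.

```python
def estimated_vram_gb(n_params: float,
                      bits_per_weight: float = 4.5,  # assumed average for a Q4 quant
                      overhead_gb: float = 1.0) -> float:  # assumed KV cache + runtime allowance
    """Back-of-the-envelope VRAM estimate in GB (1 GB = 1e9 bytes)."""
    weights_gb = n_params * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb


def fits(n_params: float, vram_gb: float = 4.0) -> bool:
    """True if the estimate stays within the card's VRAM (4 GB assumed)."""
    return estimated_vram_gb(n_params) <= vram_gb


if __name__ == "__main__":
    # Nominal parameter counts; real models differ slightly from their labels.
    for label, n_params in [("1B", 1e9), ("3B", 3e9), ("7B", 7e9)]:
        need = estimated_vram_gb(n_params)
        verdict = "fits" if fits(n_params) else "does not fit"
        print(f"{label}: ~{need:.2f} GB needed, {verdict} in 4 GB")
```

Under these assumptions a 3B model leaves over 1 GB of headroom, while a 7B model's weights alone take nearly the whole card before any cache or runtime overhead is counted.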