Embedding Models for RAG: Which to Run Locally
nomic-embed-text is still the default for most local RAG setups: 274 MB, 8K context, runs on CPU. But Qwen3-Embedding 0.6B just changed the game. Inside: model picks, VRAM needs, speed numbers, and the chunking mistakes that break retrieval.
Feb 8, 2026 · 14 min read