Llama

nanollama: Train Your Own Llama 3 From Scratch on Custom Data
Pretrain Llama 3 architecture models from raw text, export to GGUF, and run with llama.cpp. Forked from Karpathy's nanochat. 46M to 7B parameters.
Feb 23, 2026
Qwen vs Llama vs Mistral: Which Model Family Should You Build On?
Qwen has 201 languages and a model for every task. Llama has the biggest community. Mistral pioneered efficient MoE. Decision framework for choosing your model family in 2026.
Feb 21, 2026
Running 70B Models Locally — Exact VRAM by Quantization
Llama 3.3 70B needs 43GB at Q4, 75GB at Q8, 141GB at FP16. Here's every quant level, which GPUs fit, real speeds, and when 32B is the smarter choice.
Feb 14, 2026
Local LLMs vs Claude: When Each Actually Wins
Qwen 3 32B matches Claude on daily tasks at zero marginal cost. Claude still wins on 200K-token documents and multi-step debugging. Benchmarks, pricing, and when to use each.
Feb 3, 2026