Llama
nanollama: Train Your Own Llama 3 From Scratch on Custom Data
Pretrain Llama 3-architecture models from raw text, export them to GGUF, and run them with llama.cpp. Forked from Karpathy's nanochat. Model sizes from 46M to 7B parameters.
Qwen vs Llama vs Mistral: Which Model Family Should You Build On?
Qwen covers 201 languages and has a model for every task. Llama has the biggest community. Mistral pioneered efficient MoE. A decision framework for choosing your model family in 2026.
Running 70B Models Locally — Exact VRAM by Quantization
Llama 3.3 70B needs 43GB at Q4, 75GB at Q8, 141GB at FP16. Here's every quant level, which GPUs fit, real speeds, and when 32B is the smarter choice.
Local LLMs vs Claude: When Each Actually Wins
Qwen 3 32B matches Claude on daily tasks at zero marginal cost. Claude still wins on 200K-token documents and multi-step debugging. Benchmarks, pricing, and when to use each.