Benchmarks
RTX 5090 vs DGX Spark vs AMD: The Ultimate Local LLM Benchmark (2026)
Real llama.cpp benchmarks across the RTX 5090, DGX Spark, and AMD Ryzen AI Max+ 395 (ROCm and Vulkan). Token speeds, VRAM usage, and which hardware wins for local AI.
LM Studio vs llama.cpp: Why Your Model Runs Slower in the GUI
LM Studio uses llama.cpp under the hood but often runs 30-50% slower. A bundled runtime that lags upstream, UI overhead, and conservative default settings explain the gap. How to benchmark it yourself, and when the convenience is worth it.
RTX 5060 Ti Review for Local AI — The New Budget King
Real benchmarks for the RTX 5060 Ti 16GB running local LLMs. Qwen 3.5 35B at 44 tok/s and 100K context on a ~$430 card. Compared against the RTX 3060, 3090, and 4060 Ti.
The Benchmarks Lie: Why LLM Scores Don't Predict Real-World Performance
MMLU scores drop 14-17 points when contamination is removed. HumanEval is saturated at 94%. Models are trained on the test set. Here's what to measure instead.
Distilled vs Frontier Models for Local AI — What You're Actually Getting
That local model you love was probably trained on stolen outputs from Claude or GPT. Here's what distillation actually does to a model's reasoning, where it breaks, and why it matters most for agentic work.
Best Qwen 3.5 Setup: Which Model Fits Your GPU (Complete Cheat Sheet)
Pick the right Qwen 3.5 model for your hardware. Covers 0.8B through 397B with VRAM requirements, quant recommendations, and benchmarks for every GPU tier.
Best LLM Speed Trick: ExLlamaV2 vs llama.cpp Benchmarks (50-85% Faster)
Head-to-head speed benchmarks on RTX 3090 and 4090. ExLlamaV2 generates tokens 50-85% faster than llama.cpp on NVIDIA GPUs. Full comparison with setup guides for both.