vLLM

DeepSeek V4 Flash vs Pro: What Actually Dropped and How to Run It
DeepSeek V4 preview dropped April 23 with two MoE variants: Pro at 1.6T/49B active and Flash at 284B/13B active. Both MIT, both 1M context. Flash is the news.
Apr 24, 2026
Razer AIKit Guide: Multi-GPU Local AI on Your Desktop
Open-source Docker stack bundling vLLM, Ray, LlamaFactory, and Grafana into one container. Auto-detects GPUs, supports 280K+ Hugging Face models, and handles multi-GPU parallelism.
Feb 6, 2026
Best Dual-GPU Local AI Setup: RTX 3090, 5060 Ti (2026)
Dual RTX 3090, 2x RTX 5060 Ti, 2x modded 2080 Ti, and mixed setups: real configs for Qwen 3.6, MoE, and 70B models. Tensor vs pipeline parallelism in llama.cpp and vLLM.
Feb 6, 2026
llama.cpp vs Ollama vs vLLM: One User vs Many (2026)
Single-user, the three are closer than benchmark posts admit. Concurrent, vLLM pulls 10-20x ahead. Decision tree, the vLLM VRAM gotcha, June 2026 versions.
Feb 3, 2026