LLM Inference
Flash-MoE: Run a 397B Model on a 48GB Laptop (Here's How)
Flash-MoE streams Qwen3.5-397B from your SSD at 4.4 tok/s using 5.5GB of RAM. Pure C and Metal, no Python. Here's what's real, what's hype, and how to try it.
LocalAgent: A Local-First Agent Runtime That Actually Cares About Safety
Rust CLI for AI agents with deny-by-default permissions, approval workflows, and deterministic replay. Works with LM Studio, Ollama, and llama.cpp.
Multi-GPU Local AI: Run Models Across Multiple GPUs
Dual RTX 3090s give you 48GB of VRAM and run 70B models at 16-21 tok/s, versus 1 tok/s with CPU offloading. Tensor vs. pipeline parallelism, setup guides, and real scaling numbers.