Dual RTX 3090s give you 48GB VRAM and run 70B models at 16-21 tok/s—vs 1 tok/s with CPU offloading. Tensor vs pipeline parallelism, setup guides, and real scaling numbers.