GB10 Boxes Compared: DGX Spark vs Dell vs ASUS vs MSI
Four companies are selling boxes with the same chip inside. NVIDIA’s DGX Spark, Dell’s Pro Max GB10, ASUS’s Ascent GX10, and MSI’s EdgeXpert all use the NVIDIA Grace Blackwell GB10 superchip — 20 ARM cores, a Blackwell GPU with 6,144 CUDA cores, and 128GB of unified LPDDR5X memory. Same silicon, same 1 PFLOP sparse FP4 compute, same DGXOS.
The pitch is compelling: a petaflop AI computer on your desk for $3,000-4,000. Load 70B models unquantized. Run 200B models in FP4. No multi-GPU complexity, no 850W power supply, no rack mount.
But “same chip” doesn’t mean same machine. The chassis matters for thermals. The NVMe generation matters for model loading. The build quality matters if you’re spending $4,000. And most importantly — should you spend $3,000-4,000 on any of these when a used RTX 3090 costs $900?
Here’s what 45-minute heat soak tests, real inference benchmarks, and side-by-side comparisons actually show.
The Comparison Table
| | DGX Spark | Dell Pro Max | ASUS GX10 | MSI EdgeXpert |
|---|---|---|---|---|
| Price (1TB) | N/A | N/A | $3,099 | $2,999 |
| Price (2TB) | N/A | $3,699 | ~$3,200 | N/A |
| Price (4TB) | $3,999 | $3,999 | $4,150 | $3,999 |
| NVMe gen | Gen 5 | Gen 4 | Gen 4 | Gen 4 |
| SSD form factor | M.2 2242 | M.2 2242 | M.2 2242 | M.2 2242 |
| SSD upgradeable | Yes | Yes | Yes (harder) | Yes |
| Weight | 1,255g | 1,256g | 1,474g | 1,257g |
| Back panel | Magnetic | Magnetic (6 magnets) | Screws | Screws |
| Port labeling | None | Minimal | Good | Best |
| Front power button | No | No | Yes | No |
| Chassis | Metal | Metal | Metal + plastic top | Mostly plastic |
| Noise | Quietest | Very quiet | Quiet | Loudest |
| Thermal events | None | None | 2 slowdowns | None |
All four machines run DGXOS (Ubuntu-based with pre-installed AI tools), have ConnectX-7 200GbE SmartNICs, and support linking two units for up to 256GB combined memory.
The GB10 Chip — What You’re Actually Getting
Every GB10 box contains the same SoC, co-designed by NVIDIA and MediaTek on TSMC’s 3nm process:
| Spec | Value |
|---|---|
| CPU | 20 ARM v9.2 cores (10 performance + 10 efficiency) |
| GPU | Blackwell, 48 SMs, 6,144 CUDA cores |
| Tensor Cores | 5th-gen, 192 total |
| Memory | 128GB LPDDR5X unified (shared CPU+GPU) |
| Memory bandwidth | 273 GB/s actual |
| Compute | 1 PFLOP sparse FP4, ~100 TFLOPS FP16 |
| TDP | 140W rated, ~100W observed under load |
| Networking | ConnectX-7 (200GbE), Wi-Fi |
The 273 GB/s memory bandwidth is the number that matters most for LLM inference. Token generation is memory-bandwidth-bound — the GPU needs to read every model weight from memory for each token. At 273 GB/s, you’re limited to roughly 2-3 tok/s on a 70B model at FP8 and about 5 tok/s at FP4 with TensorRT-LLM optimization.
For comparison: an RTX 5090 has 1,792 GB/s bandwidth (6.5x more), and a Mac Studio M3 Ultra has 819 GB/s (3x more). Both generate tokens significantly faster on models that fit in their memory. The GB10’s advantage is pure capacity — 128GB lets you load models that 32GB or even 48GB GPUs can’t touch.
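To make the bandwidth math concrete, here is a back-of-envelope sketch in Python. The formula (decode ceiling equals bandwidth divided by the bytes of weights read per token) and the bandwidth figures come straight from the numbers above; treating the model's full weight size as the per-token read is the simplifying assumption.

```python
# Back-of-envelope decode ceiling: each generated token streams roughly the
# full set of model weights from memory, so tok/s is capped by bandwidth
# divided by weight bytes. Real numbers land below this ceiling because of
# KV-cache reads, scheduling overhead, and imperfect bandwidth utilization.

def decode_ceiling(bandwidth_gb_s: float, params_b: float, bytes_per_param: float) -> float:
    """Upper bound on tokens/second for memory-bandwidth-bound decoding."""
    return bandwidth_gb_s / (params_b * bytes_per_param)

for name, bw in [("GB10", 273), ("Mac Studio M3 Ultra", 819), ("RTX 5090", 1792)]:
    fp8 = decode_ceiling(bw, 70, 1.0)   # 70B at FP8 ~ 70GB of weights
    fp4 = decode_ceiling(bw, 70, 0.5)   # 70B at FP4 ~ 35GB of weights
    print(f"{name}: 70B ceiling ≈ {fp8:.1f} tok/s (FP8), {fp4:.1f} tok/s (FP4)")

# GB10 comes out around 3.9 / 7.8 tok/s, comfortably above the observed
# ~2.7 / ~5.2 tok/s. Note the 5090 row is bandwidth only -- a 70B model
# does not actually fit in its 32GB of VRAM.
```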
Performance — They’re All The Same
This is the headline: when running the same model on all four machines, token generation is identical. The chip is the chip.
Qwen 34B (Quantized)
| Machine | Prompt Processing @ 4096 (tok/s) | Token Generation @ 8192 (tok/s) |
|---|---|---|
| DGX Spark | ~1,976 | 61 |
| Dell Pro Max | ~1,976 | 61 |
| ASUS GX10 | ~1,976 | 61 |
| MSI EdgeXpert | ~1,976 | 61 |
Prompt processing and token generation are functionally identical across all four machines. The benchmark ran via llama.cpp at ~96% GPU utilization for 45 minutes.
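If you just want a quick sanity check of your own box rather than reproducing the full llama.cpp benchmark runs used here, a minimal throughput sketch with the llama-cpp-python bindings might look like the following. The model path is a placeholder, and the single end-to-end number it prints mixes prompt processing and generation, so it will not match the separated columns above.

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python (CUDA-enabled build)

# Placeholder path -- point this at whatever GGUF you actually have on disk.
llm = Llama(model_path="/models/your-model.gguf", n_gpu_layers=-1, n_ctx=8192)

prompt = "Explain why token generation is memory-bandwidth-bound, in one paragraph."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

gen_tokens = out["usage"]["completion_tokens"]
print(f"{gen_tokens} tokens in {elapsed:.1f}s -> {gen_tokens / elapsed:.1f} tok/s (end to end)")
```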
Nemotron 30B (Unquantized)
| Machine | Prompt Processing (tok/s) | Avg GPU Power (W) |
|---|---|---|
| DGX Spark | 1,070 | 66.0 |
| Dell Pro Max | 1,068 | 62.7 |
| ASUS GX10 | 1,068 | 60.0 |
| MSI EdgeXpert | 1,068 | 60.0 |
Same story. The Spark draws slightly more GPU power (~66W average vs ~60W on the others), but performance is indistinguishable. Clock speeds stayed consistent across all machines with no throttling during normal inference workloads.
What About Bigger Models?
The GB10’s marketing claim is “up to 200B parameters.” Here’s the reality:
| Model | Format | Fits in 128GB? | Decode Speed |
|---|---|---|---|
| 7B | FP16 | Yes (14GB) | Fast |
| 34B | FP16 | Yes (68GB) | ~61 tok/s |
| 70B | FP16 | No (needs 140GB) | N/A |
| 70B | FP8 | Yes (70GB) | ~2.7 tok/s |
| 70B | FP4 (TRT-LLM) | Yes (35GB) | ~5.2 tok/s |
| 120B | FP4 | Yes (~60GB) | Slow |
| 200B | FP4 | Yes (~100GB) | Very slow |
A 70B model does not fit unquantized at FP16 — it needs ~140GB. It fits at FP8, but generates tokens at 2.7 tok/s. That’s usable for batch processing but painful for interactive chat. Time-to-first-token on a 90B model is around 2 minutes.
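The fits/doesn't-fit column is simple arithmetic: weights are roughly parameters times bytes per parameter, plus headroom for the KV cache, activations, and the OS sharing the same pool. Here is a small sketch of that check; the 16GB overhead allowance is my assumption, not a measured figure.

```python
# "Will it fit in the 128GB unified pool?" -- weights plus a headroom
# allowance for KV cache, activations, and the OS sharing the same memory.

BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

def fits_in_unified(params_b: float, fmt: str,
                    budget_gb: float = 128, overhead_gb: float = 16) -> bool:
    weights_gb = params_b * BYTES_PER_PARAM[fmt]
    return weights_gb + overhead_gb <= budget_gb

for params, fmt in [(70, "FP16"), (70, "FP8"), (70, "FP4"), (120, "FP4"), (200, "FP4")]:
    verdict = "fits" if fits_in_unified(params, fmt) else "does not fit"
    print(f"{params}B {fmt}: {verdict}")

# Matches the table: 70B FP16 (~140GB) is out; everything down to 200B FP4
# (~100GB) squeezes in with headroom to spare.
```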
Thermals and Power — Where the Chassis Matters
All four machines were heat-soaked for 45+ minutes running continuous inference. Under normal LLM workloads, they all behave similarly:
- GPU temperature: ~80°C after heat soak
- Surface temperature: ~50°C
- Total system power: 140-160W
- GPU power alone: ~60-66W
The interesting differences appear under stress testing (GPU burn, maximizing compute beyond typical inference):
The Software Power Cap
Every GB10 box hits a software-imposed power cap at ~100W GPU draw. This is what John Carmack flagged when he noted the DGX Spark “appears to be maxing out at only 100 watts power draw, less than half of the rated 240 watts.”
Carmack was right about the 100W cap. But calling it “thermal throttling” (as many headlines did) is wrong. The CPU clock speeds stay consistent across all machines when the power cap engages. It’s a software limit, not a thermal event. NVIDIA could raise this cap via a firmware update — and they’ve already shipped performance-improving software updates at CES 2026 that delivered up to 2.6x improvements for some workloads.
The 240W number is the power adapter rating, not the chip’s expected draw. The GB10 chip is rated at 140W TDP.
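If you want to see the cap for yourself rather than infer it from benchmark charts, the NVML bindings can report the enforced power limit, live draw, and GPU temperature. A minimal sketch, with the caveat that I am assuming NVML is exposed on DGXOS the same way it is for discrete GPUs; we did not verify that on these boxes.

```python
import pynvml  # pip install nvidia-ml-py

# Query the driver-enforced power limit plus live draw and temperature.
# Assumption: NVML behaves on DGXOS/GB10 as it does on discrete GPUs.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

limit_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(handle) / 1000  # mW -> W
draw_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000
temp_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)

print(f"Enforced power limit: {limit_w:.0f} W")   # where a ~100W cap should show up
print(f"Current draw: {draw_w:.0f} W, GPU temp: {temp_c} C")
pynvml.nvmlShutdown()
```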
The ASUS Exception
Under sustained stress testing, the ASUS GX10 is the only machine that triggers actual thermal slowdown events. Two instances were recorded where the GPU power dropped from 96W to 76W and the OS flagged a software thermal slowdown signal. This happened at GPU temperatures around 95°C — while the Dell reached 99°C without triggering the same event.
The practical impact? Negligible. Looking at the performance charts before and after the thermal events, the output is visually identical. But it’s worth knowing: if you’re running sustained heavy workloads 24/7 (training, continuous agent orchestration), the ASUS is the one machine that shows signs of thermal limits.
The ASUS weighs 1,474g — 220g more than the others. Extra weight usually means extra cooling hardware, but in this case it doesn’t translate to better thermals.
The Acer Note
Storage Review tested a fifth GB10 machine — the Acer Veriton GN100 ($3,999) — and found it ran coolest of all, peaking at just 76°C during demanding prefill workloads where other systems climbed into the mid-to-upper 80s. We haven’t tested the Acer ourselves, but if thermals are your top priority, it’s worth investigating.
Storage — The One Real Difference
This is where the DGX Spark actually justifies its premium. It’s the only machine with a Gen 5 NVMe drive.
| Machine | NVMe Gen | Sequential Read | Nemotron 30B Cold Load |
|---|---|---|---|
| DGX Spark | Gen 5 | ~13,000 MB/s | 8.49s |
| Dell Pro Max | Gen 4 | ~7,000 MB/s | ~11.5s |
| ASUS GX10 | Gen 4 | ~7,000 MB/s | ~11.5s |
| MSI EdgeXpert | Gen 4 | ~7,000 MB/s | ~11.5s |
The Spark loads models 25% faster from cold start. For a 30B model, that’s a 3-second difference. For larger models, it scales up — a 70B FP8 model (~70GB) would save 7-8 seconds on the Spark.
This matters if you’re swapping models frequently — running agents that pull different models for different tasks, or serving multiple users who need different models. If you load one model and run it all day, you’ll never notice.
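To see how much of that gap is raw drive bandwidth, the naive floor is just bytes on disk divided by sequential read speed; measured cold loads come in slower because filesystem and allocation overhead sits on top of the transfer. A quick sketch (the ~60GB figure for the unquantized 30B checkpoint is an assumption on my part):

```python
def load_floor_seconds(model_gb: float, read_mb_s: float) -> float:
    """Lower bound on cold-load time from raw sequential read bandwidth alone."""
    return model_gb * 1024 / read_mb_s

# ~60GB: unquantized 30B (assumed); 70GB: 70B FP8; 100GB: 200B FP4 (from the table above)
for model_gb in (60, 70, 100):
    gen5 = load_floor_seconds(model_gb, 13_000)
    gen4 = load_floor_seconds(model_gb, 7_000)
    print(f"{model_gb}GB: Gen 5 floor ≈ {gen5:.1f}s, Gen 4 floor ≈ {gen4:.1f}s")

# The observed 30B loads (8.49s vs ~11.5s) sit above these floors, as expected:
# real cold loads add filesystem and allocation overhead to the transfer itself.
```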
All four machines use 2242 M.2 SSDs, and all are upgradeable. You can buy the cheapest ASUS GX10 (1TB, $3,099) and drop in a 4TB Gen 4 drive yourself to save $500-1,000 versus buying a 4TB configuration. Getting into the ASUS is slightly harder (requires more disassembly) compared to the Spark and Dell, which use magnetic back panels.
Physical Design
Build Quality
The Spark and Dell are nearly twins — both are metal chassis, both use magnetic back panels for SSD access, and they weigh within 1 gram of each other (1,255g vs 1,256g). Dell kept NVIDIA’s reference design mostly intact and added a cleaner front grille.
The ASUS is all metal except for a plastic top panel. It’s the heaviest at 1,474g and has a front power button — the only one to offer this. If you’re rack-mounting these, reaching around the back to hit a power button gets old fast.
The MSI feels like plastic all around. It has the flimsiest-feeling build of the four but the best port labeling — every port is marked with its speed and function. The Spark has no labels at all.
Rack Mounting
None of these fit in a single rack unit. They’re all too tall by a few millimeters. If you’re planning a rack deployment, you’ll need custom shelving or 2U spacing.
Noise
All four are dramatically quieter than any desktop GPU. The Spark is the quietest, the MSI is the loudest, but the difference is marginal — none of them are louder than a laptop under moderate load.
The Budget Reality Check
Here’s the honest take from a budget-hardware perspective.
GB10 vs Used RTX 3090
| | Used RTX 3090 | GB10 (any) |
|---|---|---|
| Price | ~$900 | $2,999-$3,999 |
| VRAM/Memory | 24GB GDDR6X | 128GB LPDDR5X |
| Memory bandwidth | 936 GB/s | 273 GB/s |
| Llama 70B Q4 | ~18 tok/s | ~5 tok/s (FP4) |
| Llama 7B | ~80+ tok/s | ~61 tok/s |
| Power draw | ~350W | ~100W |
| Needs host PC | Yes | No (standalone) |
The RTX 3090 is faster on every model that fits in 24GB, including quantized 70B. It has 3.4x more memory bandwidth. It costs a quarter of the price.
The GB10’s only advantage: it loads models that 24GB can’t hold. Unquantized 70B at FP8. Unquantized 34B at FP16. Models in the 100-200B range at FP4. If you need that, nothing in this price range competes. But if you’re running 7B-34B quantized models — which is what most local AI hobbyists do — the 3090 wins on speed and cost.
GB10 vs Mac Studio
| | Mac Studio M3 Ultra (128GB) | GB10 (any) |
|---|---|---|
| Price | ~$5,000-6,000 | $2,999-$3,999 |
| Memory | 128GB unified | 128GB unified |
| Memory bandwidth | 819 GB/s | 273 GB/s |
| Llama 70B (FP8) | ~8 tok/s | ~2.7 tok/s |
| Ecosystem | macOS, MLX, Ollama | DGXOS (Ubuntu), CUDA |
The Mac Studio M3 Ultra has 3x the memory bandwidth and delivers roughly 3x faster token generation on the same models. It costs more, but you get a fully functional desktop computer — not just an inference box.
The GB10 wins on CUDA compatibility and price. If your workflow depends on CUDA-specific tools (TensorRT, vLLM, PyTorch CUDA), the GB10 runs them natively. The Mac requires Metal or MLX.
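If CUDA compatibility is the deciding factor, the two-minute check on any of these boxes is whether PyTorch sees the GPU and how much memory it reports. A minimal sketch, assuming a CUDA-enabled PyTorch build is installed (DGXOS ships with AI tooling preinstalled, per above); what it reports for total memory on a unified-memory part is worth verifying yourself.

```python
import torch

# Confirm the CUDA stack sees the Blackwell GPU and check the reported memory.
print(torch.cuda.is_available())         # expect True on a working GB10 setup
print(torch.cuda.get_device_name(0))     # device name as reported by the driver

props = torch.cuda.get_device_properties(0)
print(f"{props.total_memory / 1024**3:.0f} GiB reported to CUDA")
# On a unified-memory SoC the number reported here may not be the full 128GB;
# treat it as a reported figure, not a hard allocation limit.
```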
The GB10 Paradox
The GB10 has a fundamental tension: it loads models that nothing else in its price range can fit, but runs them slowly. A 70B model at FP8 generates 2.7 tokens per second. That’s technically functional but hardly interactive. Time-to-first-token on a 90B+ model can hit 2 minutes.
The machines that run models fast (RTX 5090 at 1,792 GB/s, RTX 3090 at 936 GB/s) can’t load the models the GB10 can. The machines that match the GB10’s capacity (Mac Studio M3 Ultra) run them 3x faster.
The GB10’s sweet spot is narrow: researchers and developers who need unquantized 70B-200B models in a CUDA environment, running batch inference or agent orchestration where 2-5 tok/s is acceptable. If that’s you, any of these four boxes will do the job identically.
Which One to Buy
If you’ve decided a GB10 is right for your workload, here’s how to choose between the four:
| Your Priority | Best Choice |
|---|---|
| Fastest model loading | DGX Spark ($3,999) — Gen 5 NVMe |
| Cheapest entry | MSI EdgeXpert ($2,999, 1TB) or ASUS GX10 ($3,099, 1TB) |
| Best build quality | DGX Spark or Dell Pro Max — metal, magnetic panels |
| Best thermals | DGX Spark or Dell Pro Max — no thermal events |
| Front power button | ASUS GX10 — only one that has it |
| Best port labeling | MSI EdgeXpert — every port labeled |
| Upgrade the SSD yourself | DGX Spark or Dell Pro Max — magnetic back, easy access |
| Coolest running | Acer Veriton GN100 ($3,999) — peaked at 76°C in third-party testing |
If you want the reference design with the fastest storage, get the Spark. If you want to save $900-1,000 and performance is identical anyway, get the MSI or ASUS 1TB and upgrade the SSD later. If build quality matters and you want something between the Spark’s price and the budget options, the Dell at $3,699 for 2TB is a reasonable middle ground.
Don’t pay a premium for performance differences — there aren’t any. You’re paying for storage speed, chassis materials, and convenience features.