
Four companies are selling boxes with the same chip inside. NVIDIA’s DGX Spark, Dell’s Pro Max GB10, ASUS’s Ascent GX10, and MSI’s EdgeXpert all use the NVIDIA Grace Blackwell GB10 superchip — 20 ARM cores, a Blackwell GPU with 6,144 CUDA cores, and 128GB of unified LPDDR5X memory. Same silicon, same 1 PFLOP sparse FP4 compute, same DGXOS.

The pitch is compelling: a petaflop AI computer on your desk for $3,000-4,000. Load 70B models unquantized. Run 200B models in FP4. No multi-GPU complexity, no 850W power supply, no rack mount.

But “same chip” doesn’t mean same machine. The chassis matters for thermals. The NVMe generation matters for model loading. The build quality matters if you’re spending $4,000. And most importantly — should you spend $3,000-4,000 on any of these when a used RTX 3090 costs $900?

Here’s what 45-minute heat soak tests, real inference benchmarks, and side-by-side comparisons actually show.


The Comparison Table

| | DGX Spark | Dell Pro Max | ASUS GX10 | MSI EdgeXpert |
|---|---|---|---|---|
| Price (1TB) | N/A | N/A | $3,099 | $2,999 |
| Price (2TB) | N/A | $3,699 | ~$3,200 | N/A |
| Price (4TB) | $3,999 | $3,999 | $4,150 | $3,999 |
| NVMe gen | Gen 5 | Gen 4 | Gen 4 | Gen 4 |
| SSD size | 2242 | 2242 | 2242 | 2242 |
| SSD upgradeable | Yes | Yes | Yes (harder) | Yes |
| Weight | 1,255g | 1,256g | 1,474g | 1,257g |
| Back panel | Magnetic | Magnetic (6 magnets) | Screws | Screws |
| Port labeling | None | Minimal | Good | Best |
| Front power button | No | No | Yes | No |
| Chassis | Metal | Metal | Metal + plastic top | Mostly plastic |
| Noise | Quietest | Very quiet | Quiet | Loudest |
| Thermal events | None | None | 2 slowdowns | None |

All four machines run DGXOS (Ubuntu-based with pre-installed AI tools), have ConnectX-7 200GbE SmartNICs, and support linking two units for up to 256GB combined memory.


The GB10 Chip — What You’re Actually Getting

Every GB10 box contains the same SoC, co-designed by NVIDIA and MediaTek on TSMC’s 3nm process:

| Spec | Value |
|---|---|
| CPU | 20 ARM v9.2 cores (10 performance + 10 efficiency) |
| GPU | Blackwell, 48 SMs, 6,144 CUDA cores |
| Tensor Cores | 5th-gen, 192 total |
| Memory | 128GB LPDDR5X unified (shared CPU+GPU) |
| Memory bandwidth | 273 GB/s actual |
| Compute | 1 PFLOP sparse FP4, ~100 TFLOPS FP16 |
| TDP | 140W rated, ~100W observed under load |
| Networking | ConnectX-7 (200GbE), Wi-Fi |

The 273 GB/s memory bandwidth is the number that matters most for LLM inference. Token generation is memory-bandwidth-bound — the GPU needs to read every model weight from memory for each token. At 273 GB/s, you’re limited to roughly 2-3 tok/s on a 70B model at FP8 and about 5 tok/s at FP4 with TensorRT-LLM optimization.

For comparison: an RTX 5090 has 1,792 GB/s bandwidth (6.5x more), and a Mac Studio M3 Ultra has 819 GB/s (3x more). Both generate tokens significantly faster on models that fit in their memory. The GB10’s advantage is pure capacity — 128GB lets you load models that 32GB or even 48GB GPUs can’t touch.
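These bandwidth ceilings can be sanity-checked with a back-of-envelope model. The sketch below assumes every weight byte is read once per generated token, which is the standard approximation for decode; real throughput lands below these ceilings because of KV-cache traffic and scheduling overhead.

```python
# Rough upper bound on decode speed for memory-bandwidth-bound generation:
# tokens/second ~= memory bandwidth / bytes of weights read per token.
# Ignores KV-cache reads and overhead, so measured numbers come in lower
# (e.g. the GB10's observed ~2.7 tok/s vs the ~3.9 ceiling computed here).

def decode_tok_per_s(params_b: float, bytes_per_param: float, bw_gb_s: float) -> float:
    model_gb = params_b * bytes_per_param  # weight footprint in GB
    return bw_gb_s / model_gb

# 70B model at FP8 (1 byte/param) on the GB10's 273 GB/s:
print(f"GB10 (273 GB/s):     {decode_tok_per_s(70, 1.0, 273):.1f} tok/s ceiling")
# Same model on an M3 Ultra's 819 GB/s:
print(f"M3 Ultra (819 GB/s): {decode_tok_per_s(70, 1.0, 819):.1f} tok/s ceiling")
```

The 3x bandwidth ratio between the Mac Studio and the GB10 maps directly onto the roughly 3x decode-speed gap reported later in this article.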


Performance — They’re All The Same

This is the headline: when running the same model on all four machines, token generation is identical. The chip is the chip.

Qwen 34B (Quantized)

| Machine | PP 4096 (tok/s) | TG 8192 (tok/s) |
|---|---|---|
| DGX Spark | ~1,976 | 61 |
| Dell Pro Max | ~1,976 | 61 |
| ASUS GX10 | ~1,976 | 61 |
| MSI EdgeXpert | ~1,976 | 61 |

Prompt processing and token generation are functionally identical across all four machines. This ran via llama.cpp at ~96% GPU utilization for 45 minutes.

Nemotron 30B (Unquantized)

| Machine | PP (tok/s) | Avg GPU Power (W) |
|---|---|---|
| DGX Spark | 1,070 | 66.0 |
| Dell Pro Max | 1,068 | 62.7 |
| ASUS GX10 | 1,068 | 60.0 |
| MSI EdgeXpert | 1,068 | 60.0 |

Same story. The Spark draws slightly more GPU power (~66W average vs ~60W on the others), but performance is indistinguishable. Clock speeds stayed consistent across all machines with no throttling during normal inference workloads.

What About Bigger Models?

The GB10’s marketing claim is “up to 200B parameters.” Here’s the reality:

| Model | Format | Fits in 128GB? | Decode Speed |
|---|---|---|---|
| 7B | FP16 | Yes (14GB) | Fast |
| 34B | FP16 | Yes (68GB) | ~61 tok/s |
| 70B | FP16 | No (needs 140GB) | N/A |
| 70B | FP8 | Yes (70GB) | ~2.7 tok/s |
| 70B | FP4 (TRT-LLM) | Yes (35GB) | ~5.2 tok/s |
| 120B | FP4 | Yes (~60GB) | Slow |
| 200B | FP4 | Yes (~100GB) | Very slow |

A 70B model does not fit unquantized at FP16 — it needs ~140GB. It fits at FP8, but generates tokens at 2.7 tok/s. That’s usable for batch processing but painful for interactive chat. Time-to-first-token on a 90B model is around 2 minutes.
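The fits-or-not column is simple bytes-per-parameter arithmetic. A minimal sketch, counting weights only — KV cache and runtime overhead add several GB on top in practice:

```python
# Weight footprint ~= parameter count * bytes per parameter.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

def weights_gb(params_b: float, fmt: str) -> float:
    """Weight memory in GB for a model with params_b billion parameters."""
    return params_b * BYTES_PER_PARAM[fmt]

for params, fmt in [(70, "FP16"), (70, "FP8"), (70, "FP4"), (200, "FP4")]:
    gb = weights_gb(params, fmt)
    verdict = "fits" if gb <= 128 else "does not fit"
    print(f"{params}B {fmt}: {gb:.0f}GB -> {verdict} in 128GB")
```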


Thermals and Power — Where the Chassis Matters

All four machines were heat-soaked for 45+ minutes running continuous inference. Under normal LLM workloads, they all behave similarly:

  • GPU temperature: ~80°C after heat soak
  • Surface temperature: ~50°C
  • Total system power: 140-160W
  • GPU power alone: ~60-66W
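Anyone reproducing these heat-soak numbers can log them with a sampling loop over `nvidia-smi`, which DGXOS ships with the driver. The query fields below are standard `nvidia-smi` names; whether the GB10 exposes both is an assumption worth checking on your own unit.

```python
# Sketch of a heat-soak logger: poll GPU power and temperature via
# nvidia-smi's CSV output and parse the values into floats.
import subprocess

def parse_sample(csv_line: str) -> dict:
    """Parse one 'power.draw, temperature.gpu' CSV row like '65.8 W, 80'."""
    power, temp = [field.strip() for field in csv_line.split(",")]
    return {"power_w": float(power.split()[0]), "temp_c": float(temp)}

def sample_gpu() -> dict:
    # Assumes nvidia-smi is on PATH (it is on DGXOS).
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw,temperature.gpu",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return parse_sample(out)

# Example row in the format nvidia-smi emits:
print(parse_sample("65.8 W, 80"))
```

Call `sample_gpu()` once per second for 45 minutes and you have the same heat-soak curve described in this section.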

The interesting differences appear under stress testing (GPU burn, maximizing compute beyond typical inference):

The Software Power Cap

Every GB10 box hits a software-imposed power cap at ~100W GPU draw. This is what John Carmack flagged when he noted the DGX Spark “appears to be maxing out at only 100 watts power draw, less than half of the rated 240 watts.”

Carmack was right about the 100W cap. But calling it “thermal throttling” (as many headlines did) is wrong. GPU clock speeds stay consistent across all machines when the power cap engages; it’s a software limit, not a thermal event. NVIDIA could raise this cap via a firmware update, and they’ve already shipped performance-improving software updates at CES 2026 that delivered up to 2.6x improvements for some workloads.

The 240W number is the power adapter rating, not the chip’s expected draw. The GB10 chip is rated at 140W TDP.

The ASUS Exception

Under sustained stress testing, the ASUS GX10 is the only machine that triggers actual thermal slowdown events. Two instances were recorded where the GPU power dropped from 96W to 76W and the OS flagged a software thermal slowdown signal. This happened at GPU temperatures around 95°C — while the Dell reached 99°C without triggering the same event.

The practical impact? Negligible. Looking at the performance charts before and after the thermal events, the output is visually identical. But it’s worth knowing: if you’re running sustained heavy workloads 24/7 (training, continuous agent orchestration), the ASUS is the one machine that shows signs of thermal limits.

The ASUS weighs 1,474g — 220g more than the others. Extra weight usually means extra cooling hardware, but in this case it doesn’t translate to better thermals.

The Acer Note

Storage Review tested a fifth GB10 machine — the Acer Veriton GN100 ($3,999) — and found it ran coolest of all, peaking at just 76°C during demanding prefill workloads where other systems climbed into the mid-to-upper 80s. We haven’t tested the Acer ourselves, but if thermals are your top priority, it’s worth investigating.


Storage — The One Real Difference

This is where the DGX Spark actually justifies its premium. It’s the only machine with a Gen 5 NVMe drive.

| Machine | NVMe Gen | Sequential Read | Nemotron 30B Cold Load |
|---|---|---|---|
| DGX Spark | Gen 5 | ~13,000 MB/s | 8.49s |
| Dell Pro Max | Gen 4 | ~7,000 MB/s | ~11.5s |
| ASUS GX10 | Gen 4 | ~7,000 MB/s | ~11.5s |
| MSI EdgeXpert | Gen 4 | ~7,000 MB/s | ~11.5s |

The Spark loads models roughly 25% faster from cold start. For a 30B model, that’s a 3-second difference. For larger models it scales up: a 70B FP8 model (~70GB) would save roughly 4-5 seconds on the Spark.
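Cold-load time is roughly the model's on-disk size divided by sequential read speed. A sketch under the assumption that loading is read-bound — the measured times above include extra overhead, so real numbers run a little higher:

```python
# Estimated cold-load time, assuming the load is sequential-read-bound.
def load_seconds(model_gb: float, read_mb_s: float) -> float:
    return model_gb * 1000 / read_mb_s  # GB -> MB, divided by MB/s

for label, speed in [("Gen 5 (~13,000 MB/s)", 13_000),
                     ("Gen 4 (~7,000 MB/s)", 7_000)]:
    print(f"70GB FP8 70B on {label}: {load_seconds(70, speed):.1f}s")
```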

This matters if you’re swapping models frequently — running agents that pull different models for different tasks, or serving multiple users who need different models. If you load one model and run it all day, you’ll never notice.

All four machines use 2242 M.2 SSDs, and all are upgradeable. You can buy the cheapest ASUS GX10 (1TB, $3,099) and drop in a 4TB Gen 4 drive yourself to save $500-1,000 versus buying a 4TB configuration. Getting into the ASUS is slightly harder (requires more disassembly) compared to the Spark and Dell, which use magnetic back panels.


Physical Design

Build Quality

The Spark and Dell are nearly twins — both are metal chassis, both use magnetic back panels for SSD access, and they weigh within 1 gram of each other (1,255g vs 1,256g). Dell kept NVIDIA’s reference design mostly intact and added a cleaner front grille.

The ASUS is all metal except for a plastic top panel. It’s the heaviest at 1,474g and has a front power button — the only one to offer this. If you’re rack-mounting these, reaching around the back to hit a power button gets old fast.

The MSI feels like plastic all around. It has the flimsiest-feeling build of the four, but the best port labeling: every port is marked with its speed and function. The Spark has no labels at all.

Rack Mounting

None of these fit in a single rack unit. They’re all too tall by a few millimeters. If you’re planning a rack deployment, you’ll need custom shelving or 2U spacing.

Noise

All four are dramatically quieter than any desktop GPU. The Spark is the quietest, the MSI is the loudest, but the difference is marginal — none of them are louder than a laptop under moderate load.


The Budget Reality Check

Here’s the honest take from a budget-hardware perspective.

GB10 vs Used RTX 3090

| | Used RTX 3090 | GB10 (any) |
|---|---|---|
| Price | ~$900 | $2,999-$3,999 |
| VRAM/Memory | 24GB GDDR6X | 128GB LPDDR5X |
| Memory bandwidth | 936 GB/s | 273 GB/s |
| Llama 70B Q4 | ~18 tok/s | ~5 tok/s (FP4) |
| Llama 7B | ~80+ tok/s | ~61 tok/s |
| Power draw | ~350W | ~100W |
| Needs host PC | Yes | No (standalone) |

The RTX 3090 is faster on every model that fits in 24GB, including quantized 70B. It has 3.4x more memory bandwidth. It costs a quarter of the price.

The GB10’s only advantage: it loads models that 24GB can’t hold. Unquantized 70B at FP8. Unquantized 34B at FP16. Models in the 100-200B range at FP4. If you need that, nothing in this price range competes. But if you’re running 7B-34B quantized models — which is what most local AI hobbyists do — the 3090 wins on speed and cost.

GB10 vs Mac Studio

| | Mac Studio M3 Ultra (128GB) | GB10 (any) |
|---|---|---|
| Price | ~$5,000-6,000 | $2,999-$3,999 |
| Memory | 128GB unified | 128GB unified |
| Memory bandwidth | 819 GB/s | 273 GB/s |
| Llama 70B (FP8) | ~8 tok/s | ~2.7 tok/s |
| Ecosystem | macOS, MLX, Ollama | DGXOS (Ubuntu), CUDA |

The Mac Studio M3 Ultra has 3x the memory bandwidth and delivers roughly 3x faster token generation on the same models. It costs more, but you get a fully functional desktop computer — not just an inference box.

The GB10 wins on CUDA compatibility and price. If your workflow depends on CUDA-specific tools (TensorRT, vLLM, PyTorch CUDA), the GB10 runs them natively. The Mac requires Metal or MLX.


The GB10 Paradox

The GB10 has a fundamental tension: it loads models that nothing else in its price range can fit, but runs them slowly. A 70B model at FP8 generates 2.7 tokens per second. That’s technically functional but hardly interactive. Time-to-first-token on a 90B+ model can hit 2 minutes.

The machines that run models fast (RTX 5090 at 1,792 GB/s, RTX 3090 at 936 GB/s) can’t load the models the GB10 can. The machines that match the GB10’s capacity (Mac Studio M3 Ultra) run them 3x faster.

The GB10’s sweet spot is narrow: researchers and developers who need unquantized 70B-200B models in a CUDA environment, running batch inference or agent orchestration where 2-5 tok/s is acceptable. If that’s you, any of these four boxes will do the job identically.


Which One to Buy

If you’ve decided a GB10 is right for your workload, here’s how to choose between the four:

| Your Priority | Best Choice |
|---|---|
| Fastest model loading | DGX Spark ($3,999) — Gen 5 NVMe |
| Cheapest entry | MSI EdgeXpert ($2,999, 1TB) or ASUS GX10 ($3,099, 1TB) |
| Best build quality | DGX Spark or Dell Pro Max — metal, magnetic panels |
| Best thermals | DGX Spark or Dell Pro Max — no thermal events |
| Front power button | ASUS GX10 — only one that has it |
| Best port labeling | MSI EdgeXpert — every port labeled |
| Upgrade the SSD yourself | DGX Spark or Dell Pro Max — magnetic back, easy access |
| Coolest running | Acer Veriton GN100 ($3,999) — peaked at 76°C in third-party testing |

If you want the reference design with the fastest storage, get the Spark. If you want to save $900-1,000 and performance is identical anyway, get the MSI or ASUS 1TB and upgrade the SSD later. If build quality matters and you want something between the Spark’s price and the budget options, the Dell at $3,699 for 2TB is a reasonable middle ground.

Don’t pay a premium for performance differences — there aren’t any. You’re paying for storage speed, chassis materials, and convenience features.