~/gpu/m5-max-128 vs rtx-5090

M5 Max 128 vs RTX 5090

Head-to-head for local LLM inference. The honest comparison: VRAM, bandwidth, compute, and which of the 30 catalog models actually fit on each.

The specs.

$ diff specs m5-max-128 rtx-5090
Stat                       m5-max-128   rtx-5090     Δ
VRAM                       128 GB       32 GB        -75%
Memory bandwidth           614 GB/s     1,792 GB/s   +192%
FP16 compute               55 TFLOPS    838 TFLOPS   +1424%
Weights budget at 8K ctx   103 GB       25 GB        -76%
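The Δ column reads as the relative change of the rtx-5090 value against the m5-max-128 value, rounded to a whole percent. A minimal sketch that reproduces it from the figures in the table; the helper name `pct_delta` is made up for illustration and is not part of any tool on this page:

```python
def pct_delta(baseline: float, other: float) -> int:
    """Relative change of `other` versus `baseline`, as a whole percent."""
    return round((other - baseline) / baseline * 100)

# (m5-max-128, rtx-5090) pairs copied from the specs table.
specs = {
    "VRAM (GB)": (128, 32),
    "Memory bandwidth (GB/s)": (614, 1792),
    "FP16 compute (TFLOPS)": (55, 838),
    "Weights budget at 8K ctx (GB)": (103, 25),
}

deltas = {name: pct_delta(m5, rtx) for name, (m5, rtx) in specs.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+d}%")
```

Negative values mean the RTX 5090 has less of that resource than the M5 Max 128; positive values mean it has more.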

Model fit difference.

$ models that change with the card
Fits on both          22 of 30
Only on m5-max-128    5
Only on rtx-5090      0

// showing 12 of 30 models; differing fits first

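Model            m5-max-128       rtx-5090
(name missing)   fits Q8_0        over Q4_K_M
(name missing)   fits Q8_0        over Q4_K_M
(name missing)   fits Q6_K        over Q4_K_M
(name missing)   fits Q5_K_M      over Q4_K_M
(name missing)   fits Q6_K        over Q4_K_M
(name missing)   fits FP16/BF16   fits FP16/BF16
(name missing)   fits FP16/BF16   fits FP16/BF16
(name missing)   fits FP16/BF16   fits FP16/BF16
(name missing)   over Q4_K_M      over Q4_K_M
(name missing)   fits FP16/BF16   fits FP16/BF16
(name missing)   fits FP16/BF16   fits Q5_K_M
(name missing)   fits FP16/BF16   fits Q5_K_M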
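The fit labels above look like the result of walking a quantization ladder from highest to lowest precision and reporting the first level whose weights fit the card's weights budget, or "over" at Q4_K_M when none does. The sketch below is one plausible reading of that logic, not the page's actual implementation. The bits-per-weight figures are approximate assumptions, the function names are hypothetical, and the 70B, 32B, and 8B parameter counts are illustrative rather than catalog entries. Only the 103 GB and 25 GB budgets come from the specs table.

```python
# Approximate bits per weight per quantization level (assumed values),
# ordered from highest to lowest precision.
QUANT_LADDER = [
    ("FP16/BF16", 16.0),
    ("Q8_0", 8.5),
    ("Q6_K", 6.56),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.85),
]

def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weight size in GB: 1e9 params * bits / 8 bytes, divided by 1e9."""
    return params_billions * bits_per_weight / 8

def best_fit(params_billions: float, budget_gb: float) -> tuple[str, str]:
    """Return ("fits", quant) for the highest-precision level that fits,
    or ("over", lowest quant) when even the smallest level is too big."""
    for name, bpw in QUANT_LADDER:
        if weights_gb(params_billions, bpw) <= budget_gb:
            return ("fits", name)
    return ("over", QUANT_LADDER[-1][0])

M5_BUDGET_GB = 103   # weights budget at 8K ctx, from the specs table
RTX_BUDGET_GB = 25

for size in (70, 32, 8):  # illustrative parameter counts, in billions
    print(size, best_fit(size, M5_BUDGET_GB), best_fit(size, RTX_BUDGET_GB))
```

Under these assumptions a 70B model comes out as fits Q8_0 on the M5 Max 128 and over Q4_K_M on the RTX 5090, which matches the pattern in the first rows of the table.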

Which one wins for…

$ ./recommend --by-workload
More VRAM headroom

M5 Max 128 has 96 GB more.

Faster decode (bandwidth)

RTX 5090 by +192%.

Faster prefill (compute)

RTX 5090 by +1424% TFLOPS.

Catalog models that fit

M5 Max 128: 27 fit · RTX 5090: 22.
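The decode verdict above is tied to bandwidth because generating each token typically requires reading the model weights from memory once. A rough ceiling on decode speed is therefore memory bandwidth divided by weight size. The sketch below applies that rule of thumb; it assumes a dense model, ignores KV-cache traffic and kernel overhead, and uses a hypothetical 20 GB weight file, so real throughput will be lower on both cards.

```python
def decode_tokens_per_s_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on tokens/s if every token reads all weights exactly once."""
    return bandwidth_gb_s / weights_gb

WEIGHTS_GB = 20.0  # hypothetical model size that fits on both cards

m5_ceiling = decode_tokens_per_s_ceiling(614, WEIGHTS_GB)    # bandwidth from the specs table
rtx_ceiling = decode_tokens_per_s_ceiling(1792, WEIGHTS_GB)

print(f"m5-max-128: {m5_ceiling:.1f} tok/s ceiling")
print(f"rtx-5090:   {rtx_ceiling:.1f} tok/s ceiling")
print(f"ratio:      {rtx_ceiling / m5_ceiling:.2f}x")
```

The ratio is independent of the chosen model size, which is why the +192% bandwidth gap carries over directly to this estimate for any model that fits on both cards.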

