M5 Max 128 vs RTX 5090

Head-to-head for local LLM inference. The honest comparison: VRAM, bandwidth, compute, and which of the 30 catalog models actually fit on each.

The specs.

$ diff specs m5-max-128 rtx-5090

Stat                       m5-max-128   rtx-5090     Diff
VRAM                       128 GB       32 GB        -75%
Memory bandwidth           614 GB/s     1,792 GB/s   +192%
FP16 compute               55 TFLOPS    838 TFLOPS   +1424%
Weights budget at 8K ctx   103 GB       25 GB        -76%
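The last column reads as the relative change from the m5-max-128 value to the rtx-5090 value. A minimal sketch of that arithmetic, assuming the page rounds to the nearest whole percent (the helper name `pct_diff` is hypothetical, not something the page defines):

```python
def pct_diff(baseline: float, other: float) -> int:
    """Relative change from baseline to other, as a rounded percentage."""
    return round((other - baseline) / baseline * 100)


# (m5-max-128 value, rtx-5090 value), copied from the specs above.
specs = {
    "VRAM (GB)": (128, 32),
    "Memory bandwidth (GB/s)": (614, 1792),
    "FP16 compute (TFLOPS)": (55, 838),
    "Weights budget at 8K ctx (GB)": (103, 25),
}

for name, (m5, rtx) in specs.items():
    print(f"{name:32s} {pct_diff(m5, rtx):+d}%")
```

Under that rounding assumption, the four rows come out as -75%, +192%, +1424%, and -76%, matching the figures listed.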

Model fit difference.

$ models that change with the card

Fits on both: 22 of 30

Only on m5-max-128: 5

Only on rtx-5090: 0

// showing 12 of 30 models; differing fits first

Model                           m5-max-128       rtx-5090
Llama 3.3 70B (70.6B)           fits Q8_0        over Q4_K_M
Qwen 2.5 72B (72.7B)            fits Q8_0        over Q4_K_M
gpt-oss 120B (117B)             fits Q6_K        over Q4_K_M
Mixtral 8x22B (141B)            fits Q5_K_M      over Q4_K_M
Command R+ (104B)               fits Q6_K        over Q4_K_M

fits FP16/BF16

fits FP16/BF16

fits FP16/BF16

over Q4_K_M

fits FP16/BF16

fits FP16/BF16

fits Q5_K_M

Qwen 2.5 Coder 32B (32.5B)      fits FP16/BF16   fits Q5_K_M
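The page does not say how it decides which quantization fits. One plausible way to reason about it is to estimate weight size as parameters times bits per weight and pick the highest-quality quantization that stays within the weights budget. The sketch below does that; the bits-per-weight figures are approximate values commonly quoted for GGUF quantizations and are an assumption here, as are the helper names `weights_gb` and `best_quant`.

```python
from typing import Optional

# Approximate bits per weight, highest quality first.
# ASSUMPTION: rough, commonly quoted figures; not taken from this page.
BITS_PER_WEIGHT = [
    ("FP16/BF16", 16.0),
    ("Q8_0", 8.5),
    ("Q6_K", 6.56),
    ("Q5_K_M", 5.69),
    ("Q4_K_M", 4.85),
]


def weights_gb(params_billion: float, bits: float) -> float:
    """Estimated weight size in decimal GB (1e9 params * bits / 8 bytes)."""
    return params_billion * bits / 8


def best_quant(params_billion: float, budget_gb: float) -> Optional[str]:
    """Highest-quality quantization whose weights fit in the budget, else None."""
    for name, bits in BITS_PER_WEIGHT:
        if weights_gb(params_billion, bits) <= budget_gb:
            return name
    return None  # too large even at Q4_K_M


# Weights budgets at 8K ctx from the specs: 103 GB and 25 GB.
for model, params in [("Llama 3.3 70B", 70.6), ("Qwen 2.5 Coder 32B", 32.5)]:
    print(model, best_quant(params, 103), best_quant(params, 25))
```

With these assumed figures, the sketch agrees with the named rows above: for example, Llama 3.3 70B lands on Q8_0 within 103 GB and does not fit within 25 GB even at Q4_K_M. The page's actual method may also account for details this estimate ignores.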

Which one wins for…

$ ./recommend --by-workload

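More VRAM headroom: M5 Max 128, with 96 GB more (128 GB vs 32 GB).

Faster decode (bandwidth): RTX 5090, with 192% more memory bandwidth (1,792 GB/s vs 614 GB/s).

Faster prefill (compute): RTX 5090, with 1,424% more FP16 TFLOPS (838 vs 55).

Catalog models that fit: M5 Max 128 fits 27 of 30; RTX 5090 fits 22 of 30.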
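Decode is tied to bandwidth because generating each token requires reading roughly all of the model's weights from memory once. A common rule of thumb, sketched below, puts the ceiling on decode speed at bandwidth divided by weight size. This is an upper bound for a dense model that ignores KV-cache reads and other overhead; the 20 GB model size is a hypothetical example, and the function name is illustrative.

```python
def decode_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on decode speed for a dense model.

    ASSUMPTION: every token reads all weights once and nothing else
    (no KV-cache traffic, no compute stalls).
    """
    return bandwidth_gb_s / weights_gb


WEIGHTS_GB = 20.0  # hypothetical model size that fits on both cards

m5 = decode_tokens_per_s(614, WEIGHTS_GB)     # m5-max-128 bandwidth
rtx = decode_tokens_per_s(1792, WEIGHTS_GB)   # rtx-5090 bandwidth

print(f"m5-max-128: ~{m5:.1f} tok/s ceiling")
print(f"rtx-5090:   ~{rtx:.1f} tok/s ceiling")
print(f"ratio:      {rtx / m5:.2f}x")
```

The ratio does not depend on the model size chosen, which is why the bandwidth gap translates directly into the decode advantage for any model that fits on both cards. Mixture-of-experts models read only their active experts per token, so this bound is looser for them.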
