M4 Max 128 vs M5 Max 128

Head-to-head for local LLM inference. The honest comparison: VRAM, bandwidth, compute, and which of the 30 catalog models actually fit on each.

The specs.

$ diff specs m4-max-128 m5-max-128

Stat                       m4-max-128   m5-max-128   Delta
VRAM                       128 GB       128 GB       tied
Memory bandwidth           546 GB/s     614 GB/s     +12%
FP16 compute               42 TFLOPS    55 TFLOPS    +31%
Weights budget at 8K ctx   102 GB       103 GB       +1%
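The percentage deltas follow directly from the raw figures. A minimal sketch of that arithmetic, using only the spec values listed above; the `SPECS` dictionary and the `pct_delta` helper are illustrative names, not part of the site's tooling:

```python
# Spec values copied from the comparison above: (m4-max-128, m5-max-128).
SPECS = {
    "Memory bandwidth (GB/s)": (546, 614),
    "FP16 compute (TFLOPS)": (42, 55),
    "Weights budget at 8K ctx (GB)": (102, 103),
}


def pct_delta(base: float, new: float) -> int:
    """Relative change from base to new, rounded to a whole percent."""
    return round((new - base) / base * 100)


# Compute the delta for every stat, keeping the order of SPECS.
deltas = {stat: pct_delta(m4, m5) for stat, (m4, m5) in SPECS.items()}

for stat, delta in deltas.items():
    print(f"{stat}: {delta:+d}%")
```

With these inputs the rounded deltas come out to +12%, +31%, and +1%, matching the listed figures.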

Model fit difference.

$ models that change with the card

Fits on both: 27 of 30

Only on m4-max-128: 0

Only on m5-max-128: 0

// showing 12 of 30 models; differing fits first

Model

m4-max-128

m5-max-128

fits FP16/BF16

fits FP16/BF16

fits FP16/BF16

fits Q8_0

over Q4_K_M

fits FP16/BF16

fits FP16/BF16

Qwen 2.5 Coder 32B (32.5B)

fits FP16/BF16

fits Q8_0

fits FP16/BF16

fits FP16/BF16

fits FP16/BF16
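The fit labels can be approximated by comparing a model's weight size at each quantization level against the weights budget. The sketch below is an assumption about how such a check might work, not the page's actual rule. The bits-per-weight figures for Q8_0 and Q4_K_M are commonly quoted llama.cpp averages that vary slightly by model, and "over Q4_K_M" is read here as "too large even at Q4_K_M". The function names are hypothetical.

```python
# Approximate bits per weight, highest precision first.
# FP16/BF16 is exact; Q8_0 and Q4_K_M are assumed averages.
BITS_PER_WEIGHT = {"FP16/BF16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85}


def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weight size in GB: billions of parameters times bytes per parameter."""
    return params_billions * bits_per_weight / 8


def best_fit(params_billions: float, budget_gb: float) -> str:
    """Return the highest-precision format whose weights fit the budget."""
    # Dicts preserve insertion order, so this walks from FP16 down to Q4_K_M.
    for quant, bpw in BITS_PER_WEIGHT.items():
        if weights_gb(params_billions, bpw) <= budget_gb:
            return f"fits {quant}"
    return "over Q4_K_M"


# Qwen 2.5 Coder 32B (32.5B parameters) against the 102 GB budget
# listed for m4-max-128 at 8K context.
print(best_fit(32.5, 102))
```

At 32.5B parameters the FP16 weights come to 65 GB, well under either card's budget. Because the two budgets differ by only 1 GB, a check like this returns the same label on both cards for nearly every model size.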

Which one wins for…

$ ./recommend --by-workload

More VRAM headroom: tied at 128 GB. Choose on bandwidth or price.

Faster decode (bandwidth): M5 Max 128, by 12% (614 vs 546 GB/s).

Faster prefill (compute): M5 Max 128, by 31% (55 vs 42 FP16 TFLOPS).

Catalog models that fit: tied, 27 of 30 on each.
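Decode is tied to bandwidth because generating each token requires streaming roughly the full set of weights from memory once. A rough upper bound on decode speed is therefore bandwidth divided by weight size. The sketch below applies that common rule of thumb to the listed bandwidth figures. It assumes a dense model and ignores KV-cache reads and runtime overhead, so real throughput will be lower. The function name and the 65 GB example size (a 32.5B-parameter model at FP16) are illustrative.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling on decode speed, in tokens per second.

    Assumes every token reads all weights once and nothing else.
    """
    return bandwidth_gb_s / weights_gb


WEIGHTS_GB = 65.0  # assumed example: 32.5B parameters at 2 bytes each

m4 = decode_ceiling_tok_s(546, WEIGHTS_GB)
m5 = decode_ceiling_tok_s(614, WEIGHTS_GB)

print(f"m4-max-128 ceiling: {m4:.1f} tok/s")
print(f"m5-max-128 ceiling: {m5:.1f} tok/s")
print(f"ratio: {m5 / m4:.3f}")
```

Whatever model size is chosen, the ratio between the two ceilings equals the bandwidth ratio, which is where the 12% decode advantage comes from.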

