M3 Max 96 vs M4 Max 128

Head-to-head for local LLM inference. The honest comparison: VRAM, bandwidth, compute, and which of the 30 catalog models actually fit on each.

The specs.

$ diff specs m3-max-96 m4-max-128

| Stat | m3-max-96 | m4-max-128 | Change |
| --- | --- | --- | --- |
| VRAM | 96 GB | 128 GB | +33% |
| Memory bandwidth | 400 GB/s | 546 GB/s | +37% |
| FP16 compute | 35 TFLOPS | 42 TFLOPS | +20% |
| Weights budget at 8K ctx | 76 GB | 102 GB | +34% |
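The percentage column can be reproduced from the raw figures. The sketch below is only an illustration: the dictionary layout and the `pct_gain` helper are hypothetical names, and the numbers are copied from the specs listed above. The unrounded gains come out to roughly 33.3%, 36.5%, 20.0%, and 34.2%, which matches the listed values after rounding.

```python
# Spec values copied from the comparison above.
# Tuple order is (m3-max-96, m4-max-128); units are GB, GB/s, TFLOPS, GB.
SPECS = {
    "VRAM": (96, 128),
    "Memory bandwidth": (400, 546),
    "FP16 compute": (35, 42),
    "Weights budget at 8K ctx": (76, 102),
}


def pct_gain(base: float, other: float) -> float:
    """Relative gain of `other` over `base`, in percent."""
    return (other - base) / base * 100.0


# Gain of the M4 Max 128 over the M3 Max 96 for every stat.
gains = {stat: pct_gain(m3, m4) for stat, (m3, m4) in SPECS.items()}

for stat, gain in gains.items():
    print(f"{stat}: {gain:+.1f}%")
```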

Model fit difference.

$ models that change with the card

Fits on both

27 of 30

Only on m3-max-96

Only on m4-max-128

// showing 12 of 30 models; differing fits first

Model

m3-max-96

m4-max-128

fits FP16/BF16

fits FP16/BF16

fits FP16/BF16

fits Q8_0

over Q4_K_M

fits FP16/BF16

fits FP16/BF16

Qwen 2.5 Coder 32B (32.5B)

fits FP16/BF16

Qwen 2.5 72B (72.7B)

fits FP8/INT8

fits Q8_0

Qwen3 30B A3B (30.5B)

fits FP16/BF16

Qwen 3.5 9B (9B)

fits FP16/BF16

Qwen 3.6 27B (27B)

fits FP16/BF16
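One plausible way to read the fit labels is "the highest-precision format whose weights fit inside the weights budget". The sketch below assumes that rule; the page does not state how it computes fits, so treat the function names, the format ordering, and the Q4_K_M figure as assumptions. Q8_0 is taken as 8.5 bits per weight (the llama.cpp block layout of 32 int8 values plus one fp16 scale), and 1 GB is taken as 10^9 bytes. Under these assumptions the sketch reproduces the Qwen 2.5 72B row: FP8/INT8 on a 76 GB budget, Q8_0 on a 102 GB budget.

```python
from typing import Optional

# Formats ordered from most to fewest bits per weight.
# FP16/BF16 and FP8/INT8 are exact; Q8_0 is 8.5 bits per weight in llama.cpp;
# the Q4_K_M value is an approximate average and is an assumption here.
BITS_PER_WEIGHT = [
    ("FP16/BF16", 16.0),
    ("Q8_0", 8.5),
    ("FP8/INT8", 8.0),
    ("Q4_K_M", 4.85),
]

# Weights budgets at 8K context, in GB, copied from the specs section.
BUDGET_GB = {"m3-max-96": 76, "m4-max-128": 102}


def weights_gb(params_billions: float, bits: float) -> float:
    """Size of the weights in GB (10^9 bytes) at the given bits per weight."""
    return params_billions * bits / 8.0


def best_fit(params_billions: float, budget_gb: float) -> Optional[str]:
    """Highest-precision format that fits the budget, or None if even
    Q4_K_M is over budget (the page's "over Q4_K_M" case)."""
    for name, bits in BITS_PER_WEIGHT:
        if weights_gb(params_billions, bits) <= budget_gb:
            return name
    return None


# Qwen 2.5 72B has 72.7B parameters according to the listing above.
for card, budget in BUDGET_GB.items():
    print(card, best_fit(72.7, budget))
```

The same check gives FP16/BF16 on both cards for Qwen 2.5 Coder 32B (32.5B parameters, 65 GB of weights), which agrees with the listing.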

Which one wins for…

$ ./recommend --by-workload

More VRAM headroom

M4 Max 128 has 32 GB more.

Faster decode (bandwidth)

M4 Max 128 by +37%.

Faster prefill (compute)

M4 Max 128 by +20% FP16 TFLOPS.

Catalog models that fit

Tied: 27 of 30 fit on each.
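The pairing of decode with bandwidth and prefill with compute follows from a common back-of-the-envelope model, sketched below under stated assumptions: decoding one token reads every weight once, so tokens per second is bounded by bandwidth divided by weight size, and prefill of a dense model costs about 2 FLOPs per parameter per prompt token. Both function names are hypothetical, and the results are upper bounds on paper, not measurements. Real throughput is lower, and mixture-of-experts models such as Qwen3 30B A3B read only their active parameters per token.

```python
def decode_tokens_per_s_bound(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed: each token streams all weights once."""
    return bandwidth_gb_s / weights_gb


def prefill_seconds_bound(params_billions: float, prompt_tokens: int,
                          tflops: float) -> float:
    """Lower bound on prefill time for a dense model at peak compute,
    using ~2 FLOPs per parameter per prompt token."""
    flops = 2.0 * params_billions * 1e9 * prompt_tokens
    return flops / (tflops * 1e12)


# Example: a dense 32.5B model at FP16 (65 GB of weights), 8192-token prompt.
# Bandwidth and TFLOPS figures are the ones listed in the specs section.
CARDS = {"m3-max-96": (400, 35), "m4-max-128": (546, 42)}

for card, (bandwidth, tflops) in CARDS.items():
    decode = decode_tokens_per_s_bound(bandwidth, 65.0)
    prefill = prefill_seconds_bound(32.5, 8192, tflops)
    print(f"{card}: decode <= {decode:.1f} tok/s, prefill >= {prefill:.1f} s")
```

Because both bounds scale linearly with the hardware figure, the ratio between the two cards is the bandwidth ratio for decode and the TFLOPS ratio for prefill, regardless of model size.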
