~/gpu/m3-max-96 vs m4-max-128

M3 Max 96 vs M4 Max 128

Head-to-head for local LLM inference. The honest comparison: VRAM (unified memory on these Apple chips), bandwidth, compute, and which of the 30 catalog models actually fit on each.

The specs.

$ diff specs m3-max-96 m4-max-128
Stat                      m3-max-96   m4-max-128   Δ
VRAM                      96 GB       128 GB       +33%
Memory bandwidth          400 GB/s    546 GB/s     +37%
FP16 compute              35 TFLOPS   42 TFLOPS    +20%
Weights budget at 8K ctx  76 GB       102 GB       +34%
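
The Δ column reads as a plain relative difference of m4-max-128 over m3-max-96. A minimal Python sketch that recomputes it from the table values follows; the `specs` dict and the `pct_delta` helper are illustrative names, not part of the site's tooling, and the half-up rounding is an assumption chosen because it reproduces the +37% shown for bandwidth (36.5% before rounding).

```python
import math

# Spec values copied from the table above; the dict layout is illustrative.
specs = {
    "VRAM (GB)": (96, 128),
    "Memory bandwidth (GB/s)": (400, 546),
    "FP16 compute (TFLOPS)": (35, 42),
    "Weights budget at 8K ctx (GB)": (76, 102),
}

def pct_delta(base: float, other: float) -> int:
    """Relative difference of `other` over `base` in whole percent.

    Rounds half up (assumption), because Python's built-in round()
    rounds 36.5 to 36 rather than 37.
    """
    return math.floor((other - base) * 100 / base + 0.5)

deltas = {name: pct_delta(a, b) for name, (a, b) in specs.items()}
for name, d in deltas.items():
    print(f"{name}: {d:+d}%")
```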

Model fit difference.

$ models that change with the card
Fits on both         27 of 30
Only on m3-max-96    0
Only on m4-max-128   0

// showing 12 of 30 models; differing fits first

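Model            m3-max-96        m4-max-128
[name missing]   fits FP16/BF16   fits FP16/BF16
[name missing]   fits FP16/BF16   fits FP16/BF16
[name missing]   fits FP16/BF16   fits FP16/BF16
[name missing]   fits Q8_0        fits Q8_0
[name missing]   over Q4_K_M      over Q4_K_M
[name missing]   fits FP16/BF16   fits FP16/BF16
[name missing]   fits FP16/BF16   fits FP16/BF16
[name missing]   fits FP16/BF16   fits FP16/BF16
[name missing]   fits FP8/INT8    fits Q8_0
[name missing]   fits FP16/BF16   fits FP16/BF16
[name missing]   fits FP16/BF16   fits FP16/BF16
[name missing]   fits FP16/BF16   fits FP16/BF16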
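
The fit labels above look like the result of walking down a precision ladder until the weights fit in the budget. The page does not state how the catalog computes this, so the following is only a sketch under assumptions: the bytes-per-weight figures are approximations (Q8_0 at 34 bytes per block of 32 weights, Q4_K_M at roughly 4.85 bits per weight, which varies by model), the ladder order is inferred from the labels, and `best_fit` and the model sizes in the usage lines are hypothetical.

```python
# Approximate bytes per weight, largest format first. All values are
# assumptions for illustration, not figures taken from the catalog.
BYTES_PER_WEIGHT = [
    ("FP16/BF16", 2.0),
    ("Q8_0", 34 / 32),      # 32 int8 weights plus one fp16 scale per block
    ("FP8/INT8", 1.0),
    ("Q4_K_M", 4.85 / 8),   # rough average; real files vary by model
]

def best_fit(params_billions: float, budget_gb: float) -> tuple[str, str]:
    """Return ("fits", fmt) for the largest format within budget,
    or ("over", smallest fmt) when even the smallest format is too big."""
    for fmt, bytes_per_weight in BYTES_PER_WEIGHT:
        # Billions of parameters times bytes per weight gives decimal GB.
        if params_billions * bytes_per_weight <= budget_gb:
            return ("fits", fmt)
    return ("over", BYTES_PER_WEIGHT[-1][0])

# Hypothetical 72B dense model against the two weights budgets at 8K ctx.
print(best_fit(72, 76))    # m3-max-96 budget
print(best_fit(72, 102))   # m4-max-128 budget
```

Under these assumptions a hypothetical 72B model lands on FP8/INT8 with the 76 GB budget and on Q8_0 with the 102 GB budget, the same pattern as the one row where the two cards differ in format.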

Which one wins for…

$ ./recommend --by-workload
More VRAM headroom

M4 Max 128 has 32 GB more.

Faster decode (bandwidth)

M4 Max 128 by +37%.

Faster prefill (compute)

M4 Max 128 by +20% TFLOPS.

Catalog models that fit

Tied: 27 of 30 fit on each; the other 3 fit on neither.
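
The decode and prefill lines above rest on two common rules of thumb: decode is bounded by how fast the weights can be read from memory, and prefill by how fast the matrix math can run. A rough Python sketch of both bounds is below. It assumes a dense model, peak spec-sheet numbers, and no KV-cache or overhead costs, so real throughput will be lower; the function names, the 40 GB weight size, and the 70B parameter count are hypothetical.

```python
def decode_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed: every generated token reads all weights once."""
    return bandwidth_gb_s / weights_gb

def prefill_seconds(params_billions: float, prompt_tokens: int, tflops: float) -> float:
    """Lower bound on prefill time: about 2 FLOPs per parameter per prompt token."""
    flops = 2 * params_billions * 1e9 * prompt_tokens
    return flops / (tflops * 1e12)

WEIGHTS_GB = 40.0   # hypothetical quantized model size
PARAMS_B = 70.0     # hypothetical parameter count, in billions
PROMPT = 8192       # prompt length in tokens, matching the 8K ctx budget row

cards = {
    "m3-max-96": {"bandwidth": 400, "tflops": 35},
    "m4-max-128": {"bandwidth": 546, "tflops": 42},
}

for name, card in cards.items():
    tps = decode_tokens_per_s(card["bandwidth"], WEIGHTS_GB)
    secs = prefill_seconds(PARAMS_B, PROMPT, card["tflops"])
    print(f"{name}: decode <= {tps:.2f} tok/s, prefill >= {secs:.1f} s")
```

Whatever model size is plugged in, the ratios between the two cards stay at the bandwidth and compute ratios from the specs table, which is why the recommendations quote +37% and +20% directly.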

