~/gpu/m3-max-96 vs rtx-4090

M3 Max 96 vs RTX 4090

Head-to-head for local LLM inference. The honest comparison: VRAM, bandwidth, compute, and which of the 30 catalog models actually fit on each.

The specs.

$ diff specs m3-max-96 rtx-4090
Stat                      m3-max-96   rtx-4090     Δ
VRAM                      96 GB       24 GB        -75%
Memory bandwidth          400 GB/s    1,008 GB/s   +152%
FP16 compute              35 TFLOPS   330 TFLOPS   +843%
Weights budget at 8K ctx  76 GB       18 GB        -76%
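
The Δ column reads as the rtx-4090 value relative to the m3-max-96 value. A minimal sketch of that arithmetic, assuming the page rounds to the nearest whole percent (the helper name `pct_delta` is illustrative, not taken from the site):

```python
def pct_delta(baseline: float, other: float) -> int:
    """Percent change from baseline to other, rounded to a whole percent."""
    return round((other - baseline) / baseline * 100)

# (m3-max-96, rtx-4090) values copied from the spec table.
specs = {
    "VRAM (GB)": (96, 24),
    "Memory bandwidth (GB/s)": (400, 1008),
    "FP16 compute (TFLOPS)": (35, 330),
    "Weights budget at 8K ctx (GB)": (76, 18),
}

deltas = {name: pct_delta(m3, rtx) for name, (m3, rtx) in specs.items()}
for name, delta in deltas.items():
    print(f"{name}: {delta:+d}%")
```

A negative Δ means the rtx-4090 has less of that resource than the m3-max-96; a positive Δ means it has more.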

Model fit difference.

$ models that change with the card
Fits on both: 21 of 30
Only on m3-max-96: 6
Only on rtx-4090: 0
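
The three counts above determine the per-card totals and how many catalog models fit on neither card. A small sketch of that bookkeeping (variable names are illustrative):

```python
CATALOG_SIZE = 30

fits_both = 21
only_m3_max_96 = 6
only_rtx_4090 = 0

# Per-card totals: models shared by both cards plus those exclusive to one.
fits_m3_max_96 = fits_both + only_m3_max_96
fits_rtx_4090 = fits_both + only_rtx_4090

# Whatever is left in the catalog fits on neither card.
fits_neither = CATALOG_SIZE - (fits_both + only_m3_max_96 + only_rtx_4090)

print(fits_m3_max_96, fits_rtx_4090, fits_neither)
```

The totals agree with the 27 and 21 reported in the recommendations, and the remainder is the set of models too large for either card.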

// showing 12 of 30 models; differing fits first

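Model           m3-max-96         rtx-4090
(name missing)  fits Q8_0         over Q4_K_M
(name missing)  fits FP8/INT8     over Q4_K_M
(name missing)  fits Q4_K_M       over Q4_K_M
(name missing)  fits Q8_0         over Q4_K_M
(name missing)  fits AWQ 4-BIT    over Q4_K_M
(name missing)  fits Q5_K_M       over Q4_K_M
(name missing)  fits FP16/BF16    fits FP16/BF16
(name missing)  fits FP16/BF16    fits FP16/BF16
(name missing)  fits FP16/BF16    fits FP16/BF16
(name missing)  over Q4_K_M       over Q4_K_M
(name missing)  fits FP16/BF16    fits FP16/BF16
(name missing)  fits FP16/BF16    fits AWQ 4-BIT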
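
The page does not state its fit rule. One plausible reading of the cells is "fits X" for the highest-precision format whose weights fit in the card's weights budget, and "over Q4_K_M" when even the smallest listed format is too large. The sketch below works under that assumption. The bits-per-weight figures are rough approximations, the parameter counts in the usage example are hypothetical, and the function name `best_fit` is illustrative. FP8/INT8 and AWQ 4-bit, which also appear in the table, are left out for brevity.

```python
# Approximate bits per weight for each format. These values are assumptions
# for illustration; real files vary with architecture and quantizer version.
BITS_PER_WEIGHT = [
    ("FP16/BF16", 16.0),
    ("Q8_0", 8.5),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.85),
]

def best_fit(params_billion: float, weights_budget_gb: float) -> tuple[str, str]:
    """Return ("fits", fmt) for the highest-precision format whose weights fit
    in the budget, or ("over", smallest_fmt) if none of them fit."""
    for fmt, bits in BITS_PER_WEIGHT:
        # 1e9 parameters at (bits / 8) bytes each is that many GB (1 GB = 1e9 bytes).
        weights_gb = params_billion * bits / 8
        if weights_gb <= weights_budget_gb:
            return ("fits", fmt)
    return ("over", BITS_PER_WEIGHT[-1][0])

# Weights budgets at 8K context, taken from the spec table.
M3_MAX_96_BUDGET_GB = 76
RTX_4090_BUDGET_GB = 18

for params in (8, 70):  # hypothetical model sizes, in billions of parameters
    print(params, best_fit(params, M3_MAX_96_BUDGET_GB), best_fit(params, RTX_4090_BUDGET_GB))
```

Under these assumptions a 70B-parameter model lands in the "fits Q8_0 / over Q4_K_M" pattern seen in the differing rows, while an 8B-parameter model fits at FP16/BF16 on both cards.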

Which one wins for…

$ ./recommend --by-workload
More VRAM headroom

M3 Max 96 has 72 GB more.

Faster decode (bandwidth)

RTX 4090 by +152%.

Faster prefill (compute)

RTX 4090 by +843% (FP16 TFLOPS).

Catalog models that fit

M3 Max 96: 27 fit · RTX 4090: 21.
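
The decode and prefill verdicts above follow from two common back-of-envelope bounds: decode is limited by how fast the weights can be streamed from memory, and prefill by raw compute. The sketch below assumes a dense transformer, one full pass over the weights per generated token, and roughly 2 FLOPs per parameter per prompt token. The function names and the 8B-parameter FP16 example model are hypothetical. Real throughput falls below these bounds because of KV-cache traffic, kernel efficiency, and peak TFLOPS that are rarely sustained.

```python
def decode_tokens_per_s_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed: every token streams all weights once."""
    return bandwidth_gb_s / weights_gb

def prefill_seconds_floor(params_billion: float, prompt_tokens: int, tflops: float) -> float:
    """Lower bound on prefill time at peak compute throughput."""
    flops = 2 * params_billion * 1e9 * prompt_tokens
    return flops / (tflops * 1e12)

# Hypothetical 8B-parameter model in FP16 (about 16 GB of weights), 8K-token prompt.
cards = {
    "m3-max-96": {"bandwidth_gb_s": 400, "tflops": 35},
    "rtx-4090": {"bandwidth_gb_s": 1008, "tflops": 330},
}

for name, card in cards.items():
    decode = decode_tokens_per_s_ceiling(card["bandwidth_gb_s"], weights_gb=16)
    prefill = prefill_seconds_floor(8, prompt_tokens=8192, tflops=card["tflops"])
    print(f"{name}: decode <= {decode:.0f} tok/s, prefill >= {prefill:.2f} s")
```

Both bounds only apply to models that fit on the card in the first place, which is why the VRAM and catalog-fit rows favor the M3 Max 96 even though the RTX 4090 leads on bandwidth and compute.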
