~/gpu/m2-ultra-192 vs m3-ultra-512

M2 Ultra 192 vs M3 Ultra 512

Head-to-head for local LLM inference. The honest comparison: VRAM, bandwidth, compute, and which of the 30 catalog models actually fit on each.

The specs.

$ diff specs m2-ultra-192 m3-ultra-512
Stat                       m2-ultra-192   m3-ultra-512   Δ
VRAM                       192 GB         512 GB         +167%
Memory bandwidth           800 GB/s       819 GB/s       +2%
FP16 compute               54 TFLOPS      80 TFLOPS      +48%
Weights budget at 8K ctx   154 GB         413 GB         +168%
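The Δ column can be reproduced from the two spec columns. A minimal sketch, assuming the deltas are plain percentage changes relative to m2-ultra-192, rounded to the nearest whole percent; the function name `pct_delta` and the `SPECS` dictionary are illustrative, not part of the page's tooling.

```python
# Spec values copied from the table above: (m2-ultra-192, m3-ultra-512).
SPECS = {
    "VRAM (GB)": (192, 512),
    "Memory bandwidth (GB/s)": (800, 819),
    "FP16 compute (TFLOPS)": (54, 80),
    "Weights budget at 8K ctx (GB)": (154, 413),
}

def pct_delta(old, new):
    """Percentage change from old to new, rounded to a whole percent."""
    return round((new - old) / old * 100)

def delta_column(specs):
    """Map each stat name to its signed percentage delta as a string."""
    return {name: f"{pct_delta(a, b):+d}%" for name, (a, b) in specs.items()}

if __name__ == "__main__":
    for name, delta in delta_column(SPECS).items():
        print(f"{name:32s} {delta}")
```

Under that assumption the four computed values match the Δ column as listed.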

Model fit difference.

$ models that change with the card
Fits on both: 27 of 30
Only on m2-ultra-192: 0
Only on m3-ultra-512: 3

// showing 12 of 30 models; differing fits first

Model   m2-ultra-192      m3-ultra-512
…       over Q4_K_M       fits FP8/INT8
…       over Q4_K_M       fits Q4_K_M
…       over Q4_K_M       fits Q4_K_M
…       fits FP16/BF16    fits FP16/BF16
…       fits FP16/BF16    fits FP16/BF16
…       fits FP16/BF16    fits FP16/BF16
…       fits FP16/BF16    fits FP16/BF16
…       fits FP16/BF16    fits FP16/BF16
…       fits FP16/BF16    fits FP16/BF16
…       fits FP16/BF16    fits FP16/BF16
…       fits FP16/BF16    fits FP16/BF16
…       fits FP16/BF16    fits FP16/BF16
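How a "fits" or "over" label could be derived from the weights budget: a rough sketch, not the page's actual method. It assumes weight size is parameter count times bits per weight, tries precisions from highest to lowest, and reports the first one that fits within the budget. The bits-per-weight figure for Q4_K_M (about 4.85) is an approximation, and the 405B and 70B parameter counts are hypothetical examples, not entries taken from the catalog.

```python
# Precisions from highest to lowest. The Q4_K_M value is an assumed
# approximation (~4.85 bits per weight); real quantized files vary.
PRECISIONS = [
    ("FP16/BF16", 16.0),
    ("FP8/INT8", 8.0),
    ("Q4_K_M", 4.85),
]

# Weights budgets at 8K context, from the specs table.
BUDGET_GB = {"m2-ultra-192": 154, "m3-ultra-512": 413}

def weights_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8.0

def best_fit(params_billion, budget_gb):
    """Return ("fits", precision) for the highest precision within budget,
    or ("over", lowest precision) if even the smallest option is too big."""
    for name, bits in PRECISIONS:
        if weights_gb(params_billion, bits) <= budget_gb:
            return ("fits", name)
    return ("over", PRECISIONS[-1][0])

if __name__ == "__main__":
    for params in (405, 70):  # hypothetical model sizes, in billions
        for card, budget in BUDGET_GB.items():
            status, precision = best_fit(params, budget)
            print(f"{params}B on {card}: {status} {precision}")
```

With these assumptions, a hypothetical 405B-parameter model is over budget on m2-ultra-192 even at Q4_K_M but fits at FP8/INT8 on m3-ultra-512, while a 70B model fits at FP16/BF16 on both, which is the same pattern the table shows.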

Which one wins for…

$ ./recommend --by-workload
More VRAM headroom

M3 Ultra 512 has 320 GB more.

Faster decode (bandwidth)

M3 Ultra 512 by +2%.

Faster prefill (compute)

M3 Ultra 512 by +48% TFLOPS.
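Why bandwidth maps to decode and compute maps to prefill: a back-of-envelope sketch under common simplifying assumptions, not a benchmark. Decode is treated as memory-bound (every generated token reads all weights once), and prefill as compute-bound (about 2 FLOPs per parameter per prompt token at peak throughput). Both are upper bounds that real runs do not reach. The 40 GB weight size, the 70B parameter count, and the function names are hypothetical.

```python
def decode_tokens_per_s(bandwidth_gb_s, weights_gb):
    """Upper bound on decode speed if each token streams all weights once."""
    return bandwidth_gb_s / weights_gb

def prefill_seconds(params_billion, prompt_tokens, tflops):
    """Lower bound on prefill time at peak compute, using the rough
    estimate of 2 FLOPs per parameter per token."""
    flops = 2.0 * params_billion * 1e9 * prompt_tokens
    return flops / (tflops * 1e12)

if __name__ == "__main__":
    # Hypothetical workload: 40 GB of weights, 70B parameters, 8192-token prompt.
    for card, bw, tf in (("m2-ultra-192", 800, 54), ("m3-ultra-512", 819, 80)):
        tps = decode_tokens_per_s(bw, 40)
        secs = prefill_seconds(70, 8192, tf)
        print(f"{card}: decode <= {tps:.1f} tok/s, prefill >= {secs:.1f} s")
```

Under this model the decode bound moves by the same +2% as bandwidth, and the prefill bound shrinks by the ratio 54/80, so the gap between the two cards shows up mainly on long prompts.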

Catalog models that fit

M3 Ultra 512: 30 fit · M2 Ultra 192: 27.

Drill into either card.

$ ./vrambudget --gpu
