
M3 Ultra 512 vs M5 Max 128

Head-to-head for local LLM inference. The honest comparison: VRAM, bandwidth, compute, and which of the 30 catalog models actually fit on each.

The specs.

$ diff specs m3-ultra-512 m5-max-128

Stat                        m3-ultra-512   m5-max-128   m5 vs m3
VRAM (unified memory)       512 GB         128 GB       -75%
Memory bandwidth            819 GB/s       614 GB/s     -25%
FP16 compute                80 TFLOPS      55 TFLOPS    -31%
Weights budget at 8K ctx    413 GB         103 GB       -75%
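
The last column is the M5 Max figure relative to the M3 Ultra. Measured the other way, from the smaller machine, the same gaps look larger: 819 GB/s is 33% more than 614 GB/s, and 80 TFLOPS is 45% more than 55. A quick check in Python (the `SPECS` dictionary and `pct_change` helper are illustrative names, not part of the page):

```python
# Spec figures copied from the table above; the dict layout is illustrative.
SPECS = {
    "VRAM (GB)": (512, 128),
    "Memory bandwidth (GB/s)": (819, 614),
    "FP16 compute (TFLOPS)": (80, 55),
    "Weights budget at 8K ctx (GB)": (413, 103),
}


def pct_change(new: float, base: float) -> float:
    """Percentage change from base to new, e.g. about -25 for 819 -> 614."""
    return (new - base) / base * 100


for stat, (m3, m5) in SPECS.items():
    m5_vs_m3 = pct_change(m5, m3)  # the table's last column
    m3_vs_m5 = pct_change(m3, m5)  # the same gap measured from the smaller machine
    print(f"{stat:32} m5 vs m3: {m5_vs_m3:+.0f}%   m3 vs m5: {m3_vs_m5:+.0f}%")
```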

Model fit difference.

$ models that change with the card

Fits on both            27 of 30
Only on m3-ultra-512    3
Only on m5-max-128      0

// showing 12 of 30 models; differing fits first

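Model                 Params   m3-ultra-512      m5-max-128
Llama 3.1 405B        405B     fits FP8/INT8     over at Q4_K_M
DeepSeek V3           671B     fits Q4_K_M       over at Q4_K_M
DeepSeek R1           671B     fits Q4_K_M       over at Q4_K_M
(name missing)        ?        fits FP16/BF16    fits FP16/BF16
(name missing)        ?        fits FP16/BF16    fits FP16/BF16
(name missing)        ?        fits FP16/BF16    fits FP16/BF16
(name missing)        ?        fits FP16/BF16    fits Q8_0
Qwen 2.5 7B           7B       fits FP16/BF16    fits FP16/BF16
Qwen 2.5 32B          32.5B    fits FP16/BF16    fits FP16/BF16
Qwen 2.5 Coder 32B    32.5B    fits FP16/BF16    fits FP16/BF16
Qwen 2.5 72B          72.7B    fits FP16/BF16    fits Q8_0
Qwen3 30B A3B         30.5B    fits FP16/BF16    fits FP16/BF16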
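
One way to read the fit columns: take each card's weights budget from the specs table and pick the highest-precision format whose weights fit in it. The page does not say how it decides, so this sketch rests on assumptions: weight size is parameters x bits per weight / 8, the KV cache for 8K context is already accounted for inside the budget, FP16/BF16 and FP8/INT8 use their nominal widths, Q8_0 uses llama.cpp's 8.5 bits per weight, and ~4.85 bits per weight for Q4_K_M is an approximation of a mixed scheme. Under those assumptions it reproduces the named rows above; `FORMATS` and `best_fit` are hypothetical names.

```python
from typing import Optional

# Assumed average bits per weight, ordered from largest to smallest footprint.
# Q4_K_M mixes block types, so ~4.85 bpw is an approximation.
FORMATS = [
    ("FP16/BF16", 16.0),
    ("Q8_0", 8.5),
    ("FP8/INT8", 8.0),
    ("Q4_K_M", 4.85),
]


def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Weight footprint in GB (1e9 bytes) for params_b billion parameters."""
    return params_b * bits_per_weight / 8


def best_fit(params_b: float, budget_gb: float) -> Optional[str]:
    """Highest-precision format that fits the budget, or None if even Q4_K_M is over."""
    for name, bpw in FORMATS:
        if weights_gb(params_b, bpw) <= budget_gb:
            return name
    return None


BUDGETS = {"m3-ultra-512": 413, "m5-max-128": 103}  # GB, weights budget at 8K ctx
MODELS = {  # total parameters in billions (MoE models count every expert)
    "Llama 3.1 405B": 405,
    "DeepSeek V3": 671,
    "Qwen 2.5 72B": 72.7,
    "Qwen 2.5 7B": 7,
}

for model, params in MODELS.items():
    fits = {card: best_fit(params, budget) or "over" for card, budget in BUDGETS.items()}
    print(f"{model:16} {fits}")
```

For example, DeepSeek V3 at Q4_K_M comes to roughly 671 x 4.85 / 8 = 407 GB: just inside the M3 Ultra's 413 GB, and about four times the M5 Max's 103 GB.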

Which one wins for…

$ ./recommend --by-workload

More VRAM headroom

M3 Ultra 512 has 384 GB more.

Faster decode (bandwidth)

M3 Ultra 512, by 33% (819 vs 614 GB/s).

Faster prefill (compute)

M3 Ultra 512, by 45% (80 vs 55 FP16 TFLOPS).

Catalog models that fit

M3 Ultra 512: 30 fit · M5 Max 128: 27.
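
The decode and prefill winners follow a standard back-of-envelope roofline. Decode for a dense model is roughly memory-bound, because each generated token reads every weight once. Prefill is roughly compute-bound at about 2 FLOPs per parameter per prompt token. The sketch applies both to the peak figures in the specs table, so treat its outputs as ideal bounds: it ignores KV-cache reads, attention FLOPs and MoE active-parameter counts, and real throughput will be lower. The Qwen 2.5 72B example, the Q8_0 size and the 8K prompt length are illustrative choices.

```python
# Peak figures from the specs table.
CARDS = {
    "m3-ultra-512": {"bandwidth_gbs": 819, "tflops": 80},
    "m5-max-128": {"bandwidth_gbs": 614, "tflops": 55},
}


def decode_tok_s_ceiling(bandwidth_gbs: float, weights_gb: float) -> float:
    """Upper bound on decode speed for a dense model: each token streams all weights once."""
    return bandwidth_gbs / weights_gb


def prefill_seconds_floor(tflops: float, params_b: float, prompt_tokens: int) -> float:
    """Lower bound on prefill time, using ~2 FLOPs per parameter per token."""
    flops = 2 * params_b * 1e9 * prompt_tokens
    return flops / (tflops * 1e12)


# Example: Qwen 2.5 72B (72.7B params) as Q8_0 (8.5 bits/weight, ~77.2 GB) with an 8K prompt.
WEIGHTS_GB = 72.7 * 8.5 / 8
for card, spec in CARDS.items():
    tok_s = decode_tok_s_ceiling(spec["bandwidth_gbs"], WEIGHTS_GB)
    prefill_s = prefill_seconds_floor(spec["tflops"], 72.7, 8192)
    print(f"{card:14} decode <= {tok_s:.1f} tok/s   8K prefill >= {prefill_s:.1f} s")
```

Both bounds scale linearly with the spec, so for the same model and format the per-card ratios come out at the same 33% and 45% as the summary. A larger format also costs decode speed: Qwen 2.5 72B at FP16 (~145 GB), which only fits on the M3 Ultra, would cap at roughly 819 / 145.4 ≈ 5.6 tok/s there.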

