M2 Ultra 192 vs M3 Ultra 512

Head-to-head for local LLM inference. The honest comparison: VRAM, bandwidth, compute, and which of the 30 catalog models actually fit on each.

The specs.

$ diff specs m2-ultra-192 m3-ultra-512

Stat                       m2-ultra-192   m3-ultra-512   Delta
VRAM                       192 GB         512 GB         +167%
Memory bandwidth           800 GB/s       819 GB/s       +2%
FP16 compute               54 TFLOPS      80 TFLOPS      +48%
Weights budget at 8K ctx   154 GB         413 GB         +168%
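
The delta column can be reproduced from the two spec columns. A minimal sketch, assuming each delta is the relative change from the M2 Ultra 192 figure, rounded to a whole percent (the rounding rule and the names `pct_delta` and `SPECS` are assumptions, not taken from the page):

```python
def pct_delta(base: float, other: float) -> int:
    """Relative change from base to other, rounded to a whole percent."""
    return round((other - base) / base * 100)


# (m2-ultra-192, m3-ultra-512) values copied from the spec table.
SPECS = {
    "VRAM (GB)": (192, 512),
    "Memory bandwidth (GB/s)": (800, 819),
    "FP16 compute (TFLOPS)": (54, 80),
    "Weights budget at 8K ctx (GB)": (154, 413),
}

deltas = {name: pct_delta(m2, m3) for name, (m2, m3) in SPECS.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+d}%")
```

Under that assumption the four results match the table: +167%, +2%, +48%, +168%.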

Model fit difference.

$ models that change with the card

Fits on both            27 of 30

Only on m2-ultra-192    0

Only on m3-ultra-512    3

// showing 12 of 30 models; differing fits first

Model                         m2-ultra-192       m3-ultra-512
Llama 3.1 405B (405B)         over (Q4_K_M)      fits (FP8/INT8)
DeepSeek V3 (671B)            over (Q4_K_M)      fits (Q4_K_M)
DeepSeek R1 (671B)            over (Q4_K_M)      fits (Q4_K_M)

(model name missing)          fits (FP16/BF16)   fits (FP16/BF16)
(model name missing)          fits (FP16/BF16)   fits (FP16/BF16)
(model name missing)          fits (FP16/BF16)   fits (FP16/BF16)
Qwen 2.5 Coder 32B (32.5B)    fits (FP16/BF16)
Qwen 2.5 72B (72.7B)          fits (FP16/BF16)
Qwen3 30B A3B (30.5B)         fits (FP16/BF16)
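
The fit results above are consistent with a simple rule: estimate the weight size at each precision and pick the highest precision that stays within the weights budget at 8K context. The sketch below is an illustration of that rule, not the catalog's actual method. The bytes-per-parameter figures are assumptions (in particular, Q4_K_M is taken as roughly 4.85 bits per weight), and `BYTES_PER_PARAM` and `best_fit` are hypothetical names.

```python
from typing import Optional

# Assumed storage cost per parameter, highest precision first.
# The Q4_K_M figure (~4.85 bits per weight) is an approximation.
BYTES_PER_PARAM = [
    ("FP16/BF16", 2.0),
    ("FP8/INT8", 1.0),
    ("Q4_K_M", 4.85 / 8),
]

# Weights budget at 8K context, in GB, copied from the spec table.
BUDGET_GB = {"m2-ultra-192": 154, "m3-ultra-512": 413}


def best_fit(params_billions: float, budget_gb: float) -> Optional[str]:
    """Return the highest precision whose weights fit, or None if over budget."""
    for name, bytes_per_param in BYTES_PER_PARAM:
        weights_gb = params_billions * bytes_per_param  # 1e9 params * bytes = GB
        if weights_gb <= budget_gb:
            return name
    return None


for model, size in [("Llama 3.1 405B", 405), ("DeepSeek V3", 671), ("Qwen 2.5 72B", 72.7)]:
    for card, budget in BUDGET_GB.items():
        print(f"{model} on {card}: {best_fit(size, budget) or 'over'}")
```

With these assumed sizes, the 671B models land at about 407 GB at Q4_K_M, just under the 413 GB budget, so the result is sensitive to the exact bits-per-weight figure and to any extra overhead the catalog accounts for.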

Which one wins for…

$ ./recommend --by-workload

More VRAM headroom: M3 Ultra 512, with 320 GB more (512 GB vs 192 GB).

Faster decode (bandwidth): M3 Ultra 512, by about 2% (819 GB/s vs 800 GB/s).

Faster prefill (compute): M3 Ultra 512, by about 48% (80 vs 54 FP16 TFLOPS).

Catalog models that fit: M3 Ultra 512 fits all 30 · M2 Ultra 192 fits 27.
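
To see why a 2% bandwidth gap barely moves decode speed, a common rule of thumb treats decoding as memory-bound: each generated token reads the model weights once, so tokens per second is capped at bandwidth divided by weight size. The sketch below applies that rule to a dense 72.7B model at FP16 (about 145.4 GB, assuming 2 bytes per parameter). It is an upper bound under those assumptions, not a benchmark, and `decode_ceiling_tps` is a hypothetical helper.

```python
def decode_ceiling_tps(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on decode tokens/s if every token reads all weights once."""
    return bandwidth_gb_s / weights_gb


# Assumed weight size: 72.7B parameters at 2 bytes each (FP16/BF16), dense model.
weights_gb = 72.7 * 2.0

m2_tps = decode_ceiling_tps(800, weights_gb)  # m2-ultra-192 bandwidth
m3_tps = decode_ceiling_tps(819, weights_gb)  # m3-ultra-512 bandwidth

print(f"m2-ultra-192 ceiling: {m2_tps:.1f} tok/s")
print(f"m3-ultra-512 ceiling: {m3_tps:.1f} tok/s")
print(f"ratio: {m3_tps / m2_tps:.3f}")
```

Under this estimate the two ceilings differ by the same 2% as the bandwidth figures. Mixture-of-experts models read only their active parameters per token, so this bound would understate their decode ceiling.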
