M3 Max 96 vs RTX 4090

A head-to-head comparison for local LLM inference: VRAM, memory bandwidth, compute, and which of the 30 catalog models actually fit on each card.

The specs.

$ diff specs m3-max-96 rtx-4090

Stat                       m3-max-96    rtx-4090      rtx-4090 vs m3-max-96
VRAM                       96 GB        24 GB         -75%
Memory bandwidth           400 GB/s     1,008 GB/s    +152%
FP16 compute               35 TFLOPS    330 TFLOPS    +843%
Weights budget at 8K ctx   76 GB        18 GB         -76%
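The percentage in each row is the RTX 4090's change relative to the M3 Max 96. A minimal Python sketch that reproduces those figures from the listed values follows; the names `pct_delta` and `specs` are illustrative and not part of the page's own tooling.

```python
def pct_delta(baseline: float, other: float) -> int:
    """Percent change of `other` relative to `baseline`, rounded to a whole percent."""
    return round((other - baseline) / baseline * 100)


# (m3-max-96, rtx-4090) pairs copied from the spec comparison.
specs = {
    "VRAM (GB)": (96, 24),
    "Memory bandwidth (GB/s)": (400, 1008),
    "FP16 compute (TFLOPS)": (35, 330),
    "Weights budget at 8K ctx (GB)": (76, 18),
}

# Delta of the rtx-4090 value against the m3-max-96 baseline.
deltas = {name: pct_delta(m3, rtx) for name, (m3, rtx) in specs.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+d}%")
```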

Model fit difference.

$ models that change with the card

Fits on both: 21 of 30

Only on m3-max-96: 6

Only on rtx-4090: 0

// showing 12 of 30 models; differing fits first

Model                      m3-max-96            rtx-4090
Llama 3.3 70B (70.6B)      fits (Q8_0)          over (Q4_K_M)
Qwen 2.5 72B (72.7B)       fits (FP8/INT8)      over (Q4_K_M)
gpt-oss 120B (117B)        fits (Q4_K_M)        over (Q4_K_M)
Mixtral 8x7B (46.7B)       fits (Q8_0)          over (Q4_K_M)
Mixtral 8x22B (141B)       fits (AWQ 4-bit)     over (Q4_K_M)
Command R+ (104B)          fits (Q5_K_M)        over (Q4_K_M)

fits (FP16/BF16)

fits (FP16/BF16)

fits (FP16/BF16)

over (Q4_K_M)

fits (FP16/BF16)

fits (FP16/BF16)

fits (AWQ 4-bit)
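The page does not state how it decides whether a model fits, so the following is only a rough sketch of one plausible check: estimate the weight size as parameters times bits per weight, then pick the highest-precision format that stays within the card's weights budget (76 GB and 18 GB at 8K context, from the spec comparison). The bits-per-weight figures are approximate assumptions for illustration, and `best_fit` is a hypothetical helper, not the site's code.

```python
from typing import Optional

# Approximate bits per weight, highest precision first.
# ASSUMPTION: rough, commonly quoted figures; real files vary by model and tool.
BITS_PER_WEIGHT = {
    "FP16/BF16": 16.0,
    "Q8_0": 8.5,
    "FP8/INT8": 8.0,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.85,
}


def weights_gb(params_billion: float, bits: float) -> float:
    """Estimated weight size in decimal GB: billions of params x bytes per param."""
    return params_billion * bits / 8


def best_fit(params_billion: float, budget_gb: float) -> Optional[str]:
    """Return the highest-precision format that fits the budget, or None if none do."""
    for name, bits in BITS_PER_WEIGHT.items():
        if weights_gb(params_billion, bits) <= budget_gb:
            return name
    return None


# Budgets taken from the "Weights budget at 8K ctx" row.
for model, params in [("Llama 3.3 70B", 70.6), ("Qwen 2.5 72B", 72.7), ("Command R+", 104.0)]:
    print(model, "| m3-max-96:", best_fit(params, 76), "| rtx-4090:", best_fit(params, 18))
```

Under these assumptions the sketch agrees with the listed rows for these three models: a `None` result corresponds to a model that is over budget even at Q4_K_M.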

Which one wins for…

$ ./recommend --by-workload

More VRAM headroom: M3 Max 96, with 72 GB more (96 GB vs 24 GB).

Faster decode (bandwidth): RTX 4090, with 152% more memory bandwidth.

Faster prefill (compute): RTX 4090, with 843% more FP16 TFLOPS.

Catalog models that fit: M3 Max 96 fits 27 of 30; RTX 4090 fits 21 of 30.
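To make the decode and prefill lines concrete, here is a back-of-the-envelope sketch using two common rules of thumb for dense models: decode is bounded by how fast the weights can be streamed from memory once per token, and prefill costs roughly 2 FLOPs per parameter per prompt token. Both are idealized bounds that ignore KV-cache traffic, kernel efficiency, and mixture-of-experts sparsity. The 8B-parameter, 8 GB model in the example is hypothetical, as are the function names.

```python
def decode_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed: every token reads all weights once."""
    return bandwidth_gb_s / weights_gb


def prefill_seconds(params_billion: float, prompt_tokens: int, peak_tflops: float) -> float:
    """Lower bound on prefill time: ~2 FLOPs per parameter per prompt token."""
    total_flops = 2 * params_billion * 1e9 * prompt_tokens
    return total_flops / (peak_tflops * 1e12)


# Bandwidth and FP16 compute figures copied from the spec comparison.
CARDS = {
    "m3-max-96": {"bandwidth_gb_s": 400, "fp16_tflops": 35},
    "rtx-4090": {"bandwidth_gb_s": 1008, "fp16_tflops": 330},
}

# Hypothetical workload: an 8B dense model stored in 8 GB, with an 8K-token prompt.
for name, card in CARDS.items():
    tps = decode_tokens_per_s(card["bandwidth_gb_s"], weights_gb=8.0)
    ttft = prefill_seconds(8.0, 8192, card["fp16_tflops"])
    print(f"{name}: at most {tps:.0f} tok/s decode, at least {ttft:.2f} s prefill")
```

The ratios between the two cards match the bandwidth and compute deltas above, but only for models that fit on both cards; a model that exceeds the RTX 4090's budget gains nothing from its extra speed.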

