~/gpu/m3-ultra-512
Manufacturer: Apple

M3 Ultra 512 (512GB)

512GB unified memory in a Mac Studio. Runs DeepSeek-V3 at Q4. Yes, really.

VRAM: 512GB
Bandwidth: 819GB/s
FP16 compute: 80 TFLOPS
Budget @ ctx 8K: 413GB

Tuned to this card.

$ ./vrambudget --gpu m3-ultra-512
$ vrambudget --gpu m3-ultra-512 --ctx 8192 --conc 1 --safety 15%
apple | M2 Max 64 | 64GB
apple | M2 Ultra 192 | 192GB
apple | M3 Max 64 | 64GB
apple | M3 Max 96 | 96GB
apple | M4 Pro 64 | 64GB
apple | M4 Max 128 | 128GB
apple · monster | M3 Ultra 512 | 512GB
apple · neural | M5 Pro 64 | 64GB
apple · flagship | M5 Max 128 | 128GB
$ budget allocation
435 / 512 GB used (ctx 8K tok)
device capacity: 512GB
weights: 433GB (85% of total)
kv cache: 0.05GB (0.0% of total)
overhead: 2.5GB (0.5% of total)
safety: 76.8GB (15% of total)
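
The allocation above can be reproduced with a few lines of arithmetic. The sketch below is not the tool's own code: the `Budget` class and `make_budget` helper are hypothetical names, and the inputs (512 GB capacity, 0.05 GB KV cache, 2.50 GB overhead, 15% safety margin) are copied from the figures shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Memory split for one device, all values in GB (hypothetical helper)."""
    vram_gb: float
    safety_gb: float
    kv_cache_gb: float
    overhead_gb: float

    @property
    def weights_gb(self) -> float:
        # Whatever is left after the three reservations is available for weights.
        return self.vram_gb - self.safety_gb - self.kv_cache_gb - self.overhead_gb


def make_budget(vram_gb: float, kv_cache_gb: float, overhead_gb: float,
                safety_frac: float) -> Budget:
    # The safety margin is taken as a fraction of total device memory.
    return Budget(vram_gb, vram_gb * safety_frac, kv_cache_gb, overhead_gb)


# Inputs taken from this page: ctx 8K, conc 1, 15% safety.
b = make_budget(vram_gb=512, kv_cache_gb=0.05, overhead_gb=2.50, safety_frac=0.15)
print(f"safety  = {b.safety_gb:.2f} GB")
print(f"weights = {b.weights_gb:.2f} GB")
```

With these inputs the sketch gives a 76.80 GB safety margin and a 432.65 GB weights budget, which matches the per-model breakdowns further down the page.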
↳ sorted by best fit
fits (comfortably runs on this budget): 30 models
Quant options per model: FP16/BF16, FP8/INT8, Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, AWQ 4-bit, GPTQ 4-bit
Size per model card (GB), best fit first: 377, 377, 430, 282, 234, 208, 145, 141, 93, 70, 68, 65, 65, 61, 54, 52, 48, 42, 30, 29 (Phi-4, 14.7B), 18, 18, 16, 16, 14, 14, 8.0, 7.6, 6.4, 2.5

Models that fit on an M3 Ultra 512.

$ grep "fits" models.json | head -12
Model | Params | Best quant | Weights / 413 GB budget | Fit
DeepSeek V3 | 671B | Q4_K_M | 377 GB | fits
// weights Q4_K_M for DeepSeek V3 (671B params)
weights = params × bits ÷ 8
        = 671 × 4.5 ÷ 8
        = 377.44 GB

// budget on M3 Ultra 512 (512GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, cuda, allocator)
safety    = 76.80 GB    (15% of 512GB)
budget    = vram − safety − kv − overhead
          = 512 − 76.80 − 0.05 − 2.50
          = 432.65 GB

// fit decision
377.44 ≤ 432.65  → FITS
headroom  = 55.21 GB of weights budget left
DeepSeek R1 | 671B | Q4_K_M | 377 GB | fits
// weights Q4_K_M for DeepSeek R1 (671B params)
weights = params × bits ÷ 8
        = 671 × 4.5 ÷ 8
        = 377.44 GB

// budget on M3 Ultra 512 (512GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, cuda, allocator)
safety    = 76.80 GB    (15% of 512GB)
budget    = vram − safety − kv − overhead
          = 512 − 76.80 − 0.05 − 2.50
          = 432.65 GB

// fit decision
377.44 ≤ 432.65  → FITS
headroom  = 55.21 GB of weights budget left
Llama 3.1 405B | 405B | FP8/INT8 | 405 GB | fits
// weights FP8/INT8 for Llama 3.1 405B (405B params)
weights = params × bits ÷ 8
        = 405 × 8 ÷ 8
        = 405.00 GB

// budget on M3 Ultra 512 (512GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, cuda, allocator)
safety    = 76.80 GB    (15% of 512GB)
budget    = vram − safety − kv − overhead
          = 512 − 76.80 − 0.05 − 2.50
          = 432.65 GB

// fit decision
405.00 ≤ 432.65  → FITS
headroom  = 27.65 GB of weights budget left
Mixtral 8x22B | 141B | FP16/BF16 | 282 GB | fits
// weights FP16/BF16 for Mixtral 8x22B (141B params)
weights = params × bits ÷ 8
        = 141 × 16 ÷ 8
        = 282.00 GB

// budget on M3 Ultra 512 (512GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, cuda, allocator)
safety    = 76.80 GB    (15% of 512GB)
budget    = vram − safety − kv − overhead
          = 512 − 76.80 − 0.05 − 2.50
          = 432.65 GB

// fit decision
282.00 ≤ 432.65  → FITS
headroom  = 150.65 GB of weights budget left
gpt-oss 120B | 117B | FP16/BF16 | 234 GB | fits
// weights FP16/BF16 for gpt-oss 120B (117B params)
weights = params × bits ÷ 8
        = 117 × 16 ÷ 8
        = 234.00 GB

// budget on M3 Ultra 512 (512GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, cuda, allocator)
safety    = 76.80 GB    (15% of 512GB)
budget    = vram − safety − kv − overhead
          = 512 − 76.80 − 0.05 − 2.50
          = 432.65 GB

// fit decision
234.00 ≤ 432.65  → FITS
headroom  = 198.65 GB of weights budget left
Command R+ | 104B | FP16/BF16 | 208 GB | fits
// weights FP16/BF16 for Command R+ (104B params)
weights = params × bits ÷ 8
        = 104 × 16 ÷ 8
        = 208.00 GB

// budget on M3 Ultra 512 (512GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, cuda, allocator)
safety    = 76.80 GB    (15% of 512GB)
budget    = vram − safety − kv − overhead
          = 512 − 76.80 − 0.05 − 2.50
          = 432.65 GB

// fit decision
208.00 ≤ 432.65  → FITS
headroom  = 224.65 GB of weights budget left
Qwen 2.5 72B | 72.7B | FP16/BF16 | 145 GB | fits
// weights FP16/BF16 for Qwen 2.5 72B (72.7B params)
weights = params × bits ÷ 8
        = 72.7 × 16 ÷ 8
        = 145.40 GB

// budget on M3 Ultra 512 (512GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, cuda, allocator)
safety    = 76.80 GB    (15% of 512GB)
budget    = vram − safety − kv − overhead
          = 512 − 76.80 − 0.05 − 2.50
          = 432.65 GB

// fit decision
145.40 ≤ 432.65  → FITS
headroom  = 287.25 GB of weights budget left
Llama 3.3 70B | 70.6B | FP16/BF16 | 141 GB | fits
// weights FP16/BF16 for Llama 3.3 70B (70.6B params)
weights = params × bits ÷ 8
        = 70.6 × 16 ÷ 8
        = 141.20 GB

// budget on M3 Ultra 512 (512GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, cuda, allocator)
safety    = 76.80 GB    (15% of 512GB)
budget    = vram − safety − kv − overhead
          = 512 − 76.80 − 0.05 − 2.50
          = 432.65 GB

// fit decision
141.20 ≤ 432.65  → FITS
headroom  = 291.45 GB of weights budget left
Mixtral 8x7B | 46.7B | FP16/BF16 | 93 GB | fits
// weights FP16/BF16 for Mixtral 8x7B (46.7B params)
weights = params × bits ÷ 8
        = 46.7 × 16 ÷ 8
        = 93.40 GB

// budget on M3 Ultra 512 (512GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, cuda, allocator)
safety    = 76.80 GB    (15% of 512GB)
budget    = vram − safety − kv − overhead
          = 512 − 76.80 − 0.05 − 2.50
          = 432.65 GB

// fit decision
93.40 ≤ 432.65  → FITS
headroom  = 339.25 GB of weights budget left
Qwen 3.6 35B A3B | 35B | FP16/BF16 | 70 GB | fits
// weights FP16/BF16 for Qwen 3.6 35B A3B (35B params)
weights = params × bits ÷ 8
        = 35 × 16 ÷ 8
        = 70.00 GB

// budget on M3 Ultra 512 (512GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, cuda, allocator)
safety    = 76.80 GB    (15% of 512GB)
budget    = vram − safety − kv − overhead
          = 512 − 76.80 − 0.05 − 2.50
          = 432.65 GB

// fit decision
70.00 ≤ 432.65  → FITS
headroom  = 362.65 GB of weights budget left
Yi 34B | 34B | FP16/BF16 | 68 GB | fits
// weights FP16/BF16 for Yi 34B (34B params)
weights = params × bits ÷ 8
        = 34 × 16 ÷ 8
        = 68.00 GB

// budget on M3 Ultra 512 (512GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, cuda, allocator)
safety    = 76.80 GB    (15% of 512GB)
budget    = vram − safety − kv − overhead
          = 512 − 76.80 − 0.05 − 2.50
          = 432.65 GB

// fit decision
68.00 ≤ 432.65  → FITS
headroom  = 364.65 GB of weights budget left
Qwen 2.5 32B | 32.5B | FP16/BF16 | 65 GB | fits
// weights FP16/BF16 for Qwen 2.5 32B (32.5B params)
weights = params × bits ÷ 8
        = 32.5 × 16 ÷ 8
        = 65.00 GB

// budget on M3 Ultra 512 (512GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, cuda, allocator)
safety    = 76.80 GB    (15% of 512GB)
budget    = vram − safety − kv − overhead
          = 512 − 76.80 − 0.05 − 2.50
          = 432.65 GB

// fit decision
65.00 ≤ 432.65  → FITS
headroom  = 367.65 GB of weights budget left
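
The per-model breakdowns above all follow the same pattern: weights = params × bits ÷ 8, compared against the 432.65 GB weights budget. A minimal Python sketch of that check follows. The names `weights_gb` and `best_quant` are hypothetical, only the three bit widths that appear in the breakdowns are included, and the rule "pick the highest precision that fits" is an assumption about how the "Best quant" column is chosen, not something the page states.

```python
# Bits per weight as they appear in the breakdowns above. Other quant formats
# are left out because this page does not show their bit widths.
BITS_PER_WEIGHT = {
    "FP16/BF16": 16.0,
    "FP8/INT8": 8.0,
    "Q4_K_M": 4.5,
}

# Weights budget on the M3 Ultra 512 at ctx 8K, conc 1, 15% safety (from above).
WEIGHTS_BUDGET_GB = 432.65


def weights_gb(params_b: float, bits: float) -> float:
    """Weight size in GB for a model with params_b billion parameters."""
    return params_b * bits / 8


def best_quant(params_b: float, budget_gb: float = WEIGHTS_BUDGET_GB):
    """Return (quant, size_gb, headroom_gb) for the widest format that fits.

    Assumption: "best" means highest precision whose weights fit the budget.
    Returns None when no listed format fits.
    """
    for name, bits in sorted(BITS_PER_WEIGHT.items(), key=lambda kv: -kv[1]):
        size = weights_gb(params_b, bits)
        if size <= budget_gb:
            return name, size, budget_gb - size
    return None


for model, params_b in [("DeepSeek V3", 671), ("Llama 3.1 405B", 405),
                        ("Mixtral 8x22B", 141)]:
    quant, size, headroom = best_quant(params_b)
    print(f"{model}: {quant}, {size:.2f} GB, headroom {headroom:.2f} GB")
```

Under these assumptions the sketch picks Q4_K_M for DeepSeek V3, FP8/INT8 for Llama 3.1 405B, and FP16/BF16 for Mixtral 8x22B, in line with the table rows above.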
