manufacturer: apple

M4 Pro 64 64GB

64GB on the Pro tier. Fits the same models as the M3 Max 64, with narrower memory bandwidth.
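Why the bandwidth caveat matters: single-stream decoding is usually limited by how fast the weights can be streamed from memory, so a rough ceiling on tokens per second is bandwidth divided by the weight bytes read per token. The sketch below applies that rule of thumb to this card's listed 273GB/s. The function name is hypothetical, and the bound assumes a dense model whose full weights are read once per token; MoE models and real runtimes will differ.

```python
def decode_tokens_per_sec_upper_bound(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough ceiling on single-stream decode speed for a dense model.

    Assumption (not from the page): every generated token streams the full
    set of weights from memory once, and nothing else limits throughput.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


# 273 GB/s is this card's listed bandwidth; 48.54 GB is Llama 3.3 70B at Q5_K_M.
bound = decode_tokens_per_sec_upper_bound(273.0, 48.54)
print(f"{bound:.1f} tok/s upper bound")
```

Treat the result as an optimistic ceiling for comparing cards with the same capacity, not as a benchmark.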

VRAM             64GB
Bandwidth        273GB/s
FP16 compute     32TFLOPS
Budget @ ctx 8K  50GB

Tuned to this card.

$ ./vrambudget --gpu m4-pro-64
$ vrambudget --gpu m4-pro-64 --ctx 8192 --conc 1 --safety 15%
apple | M2 Max 64 | 64GB
apple | M2 Ultra 192 | 192GB
apple | M3 Max 64 | 64GB
apple | M3 Max 96 | 96GB
apple | M4 Pro 64 | 64GB
apple | M4 Max 128 | 128GB
apple · monster | M3 Ultra 512 | 512GB
apple · neural | M5 Pro 64 | 64GB
apple · flagship | M5 Max 128 | 128GB
$ budget allocation
device capacity  64GB
context          8K tok
kv cache         0.05GB  0.1% of total
overhead         2.5GB   3.9% of total
weights          52GB    81% of total
54 / 64 GB used
legend: weights, kv cache, overhead, safety
↳ sorted by best fit
quant options: FP16/BF16, FP8/INT8, Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, AWQ 4-bit, GPTQ 4-bit
fits · comfortably runs on this budget · 26 models
weights (GB): 50, 45, 50, 49, 50, 37, 36, 35, 35, 32, 29, 28, 48, 42, 30, 29 (Phi-4, 14.7B), 18, 18, 16, 16, 14, 14, 8.0, 7.6, 6.4, 2.5
over · needs a bigger card, more aggressive quant, or model split · 4 models
weights (GB): 79, 228, 377, 377

Models that fit on an M4 Pro 64.

$ grep "fits" models.json | head -12
Model | Params | Best quant | Weights / 50 GB budget | Fit
Command R+ | 104B | Q3_K_M | 45 GB | fits
// weights Q3_K_M for Command R+ (104B params)
weights = params × bits ÷ 8
        = 104 × 3.44 ÷ 8
        = 44.72 GB

// budget on M4 Pro 64 (64GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, metal, allocator)
safety    = 9.60 GB    (15% of 64GB)
budget    = vram − safety − kv − overhead
          = 64 − 9.60 − 0.05 − 2.50
          = 51.85 GB

// fit decision
44.72 ≤ 51.85  → FITS
headroom  = 7.13 GB of weights budget left
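The arithmetic in these "show the math" blocks can be reproduced in a few lines. The following is a minimal sketch, assuming the three formulas exactly as printed (weights = params × bits ÷ 8, budget = vram − safety − kv − overhead, fit if weights ≤ budget). The `Budget` class and `weights_gb` helper are hypothetical names for illustration, not the tool's actual code.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    vram_gb: float
    kv_cache_gb: float
    overhead_gb: float
    safety_frac: float  # safety margin as a fraction of total VRAM

    def weights_budget_gb(self) -> float:
        # budget = vram - safety - kv - overhead
        safety_gb = self.vram_gb * self.safety_frac
        return self.vram_gb - safety_gb - self.kv_cache_gb - self.overhead_gb


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    # weights = params x bits / 8 (billions of params -> GB)
    return params_billion * bits_per_weight / 8


# Inputs copied from the page: 64GB card, ctx 8K, conc 1, 15% safety.
m4_pro_64 = Budget(vram_gb=64, kv_cache_gb=0.05, overhead_gb=2.50, safety_frac=0.15)
budget = m4_pro_64.weights_budget_gb()
weights = weights_gb(104, 3.44)  # Command R+ at Q3_K_M
headroom = budget - weights
verdict = "FITS" if weights <= budget else "OVER"
print(f"weights={weights:.2f} GB budget={budget:.2f} GB headroom={headroom:.2f} GB -> {verdict}")
```

Swapping in another row's parameter count and bits per weight reproduces that row's figures under the same assumptions.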
Qwen 2.5 72B | 72.7B | Q5_K_M | 50 GB | fits
// weights Q5_K_M for Qwen 2.5 72B (72.7B params)
weights = params × bits ÷ 8
        = 72.7 × 5.5 ÷ 8
        = 49.98 GB

// budget on M4 Pro 64 (64GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, metal, allocator)
safety    = 9.60 GB    (15% of 64GB)
budget    = vram − safety − kv − overhead
          = 64 − 9.60 − 0.05 − 2.50
          = 51.85 GB

// fit decision
49.98 ≤ 51.85  → FITS
headroom  = 1.87 GB of weights budget left
Llama 3.3 70B | 70.6B | Q5_K_M | 49 GB | fits
// weights Q5_K_M for Llama 3.3 70B (70.6B params)
weights = params × bits ÷ 8
        = 70.6 × 5.5 ÷ 8
        = 48.54 GB

// budget on M4 Pro 64 (64GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, metal, allocator)
safety    = 9.60 GB    (15% of 64GB)
budget    = vram − safety − kv − overhead
          = 64 − 9.60 − 0.05 − 2.50
          = 51.85 GB

// fit decision
48.54 ≤ 51.85  → FITS
headroom  = 3.31 GB of weights budget left
Mixtral 8x7B | 46.7B | Q8_0 | 50 GB | fits
// weights Q8_0 for Mixtral 8x7B (46.7B params)
weights = params × bits ÷ 8
        = 46.7 × 8.5 ÷ 8
        = 49.62 GB

// budget on M4 Pro 64 (64GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, metal, allocator)
safety    = 9.60 GB    (15% of 64GB)
budget    = vram − safety − kv − overhead
          = 64 − 9.60 − 0.05 − 2.50
          = 51.85 GB

// fit decision
49.62 ≤ 51.85  → FITS
headroom  = 2.23 GB of weights budget left
Qwen 3.6 35B A3B | 35B | Q8_0 | 37 GB | fits
// weights Q8_0 for Qwen 3.6 35B A3B (35B params)
weights = params × bits ÷ 8
        = 35 × 8.5 ÷ 8
        = 37.19 GB

// budget on M4 Pro 64 (64GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, metal, allocator)
safety    = 9.60 GB    (15% of 64GB)
budget    = vram − safety − kv − overhead
          = 64 − 9.60 − 0.05 − 2.50
          = 51.85 GB

// fit decision
37.19 ≤ 51.85  → FITS
headroom  = 14.66 GB of weights budget left
Yi 34B | 34B | Q8_0 | 36 GB | fits
// weights Q8_0 for Yi 34B (34B params)
weights = params × bits ÷ 8
        = 34 × 8.5 ÷ 8
        = 36.13 GB

// budget on M4 Pro 64 (64GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, metal, allocator)
safety    = 9.60 GB    (15% of 64GB)
budget    = vram − safety − kv − overhead
          = 64 − 9.60 − 0.05 − 2.50
          = 51.85 GB

// fit decision
36.13 ≤ 51.85  → FITS
headroom  = 15.72 GB of weights budget left
Qwen 2.5 32B | 32.5B | Q8_0 | 35 GB | fits
// weights Q8_0 for Qwen 2.5 32B (32.5B params)
weights = params × bits ÷ 8
        = 32.5 × 8.5 ÷ 8
        = 34.53 GB

// budget on M4 Pro 64 (64GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, metal, allocator)
safety    = 9.60 GB    (15% of 64GB)
budget    = vram − safety − kv − overhead
          = 64 − 9.60 − 0.05 − 2.50
          = 51.85 GB

// fit decision
34.53 ≤ 51.85  → FITS
headroom  = 17.32 GB of weights budget left
Qwen 2.5 Coder 32B | 32.5B | Q8_0 | 35 GB | fits
// weights Q8_0 for Qwen 2.5 Coder 32B (32.5B params)
weights = params × bits ÷ 8
        = 32.5 × 8.5 ÷ 8
        = 34.53 GB

// budget on M4 Pro 64 (64GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, metal, allocator)
safety    = 9.60 GB    (15% of 64GB)
budget    = vram − safety − kv − overhead
          = 64 − 9.60 − 0.05 − 2.50
          = 51.85 GB

// fit decision
34.53 ≤ 51.85  → FITS
headroom  = 17.32 GB of weights budget left
Qwen3 30B A3B | 30.5B | Q8_0 | 32 GB | fits
// weights Q8_0 for Qwen3 30B A3B (30.5B params)
weights = params × bits ÷ 8
        = 30.5 × 8.5 ÷ 8
        = 32.41 GB

// budget on M4 Pro 64 (64GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, metal, allocator)
safety    = 9.60 GB    (15% of 64GB)
budget    = vram − safety − kv − overhead
          = 64 − 9.60 − 0.05 − 2.50
          = 51.85 GB

// fit decision
32.41 ≤ 51.85  → FITS
headroom  = 19.44 GB of weights budget left
Qwen 3.6 27B | 27B | Q8_0 | 29 GB | fits
// weights Q8_0 for Qwen 3.6 27B (27B params)
weights = params × bits ÷ 8
        = 27 × 8.5 ÷ 8
        = 28.69 GB

// budget on M4 Pro 64 (64GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, metal, allocator)
safety    = 9.60 GB    (15% of 64GB)
budget    = vram − safety − kv − overhead
          = 64 − 9.60 − 0.05 − 2.50
          = 51.85 GB

// fit decision
28.69 ≤ 51.85  → FITS
headroom  = 23.16 GB of weights budget left
Gemma 4 26B A4B | 26B | Q8_0 | 28 GB | fits
// weights Q8_0 for Gemma 4 26B A4B (26B params)
weights = params × bits ÷ 8
        = 26 × 8.5 ÷ 8
        = 27.63 GB

// budget on M4 Pro 64 (64GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, metal, allocator)
safety    = 9.60 GB    (15% of 64GB)
budget    = vram − safety − kv − overhead
          = 64 − 9.60 − 0.05 − 2.50
          = 51.85 GB

// fit decision
27.63 ≤ 51.85  → FITS
headroom  = 24.22 GB of weights budget left
Mistral Small 3 | 24B | FP16/BF16 | 48 GB | fits
// weights FP16/BF16 for Mistral Small 3 (24B params)
weights = params × bits ÷ 8
        = 24 × 16 ÷ 8
        = 48.00 GB

// budget on M4 Pro 64 (64GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.50 GB    (runtime, metal, allocator)
safety    = 9.60 GB    (15% of 64GB)
budget    = vram − safety − kv − overhead
          = 64 − 9.60 − 0.05 − 2.50
          = 51.85 GB

// fit decision
48.00 ≤ 51.85  → FITS
headroom  = 3.85 GB of weights budget left
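The "Best quant" column appears to be the highest-precision quant whose weights still fit the budget. The sketch below illustrates that selection rule under that assumption. It uses only the bits-per-weight values that appear in the worked examples on this page (16, 8.5, 5.5, 3.44); the other quant options are left out because their bit widths are not shown here. `QUANT_BITS` and `best_quant` are hypothetical names.

```python
# Bits per weight taken from the worked examples on this page; the tool's
# other quant options are omitted because their bit widths are not shown.
QUANT_BITS = {
    "FP16/BF16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.5,
    "Q3_K_M": 3.44,
}


def best_quant(params_billion: float, budget_gb: float):
    """Return (quant, weights_gb) for the highest-precision quant that fits, else None."""
    ordered = sorted(QUANT_BITS.items(), key=lambda item: item[1], reverse=True)
    for name, bits in ordered:
        weights_gb = params_billion * bits / 8
        if weights_gb <= budget_gb:
            return name, round(weights_gb, 2)
    return None


BUDGET_GB = 51.85  # weights budget from the math blocks: ctx 8K, conc 1, 15% safety

for model, params in [("Command R+", 104), ("Qwen 2.5 72B", 72.7), ("Mistral Small 3", 24)]:
    print(model, best_quant(params, BUDGET_GB))
```

With only these four quants, the three sample models land on the same choices as the table (Q3_K_M, Q5_K_M, and FP16/BF16). A fuller version would need the bit widths for the remaining options.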
