~/gpu/rtx-6000-ada
nvidia manufacturer

RTX 6000 Ada 48GB

Same 48GB, modern silicon. Faster than A6000 across the board.

VRAM
48GB
Bandwidth
960GB/s
FP16 compute
364TFLOPS
Budget @ ctx 8K
37GB

Tuned to this card.

$ ./vrambudget --gpu rtx-6000-ada
$ vrambudget --gpu rtx-6000-ada --ctx 8192 --conc 1 --safety 15%↗ tweetlive
ampere · 48GB
RTX A6000
48GB
ada · 48GB
RTX 6000 Ada
48GB
blackwell · 96GB
RTX Pro 6000
96GB
ada · datacenter
L40S
48GB
48GB
64GB
8Ktok
48GB
device capacity
0.05GB
0.1% of total
2.2GB
4.6% of total
39GB
80% of total
$ budget allocation41 / 48 GB used
weightskv cacheoverheadsafety
↳ sorted by best fit
fitscomfortably runs on this budget24 models
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
31 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
38 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
38 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
37 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
36 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
35 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
35 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
32 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
29 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
28 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
26 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
22 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
30 GB
fits
Phi-414.7B
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
29 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
18 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
18 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
16 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
16 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
14 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
14 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
8.0 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
7.6 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
6.4 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
2.5 GB
fits
overneeds a bigger card, more aggressive quant, or model split6 models
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
59 GB
over
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
66 GB
over
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
79 GB
over
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
228 GB
over
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
377 GB
over
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
377 GB
over

Models that fit on a RTX 6000 Ada.

$ grep "fits" models.json | head -12
ModelParamsBest quantWeights / 37 GB budgetFit
Qwen 2.5 72B72.7BQ3_K_M
31
fits
▸ show the math
// weights Q3_K_M for Qwen 2.5 72B (72.7B params)
weights = params × bits ÷ 8
        = 72.7 × 3.44 ÷ 8
        = 31.26 GB

// budget on RTX 6000 Ada (48GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.20 GB    (runtime, cuda, allocator)
safety    = 7.20 GB    (15% of 48GB)
budget    = vram − safety − kv − overhead
          = 48 − 7.20 − 0.05 − 2.20
          = 38.55 GB

// fit decision
31.26 ≤ 38.55  → FITS
headroom  = 7.29 GB of weights budget left
Llama 3.3 70B70.6BQ3_K_M
30
fits
▸ show the math
// weights Q3_K_M for Llama 3.3 70B (70.6B params)
weights = params × bits ÷ 8
        = 70.6 × 3.44 ÷ 8
        = 30.36 GB

// budget on RTX 6000 Ada (48GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.20 GB    (runtime, cuda, allocator)
safety    = 7.20 GB    (15% of 48GB)
budget    = vram − safety − kv − overhead
          = 48 − 7.20 − 0.05 − 2.20
          = 38.55 GB

// fit decision
30.36 ≤ 38.55  → FITS
headroom  = 8.19 GB of weights budget left
Mixtral 8x7B46.7BQ5_K_M
32
fits
▸ show the math
// weights Q5_K_M for Mixtral 8x7B (46.7B params)
weights = params × bits ÷ 8
        = 46.7 × 5.5 ÷ 8
        = 32.11 GB

// budget on RTX 6000 Ada (48GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.20 GB    (runtime, cuda, allocator)
safety    = 7.20 GB    (15% of 48GB)
budget    = vram − safety − kv − overhead
          = 48 − 7.20 − 0.05 − 2.20
          = 38.55 GB

// fit decision
32.11 ≤ 38.55  → FITS
headroom  = 6.44 GB of weights budget left
Qwen 3.6 35B A3B35BFP8/INT8
35
fits
▸ show the math
// weights FP8/INT8 for Qwen 3.6 35B A3B (35B params)
weights = params × bits ÷ 8
        = 35 × 8 ÷ 8
        = 35.00 GB

// budget on RTX 6000 Ada (48GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.20 GB    (runtime, cuda, allocator)
safety    = 7.20 GB    (15% of 48GB)
budget    = vram − safety − kv − overhead
          = 48 − 7.20 − 0.05 − 2.20
          = 38.55 GB

// fit decision
35.00 ≤ 38.55  → FITS
headroom  = 3.55 GB of weights budget left
Yi 34B34BQ8_0
36
fits
▸ show the math
// weights Q8_0 for Yi 34B (34B params)
weights = params × bits ÷ 8
        = 34 × 8.5 ÷ 8
        = 36.13 GB

// budget on RTX 6000 Ada (48GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.20 GB    (runtime, cuda, allocator)
safety    = 7.20 GB    (15% of 48GB)
budget    = vram − safety − kv − overhead
          = 48 − 7.20 − 0.05 − 2.20
          = 38.55 GB

// fit decision
36.13 ≤ 38.55  → FITS
headroom  = 2.42 GB of weights budget left
Qwen 2.5 32B32.5BQ8_0
35
fits
▸ show the math
// weights Q8_0 for Qwen 2.5 32B (32.5B params)
weights = params × bits ÷ 8
        = 32.5 × 8.5 ÷ 8
        = 34.53 GB

// budget on RTX 6000 Ada (48GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.20 GB    (runtime, cuda, allocator)
safety    = 7.20 GB    (15% of 48GB)
budget    = vram − safety − kv − overhead
          = 48 − 7.20 − 0.05 − 2.20
          = 38.55 GB

// fit decision
34.53 ≤ 38.55  → FITS
headroom  = 4.02 GB of weights budget left
Qwen 2.5 Coder 32B32.5BQ8_0
35
fits
▸ show the math
// weights Q8_0 for Qwen 2.5 Coder 32B (32.5B params)
weights = params × bits ÷ 8
        = 32.5 × 8.5 ÷ 8
        = 34.53 GB

// budget on RTX 6000 Ada (48GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.20 GB    (runtime, cuda, allocator)
safety    = 7.20 GB    (15% of 48GB)
budget    = vram − safety − kv − overhead
          = 48 − 7.20 − 0.05 − 2.20
          = 38.55 GB

// fit decision
34.53 ≤ 38.55  → FITS
headroom  = 4.02 GB of weights budget left
Qwen3 30B A3B30.5BQ8_0
32
fits
▸ show the math
// weights Q8_0 for Qwen3 30B A3B (30.5B params)
weights = params × bits ÷ 8
        = 30.5 × 8.5 ÷ 8
        = 32.41 GB

// budget on RTX 6000 Ada (48GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.20 GB    (runtime, cuda, allocator)
safety    = 7.20 GB    (15% of 48GB)
budget    = vram − safety − kv − overhead
          = 48 − 7.20 − 0.05 − 2.20
          = 38.55 GB

// fit decision
32.41 ≤ 38.55  → FITS
headroom  = 6.14 GB of weights budget left
Qwen 3.6 27B27BQ8_0
29
fits
▸ show the math
// weights Q8_0 for Qwen 3.6 27B (27B params)
weights = params × bits ÷ 8
        = 27 × 8.5 ÷ 8
        = 28.69 GB

// budget on RTX 6000 Ada (48GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.20 GB    (runtime, cuda, allocator)
safety    = 7.20 GB    (15% of 48GB)
budget    = vram − safety − kv − overhead
          = 48 − 7.20 − 0.05 − 2.20
          = 38.55 GB

// fit decision
28.69 ≤ 38.55  → FITS
headroom  = 9.86 GB of weights budget left
Gemma 4 26B A4B26BQ8_0
28
fits
▸ show the math
// weights Q8_0 for Gemma 4 26B A4B (26B params)
weights = params × bits ÷ 8
        = 26 × 8.5 ÷ 8
        = 27.63 GB

// budget on RTX 6000 Ada (48GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.20 GB    (runtime, cuda, allocator)
safety    = 7.20 GB    (15% of 48GB)
budget    = vram − safety − kv − overhead
          = 48 − 7.20 − 0.05 − 2.20
          = 38.55 GB

// fit decision
27.63 ≤ 38.55  → FITS
headroom  = 10.92 GB of weights budget left
Mistral Small 324BQ8_0
26
fits
▸ show the math
// weights Q8_0 for Mistral Small 3 (24B params)
weights = params × bits ÷ 8
        = 24 × 8.5 ÷ 8
        = 25.50 GB

// budget on RTX 6000 Ada (48GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.20 GB    (runtime, cuda, allocator)
safety    = 7.20 GB    (15% of 48GB)
budget    = vram − safety − kv − overhead
          = 48 − 7.20 − 0.05 − 2.20
          = 38.55 GB

// fit decision
25.50 ≤ 38.55  → FITS
headroom  = 13.05 GB of weights budget left
gpt-oss 20B20.9BQ8_0
22
fits
▸ show the math
// weights Q8_0 for gpt-oss 20B (20.9B params)
weights = params × bits ÷ 8
        = 20.9 × 8.5 ÷ 8
        = 22.21 GB

// budget on RTX 6000 Ada (48GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 2.20 GB    (runtime, cuda, allocator)
safety    = 7.20 GB    (15% of 48GB)
budget    = vram − safety − kv − overhead
          = 48 − 7.20 − 0.05 − 2.20
          = 38.55 GB

// fit decision
22.21 ≤ 38.55  → FITS
headroom  = 16.34 GB of weights budget left

Compare to…

$ ./vrambudget --compare

Discussion.

$ gh discussion list

// sign in with github to leave a comment. threads live in the repo's discussions tab.