Manufacturer: NVIDIA

RTX 5080 16GB

16GB Blackwell card, one tier below the flagship. At 8K context with a 15% safety margin, the 12 GB weights budget fits models up to about 27B at Q3_K_M.

VRAM: 16GB
Bandwidth: 960GB/s
FP16 compute: 565 TFLOPS
Budget @ ctx 8K: 12GB

Tuned to this card.

$ ./vrambudget --gpu rtx-5080
$ vrambudget --gpu rtx-5080 --ctx 8192 --conc 1 --safety 15%
RTX 5070 · 12GB · blackwell
RTX 5070 Ti · 16GB · blackwell
RTX 5080 · 16GB · blackwell
RTX 5090 · 32GB · blackwell · flagship
$ budget allocation: 14 / 16 GB used
device capacity: 16GB
weights: 12GB (76% of total)
kv cache: 0.05GB (0.3% of total)
overhead: 1.4GB (8.8% of total)
safety: 2.4GB (15% of total)
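The allocation above can be recomputed by hand. The following is a minimal sketch, assuming the split works as the page's worked math describes (safety is a fraction of device capacity; weights get whatever remains after kv cache and overhead). The function names `allocation` and `shares` are hypothetical and are not part of vrambudget.

```python
# Figures taken from this page: RTX 5080 at ctx 8K, conc 1, 15% safety.
VRAM_GB = 16.0


def allocation(vram_gb, kv_gb, overhead_gb, safety_frac):
    """Split device capacity (GB) into weights, kv cache, overhead, and safety."""
    safety_gb = vram_gb * safety_frac
    weights_gb = vram_gb - safety_gb - kv_gb - overhead_gb
    return {
        "weights": weights_gb,
        "kv cache": kv_gb,
        "overhead": overhead_gb,
        "safety": safety_gb,
    }


def shares(parts, vram_gb):
    """Express each part as a percentage of device capacity, to one decimal."""
    return {name: round(100 * gb / vram_gb, 1) for name, gb in parts.items()}


parts = allocation(VRAM_GB, kv_gb=0.05, overhead_gb=1.40, safety_frac=0.15)
pct = shares(parts, VRAM_GB)
for name, gb in parts.items():
    print(f"{name:9s} {gb:5.2f} GB  {pct[name]:4.1f}% of total")

# "Used" is everything except the safety margin.
print(f"used: {VRAM_GB - parts['safety']:.1f} / {VRAM_GB:.0f} GB")
```

The printed shares should land within rounding of the percentages shown on the page; the page appears to round the used total up to a whole number of GB.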
↳ sorted by best fit
quant options: FP16/BF16, FP8/INT8, Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, AWQ 4-bit, GPTQ 4-bit

fits · comfortably runs on this budget · 16 models
weights per model: 12, 11, 10, 12, 10, 12 (Phi-4, 14.7B), 9.6, 9.6, 8.5, 8.5, 7.7, 7.4, 8.0, 7.6, 6.4, 2.5 GB

over · needs a bigger card, more aggressive quant, or model split · 14 models
weights per model: 17, 18, 18, 19, 20, 26, 40, 41, 59, 66, 79, 228, 377, 377 GB
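One plausible reading of "best fit" is: pick the widest quant whose weights still fit the budget. The sketch below assumes that rule; the page does not confirm it. The bits-per-weight values are the ones used in the worked math in the table on this page, plus the nominal 16 bits for FP16/BF16. Q6_K, FP8/INT8, AWQ, and GPTQ are left out because the page does not state their values. `best_quant` is a hypothetical helper, not a vrambudget function.

```python
# Bits per weight as they appear in this page's worked math.
# FP16/BF16 = 16 is the nominal width (an assumption, not taken from the page).
BITS_PER_WEIGHT = {
    "FP16/BF16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.5,
    "Q4_K_M": 4.5,
    "Q3_K_M": 3.44,
}

BUDGET_GB = 12.15  # weights budget on the RTX 5080 at ctx 8K, conc 1, 15% safety


def weights_gb(params_b, bits):
    """Weights size in GB for params given in billions: params × bits ÷ 8."""
    return params_b * bits / 8


def best_quant(params_b, budget_gb):
    """Return (quant, weights_gb) for the widest quant that fits, or None."""
    widest_first = sorted(BITS_PER_WEIGHT.items(), key=lambda kv: -kv[1])
    for quant, bits in widest_first:
        size = weights_gb(params_b, bits)
        if size <= budget_gb:
            return quant, size
    return None  # nothing fits: bigger card, more aggressive quant, or model split


for name, params_b in [("Qwen 3.6 27B", 27), ("gpt-oss 20B", 20.9), ("Llama 3.1 8B", 8)]:
    print(name, best_quant(params_b, BUDGET_GB))
```

A `None` result corresponds to the "over" group: no listed quant brings the weights under the budget.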

Models that fit on an RTX 5080.

$ grep "fits" models.json | head -12
Model · Params · Best quant · Weights / 12 GB budget · Fit
Qwen 3.6 27B · 27B · Q3_K_M · 12 GB · fits
// weights Q3_K_M for Qwen 3.6 27B (27B params)
weights = params × bits ÷ 8
        = 27 × 3.44 ÷ 8
        = 11.61 GB

// budget on RTX 5080 (16GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.40 GB    (runtime, cuda, allocator)
safety    = 2.40 GB    (15% of 16GB)
budget    = vram − safety − kv − overhead
          = 16 − 2.40 − 0.05 − 1.40
          = 12.15 GB

// fit decision
11.61 ≤ 12.15  → FITS
headroom  = 0.54 GB of weights budget left
Gemma 4 26B A4B · 26B · Q3_K_M · 11 GB · fits
// weights Q3_K_M for Gemma 4 26B A4B (26B params)
weights = params × bits ÷ 8
        = 26 × 3.44 ÷ 8
        = 11.18 GB

// budget on RTX 5080 (16GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.40 GB    (runtime, cuda, allocator)
safety    = 2.40 GB    (15% of 16GB)
budget    = vram − safety − kv − overhead
          = 16 − 2.40 − 0.05 − 1.40
          = 12.15 GB

// fit decision
11.18 ≤ 12.15  → FITS
headroom  = 0.97 GB of weights budget left
Mistral Small 3 · 24B · Q3_K_M · 10 GB · fits
// weights Q3_K_M for Mistral Small 3 (24B params)
weights = params × bits ÷ 8
        = 24 × 3.44 ÷ 8
        = 10.32 GB

// budget on RTX 5080 (16GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.40 GB    (runtime, cuda, allocator)
safety    = 2.40 GB    (15% of 16GB)
budget    = vram − safety − kv − overhead
          = 16 − 2.40 − 0.05 − 1.40
          = 12.15 GB

// fit decision
10.32 ≤ 12.15  → FITS
headroom  = 1.83 GB of weights budget left
gpt-oss 20B · 20.9B · Q4_K_M · 12 GB · fits
// weights Q4_K_M for gpt-oss 20B (20.9B params)
weights = params × bits ÷ 8
        = 20.9 × 4.5 ÷ 8
        = 11.76 GB

// budget on RTX 5080 (16GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.40 GB    (runtime, cuda, allocator)
safety    = 2.40 GB    (15% of 16GB)
budget    = vram − safety − kv − overhead
          = 16 − 2.40 − 0.05 − 1.40
          = 12.15 GB

// fit decision
11.76 ≤ 12.15  → FITS
headroom  = 0.39 GB of weights budget left
StarCoder2 15B · 15B · Q5_K_M · 10 GB · fits
// weights Q5_K_M for StarCoder2 15B (15B params)
weights = params × bits ÷ 8
        = 15 × 5.5 ÷ 8
        = 10.31 GB

// budget on RTX 5080 (16GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.40 GB    (runtime, cuda, allocator)
safety    = 2.40 GB    (15% of 16GB)
budget    = vram − safety − kv − overhead
          = 16 − 2.40 − 0.05 − 1.40
          = 12.15 GB

// fit decision
10.31 ≤ 12.15  → FITS
headroom  = 1.84 GB of weights budget left
Phi-4 · 14.7B · Q5_K_M · 10 GB · fits
// weights Q5_K_M for Phi-4 (14.7B params)
weights = params × bits ÷ 8
        = 14.7 × 5.5 ÷ 8
        = 10.11 GB

// budget on RTX 5080 (16GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.40 GB    (runtime, cuda, allocator)
safety    = 2.40 GB    (15% of 16GB)
budget    = vram − safety − kv − overhead
          = 16 − 2.40 − 0.05 − 1.40
          = 12.15 GB

// fit decision
10.11 ≤ 12.15  → FITS
headroom  = 2.04 GB of weights budget left
Qwen 3.5 9B · 9B · Q8_0 · 9.6 GB · fits
// weights Q8_0 for Qwen 3.5 9B (9B params)
weights = params × bits ÷ 8
        = 9 × 8.5 ÷ 8
        = 9.56 GB

// budget on RTX 5080 (16GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.40 GB    (runtime, cuda, allocator)
safety    = 2.40 GB    (15% of 16GB)
budget    = vram − safety − kv − overhead
          = 16 − 2.40 − 0.05 − 1.40
          = 12.15 GB

// fit decision
9.56 ≤ 12.15  → FITS
headroom  = 2.59 GB of weights budget left
Gemma 2 9B · 9B · Q8_0 · 9.6 GB · fits
// weights Q8_0 for Gemma 2 9B (9B params)
weights = params × bits ÷ 8
        = 9 × 8.5 ÷ 8
        = 9.56 GB

// budget on RTX 5080 (16GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.40 GB    (runtime, cuda, allocator)
safety    = 2.40 GB    (15% of 16GB)
budget    = vram − safety − kv − overhead
          = 16 − 2.40 − 0.05 − 1.40
          = 12.15 GB

// fit decision
9.56 ≤ 12.15  → FITS
headroom  = 2.59 GB of weights budget left
Llama 3.1 8B · 8B · Q8_0 · 8.5 GB · fits
// weights Q8_0 for Llama 3.1 8B (8B params)
weights = params × bits ÷ 8
        = 8 × 8.5 ÷ 8
        = 8.50 GB

// budget on RTX 5080 (16GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.40 GB    (runtime, cuda, allocator)
safety    = 2.40 GB    (15% of 16GB)
budget    = vram − safety − kv − overhead
          = 16 − 2.40 − 0.05 − 1.40
          = 12.15 GB

// fit decision
8.50 ≤ 12.15  → FITS
headroom  = 3.65 GB of weights budget left
Granite 8B Code · 8B · Q8_0 · 8.5 GB · fits
// weights Q8_0 for Granite 8B Code (8B params)
weights = params × bits ÷ 8
        = 8 × 8.5 ÷ 8
        = 8.50 GB

// budget on RTX 5080 (16GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.40 GB    (runtime, cuda, allocator)
safety    = 2.40 GB    (15% of 16GB)
budget    = vram − safety − kv − overhead
          = 16 − 2.40 − 0.05 − 1.40
          = 12.15 GB

// fit decision
8.50 ≤ 12.15  → FITS
headroom  = 3.65 GB of weights budget left
Mistral 7B v0.3 · 7.2B · Q8_0 · 7.7 GB · fits
// weights Q8_0 for Mistral 7B v0.3 (7.2B params)
weights = params × bits ÷ 8
        = 7.2 × 8.5 ÷ 8
        = 7.65 GB

// budget on RTX 5080 (16GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.40 GB    (runtime, cuda, allocator)
safety    = 2.40 GB    (15% of 16GB)
budget    = vram − safety − kv − overhead
          = 16 − 2.40 − 0.05 − 1.40
          = 12.15 GB

// fit decision
7.65 ≤ 12.15  → FITS
headroom  = 4.50 GB of weights budget left
Qwen 2.5 7B · 7B · Q8_0 · 7.4 GB · fits
// weights Q8_0 for Qwen 2.5 7B (7B params)
weights = params × bits ÷ 8
        = 7 × 8.5 ÷ 8
        = 7.44 GB

// budget on RTX 5080 (16GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.40 GB    (runtime, cuda, allocator)
safety    = 2.40 GB    (15% of 16GB)
budget    = vram − safety − kv − overhead
          = 16 − 2.40 − 0.05 − 1.40
          = 12.15 GB

// fit decision
7.44 ≤ 12.15  → FITS
headroom  = 4.71 GB of weights budget left
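The per-row math above follows one pattern, so the whole table can be rechecked in a few lines. This is a rough sketch that follows the formulas exactly as written on this page; `check_fit` and the `ROWS` list are hypothetical names for illustration, and the row data is copied from a subset of the table.

```python
# Budget inputs from this page: RTX 5080 (16GB) at ctx 8K, conc 1, 15% safety.
VRAM_GB = 16.0
SAFETY_GB = 0.15 * VRAM_GB  # 15% of 16GB
KV_CACHE_GB = 0.05
OVERHEAD_GB = 1.40
BUDGET_GB = VRAM_GB - SAFETY_GB - KV_CACHE_GB - OVERHEAD_GB

# (model, params in billions, bits per weight at the listed best quant)
ROWS = [
    ("Qwen 3.6 27B", 27, 3.44),
    ("Gemma 4 26B A4B", 26, 3.44),
    ("Mistral Small 3", 24, 3.44),
    ("gpt-oss 20B", 20.9, 4.5),
    ("Phi-4", 14.7, 5.5),
    ("Llama 3.1 8B", 8, 8.5),
    ("Mistral 7B v0.3", 7.2, 8.5),
]


def check_fit(params_b, bits, budget_gb=BUDGET_GB):
    """Return (weights_gb, headroom_gb, fits) using weights = params × bits ÷ 8."""
    weights = params_b * bits / 8
    headroom = budget_gb - weights
    return weights, headroom, headroom >= 0


for name, params_b, bits in ROWS:
    weights, headroom, fits = check_fit(params_b, bits)
    verdict = "FITS" if fits else "OVER"
    print(f"{name:16s} {weights:6.2f} GB  {verdict}  headroom {headroom:5.2f} GB")
```

Values that fall exactly on a rounding boundary may differ from the page in the last printed digit, so treat the output as a consistency check rather than a byte-for-byte match.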
