manufacturer: nvidia

RTX 3060 12GB

Entry-level Ampere. Comfortable home for 7B-8B models at Q5_K_M; 13B at Q4.

VRAM              12GB
Bandwidth         360GB/s
FP16 compute      51TFLOPS
Budget @ ctx 8K   8.3GB

Tuned to this card.

$ ./vrambudget --gpu rtx-3060
$ vrambudget --gpu rtx-3060 --ctx 8192 --conc 1 --safety 15%
arch     card            vram
ampere   RTX 3060 12GB   12GB
ampere   RTX 3070        8GB
ampere   RTX 3080        10GB
ampere   RTX 3080 Ti     12GB
ampere   RTX 3090        24GB
ampere   RTX 3090 Ti     24GB
$ budget allocation: 10 / 12 GB used
device capacity   12GB
kv cache          0.05GB   0.4% of total
overhead          1.3GB    10.8% of total
safety            1.8GB    15% of total
weights budget    8.8GB    74% of total
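
The allocation above follows the page's own formula, budget = vram − safety − kv − overhead. A minimal Python sketch of that arithmetic, assuming the 15% safety margin is taken from total VRAM; the names `Budget` and `make_budget` are hypothetical and not part of the vrambudget tool:

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """VRAM split for one card; every field is in GB."""
    vram_gb: float
    safety_gb: float
    kv_cache_gb: float
    overhead_gb: float

    @property
    def weights_gb(self) -> float:
        # budget = vram - safety - kv - overhead
        return self.vram_gb - self.safety_gb - self.kv_cache_gb - self.overhead_gb

    def share(self, part_gb: float) -> float:
        """Percentage of total VRAM taken by one part."""
        return 100.0 * part_gb / self.vram_gb


def make_budget(vram_gb: float, safety_frac: float,
                kv_cache_gb: float, overhead_gb: float) -> Budget:
    # Assumption: the safety margin is a fraction of total VRAM (15% of 12GB here).
    return Budget(vram_gb, vram_gb * safety_frac, kv_cache_gb, overhead_gb)


# Figures quoted on this page for ctx 8K, conc 1, 15% safety.
b = make_budget(vram_gb=12.0, safety_frac=0.15, kv_cache_gb=0.05, overhead_gb=1.30)
for label, part in [("kv cache", b.kv_cache_gb), ("overhead", b.overhead_gb),
                    ("safety", b.safety_gb), ("weights budget", b.weights_gb)]:
    print(f"{label:<15} {part:5.2f} GB  {b.share(part):5.1f}% of total")
```

Keeping the safety margin as a fraction rather than a fixed size lets the same sketch be reused for the other cards in the list by changing only `vram_gb`.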
fits: comfortably runs on this budget (12 models)
↳ sorted by best fit
size per model: 8.4, 8.3, 7.4, 7.4, 8.5, 8.5, 7.7, 7.4, 8.0, 7.6, 6.4, 2.5 GB
over: needs a bigger card, more aggressive quant, or model split (18 models)
size per model: 12, 14, 15, 15, 17, 18, 18, 19, 20, 26, 40, 41, 59, 66, 79, 228, 377, 377 GB

Models that fit on an RTX 3060 12GB.

$ grep "fits" models.json | head -12
Model | Params | Best quant | Weights / 8.3 GB budget | Fit
StarCoder2 15B | 15B | AWQ 4-bit | 8.0 GB | fits
// weights AWQ 4-bit for StarCoder2 15B (15B params)
weights = params × bits ÷ 8
        = 15 × 4.25 ÷ 8
        = 7.97 GB

// budget on RTX 3060 12GB (12GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.30 GB    (runtime, cuda, allocator)
safety    = 1.80 GB    (15% of 12GB)
budget    = vram − safety − kv − overhead
          = 12 − 1.80 − 0.05 − 1.30
          = 8.85 GB

// fit decision
7.97 ≤ 8.85  → FITS
headroom  = 0.88 GB of weights budget left
Phi-4 | 14.7B | Q4_K_M | 8.3 GB | fits
// weights Q4_K_M for Phi-4 (14.7B params)
weights = params × bits ÷ 8
        = 14.7 × 4.5 ÷ 8
        = 8.27 GB

// budget on RTX 3060 12GB (12GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.30 GB    (runtime, cuda, allocator)
safety    = 1.80 GB    (15% of 12GB)
budget    = vram − safety − kv − overhead
          = 12 − 1.80 − 0.05 − 1.30
          = 8.85 GB

// fit decision
8.27 ≤ 8.85  → FITS
headroom  = 0.58 GB of weights budget left
Qwen 3.5 9B | 9B | Q6_K | 7.4 GB | fits
// weights Q6_K for Qwen 3.5 9B (9B params)
weights = params × bits ÷ 8
        = 9 × 6.56 ÷ 8
        = 7.38 GB

// budget on RTX 3060 12GB (12GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.30 GB    (runtime, cuda, allocator)
safety    = 1.80 GB    (15% of 12GB)
budget    = vram − safety − kv − overhead
          = 12 − 1.80 − 0.05 − 1.30
          = 8.85 GB

// fit decision
7.38 ≤ 8.85  → FITS
headroom  = 1.47 GB of weights budget left
Gemma 2 9B | 9B | Q6_K | 7.4 GB | fits
// weights Q6_K for Gemma 2 9B (9B params)
weights = params × bits ÷ 8
        = 9 × 6.56 ÷ 8
        = 7.38 GB

// budget on RTX 3060 12GB (12GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.30 GB    (runtime, cuda, allocator)
safety    = 1.80 GB    (15% of 12GB)
budget    = vram − safety − kv − overhead
          = 12 − 1.80 − 0.05 − 1.30
          = 8.85 GB

// fit decision
7.38 ≤ 8.85  → FITS
headroom  = 1.47 GB of weights budget left
Llama 3.1 8B | 8B | FP8/INT8 | 8.0 GB | fits
// weights FP8/INT8 for Llama 3.1 8B (8B params)
weights = params × bits ÷ 8
        = 8 × 8 ÷ 8
        = 8.00 GB

// budget on RTX 3060 12GB (12GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.30 GB    (runtime, cuda, allocator)
safety    = 1.80 GB    (15% of 12GB)
budget    = vram − safety − kv − overhead
          = 12 − 1.80 − 0.05 − 1.30
          = 8.85 GB

// fit decision
8.00 ≤ 8.85  → FITS
headroom  = 0.85 GB of weights budget left
Granite 8B Code | 8B | FP8/INT8 | 8.0 GB | fits
// weights FP8/INT8 for Granite 8B Code (8B params)
weights = params × bits ÷ 8
        = 8 × 8 ÷ 8
        = 8.00 GB

// budget on RTX 3060 12GB (12GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.30 GB    (runtime, cuda, allocator)
safety    = 1.80 GB    (15% of 12GB)
budget    = vram − safety − kv − overhead
          = 12 − 1.80 − 0.05 − 1.30
          = 8.85 GB

// fit decision
8.00 ≤ 8.85  → FITS
headroom  = 0.85 GB of weights budget left
Mistral 7B v0.3 | 7.2B | Q8_0 | 7.7 GB | fits
// weights Q8_0 for Mistral 7B v0.3 (7.2B params)
weights = params × bits ÷ 8
        = 7.2 × 8.5 ÷ 8
        = 7.65 GB

// budget on RTX 3060 12GB (12GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.30 GB    (runtime, cuda, allocator)
safety    = 1.80 GB    (15% of 12GB)
budget    = vram − safety − kv − overhead
          = 12 − 1.80 − 0.05 − 1.30
          = 8.85 GB

// fit decision
7.65 ≤ 8.85  → FITS
headroom  = 1.20 GB of weights budget left
Qwen 2.5 7B | 7B | Q8_0 | 7.4 GB | fits
// weights Q8_0 for Qwen 2.5 7B (7B params)
weights = params × bits ÷ 8
        = 7 × 8.5 ÷ 8
        = 7.44 GB

// budget on RTX 3060 12GB (12GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.30 GB    (runtime, cuda, allocator)
safety    = 1.80 GB    (15% of 12GB)
budget    = vram − safety − kv − overhead
          = 12 − 1.80 − 0.05 − 1.30
          = 8.85 GB

// fit decision
7.44 ≤ 8.85  → FITS
headroom  = 1.41 GB of weights budget left
Gemma 4 E4B | 4B | FP16/BF16 | 8.0 GB | fits
// weights FP16/BF16 for Gemma 4 E4B (4B params)
weights = params × bits ÷ 8
        = 4 × 16 ÷ 8
        = 8.00 GB

// budget on RTX 3060 12GB (12GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.30 GB    (runtime, cuda, allocator)
safety    = 1.80 GB    (15% of 12GB)
budget    = vram − safety − kv − overhead
          = 12 − 1.80 − 0.05 − 1.30
          = 8.85 GB

// fit decision
8.00 ≤ 8.85  → FITS
headroom  = 0.85 GB of weights budget left
Phi-4 Mini | 3.8B | FP16/BF16 | 7.6 GB | fits
// weights FP16/BF16 for Phi-4 Mini (3.8B params)
weights = params × bits ÷ 8
        = 3.8 × 16 ÷ 8
        = 7.60 GB

// budget on RTX 3060 12GB (12GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.30 GB    (runtime, cuda, allocator)
safety    = 1.80 GB    (15% of 12GB)
budget    = vram − safety − kv − overhead
          = 12 − 1.80 − 0.05 − 1.30
          = 8.85 GB

// fit decision
7.60 ≤ 8.85  → FITS
headroom  = 1.25 GB of weights budget left
Llama 3.2 3B | 3.21B | FP16/BF16 | 6.4 GB | fits
// weights FP16/BF16 for Llama 3.2 3B (3.21B params)
weights = params × bits ÷ 8
        = 3.21 × 16 ÷ 8
        = 6.42 GB

// budget on RTX 3060 12GB (12GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.30 GB    (runtime, cuda, allocator)
safety    = 1.80 GB    (15% of 12GB)
budget    = vram − safety − kv − overhead
          = 12 − 1.80 − 0.05 − 1.30
          = 8.85 GB

// fit decision
6.42 ≤ 8.85  → FITS
headroom  = 2.43 GB of weights budget left
Llama 3.2 1B | 1.23B | FP16/BF16 | 2.5 GB | fits
// weights FP16/BF16 for Llama 3.2 1B (1.23B params)
weights = params × bits ÷ 8
        = 1.23 × 16 ÷ 8
        = 2.46 GB

// budget on RTX 3060 12GB (12GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.30 GB    (runtime, cuda, allocator)
safety    = 1.80 GB    (15% of 12GB)
budget    = vram − safety − kv − overhead
          = 12 − 1.80 − 0.05 − 1.30
          = 8.85 GB

// fit decision
2.46 ≤ 8.85  → FITS
headroom  = 6.39 GB of weights budget left
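
The per-model math above repeats one pattern: weights = params × bits ÷ 8, then a comparison against the weights budget. A small Python sketch of that pattern, using only the bits-per-weight values that appear in the worked examples on this page; the function names are hypothetical, and the 8.85 GB budget is the figure from the math blocks rather than something queried from the tool:

```python
# Effective bits per weight, copied from the worked examples above.
# Quants the page lists without a worked example (Q5_K_M, Q3_K_M, GPTQ 4-bit)
# are left out rather than guessed.
BITS_PER_WEIGHT = {
    "FP16/BF16": 16.0,
    "Q8_0": 8.5,
    "FP8/INT8": 8.0,
    "Q6_K": 6.56,
    "Q4_K_M": 4.5,
    "AWQ 4-bit": 4.25,
}


def weights_gb(params_b: float, quant: str) -> float:
    """Weight footprint in GB for a model with params_b billion parameters."""
    return params_b * BITS_PER_WEIGHT[quant] / 8.0


def fit(params_b: float, quant: str, budget_gb: float) -> tuple[bool, float]:
    """Return (fits, headroom_gb); headroom is negative when the model is over."""
    w = weights_gb(params_b, quant)
    return w <= budget_gb, budget_gb - w


BUDGET_GB = 8.85  # 12 - 1.80 - 0.05 - 1.30, as in the math blocks above

for name, params_b, quant in [
    ("StarCoder2 15B", 15.0, "AWQ 4-bit"),
    ("Phi-4", 14.7, "Q4_K_M"),
    ("Llama 3.2 1B", 1.23, "FP16/BF16"),
]:
    ok, headroom = fit(params_b, quant, BUDGET_GB)
    verdict = "FITS" if ok else "OVER"
    print(f"{name:<16} {quant:<10} {weights_gb(params_b, quant):5.2f} GB  "
          f"{verdict}  headroom {headroom:+.2f} GB")
```

Returning the headroom alongside the verdict makes it easy to see how much of the weights budget is left, for example to trade it for a longer context.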

