~/gpu/rx-7900-xtx
amd manufacturer

RX 7900 XTX 24GB

24GB AMD consumer flagship. ROCm path improving. Same model fits as 3090/4090.

VRAM
24GB
Bandwidth
960GB/s
FP16 compute
122TFLOPS
Budget @ ctx 8K
18GB

Tuned to this card.

$ ./vrambudget --gpu rx-7900-xtx
$ vrambudget --gpu rx-7900-xtx --ctx 8192 --conc 1 --safety 15%↗ tweetlive
rdna3
RX 7900 XTX
24GB
rdna3 · workstation
Radeon Pro W7900
48GB
cdna3 · datacenter
MI300X
192GB
24GB
64GB
8Ktok
24GB
device capacity
0.05GB
0.2% of total
1.6GB
6.7% of total
19GB
78% of total
$ budget allocation20 / 24 GB used
weightskv cacheoverheadsafety
↳ sorted by best fit
fitscomfortably runs on this budget21 models
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
19 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
18 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
18 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
18 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
17 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
19 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
18 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
17 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
17 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
16 GB
fits
Phi-414.7B
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
16 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
18 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
18 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
16 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
16 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
14 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
14 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
8.0 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
7.6 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
6.4 GB
fits
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
2.5 GB
fits
overneeds a bigger card, more aggressive quant, or model split9 models
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
26 GB
over
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
40 GB
over
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
41 GB
over
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
59 GB
over
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
66 GB
over
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
79 GB
over
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
228 GB
over
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
377 GB
over
FP16/BF16FP8/INT8Q8_0Q6_KQ5_K_MQ4_K_MQ3_K_MAWQ 4-bitGPTQ 4-bit
377 GB
over

Models that fit on a RX 7900 XTX.

$ grep "fits" models.json | head -12
ModelParamsBest quantWeights / 18 GB budgetFit
Qwen 3.6 35B A3B35BQ3_K_M
15
fits
▸ show the math
// weights Q3_K_M for Qwen 3.6 35B A3B (35B params)
weights = params × bits ÷ 8
        = 35 × 3.44 ÷ 8
        = 15.05 GB

// budget on RX 7900 XTX (24GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.60 GB    (runtime, cuda, allocator)
safety    = 3.60 GB    (15% of 24GB)
budget    = vram − safety − kv − overhead
          = 24 − 3.60 − 0.05 − 1.60
          = 18.75 GB

// fit decision
15.05 ≤ 18.75  → FITS
headroom  = 3.70 GB of weights budget left
Yi 34B34BQ3_K_M
15
fits
▸ show the math
// weights Q3_K_M for Yi 34B (34B params)
weights = params × bits ÷ 8
        = 34 × 3.44 ÷ 8
        = 14.62 GB

// budget on RX 7900 XTX (24GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.60 GB    (runtime, cuda, allocator)
safety    = 3.60 GB    (15% of 24GB)
budget    = vram − safety − kv − overhead
          = 24 − 3.60 − 0.05 − 1.60
          = 18.75 GB

// fit decision
14.62 ≤ 18.75  → FITS
headroom  = 4.13 GB of weights budget left
Qwen 2.5 32B32.5BAWQ 4-BIT
17
fits
▸ show the math
// weights AWQ 4-bit for Qwen 2.5 32B (32.5B params)
weights = params × bits ÷ 8
        = 32.5 × 4.25 ÷ 8
        = 17.27 GB

// budget on RX 7900 XTX (24GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.60 GB    (runtime, cuda, allocator)
safety    = 3.60 GB    (15% of 24GB)
budget    = vram − safety − kv − overhead
          = 24 − 3.60 − 0.05 − 1.60
          = 18.75 GB

// fit decision
17.27 ≤ 18.75  → FITS
headroom  = 1.48 GB of weights budget left
Qwen 2.5 Coder 32B32.5BAWQ 4-BIT
17
fits
▸ show the math
// weights AWQ 4-bit for Qwen 2.5 Coder 32B (32.5B params)
weights = params × bits ÷ 8
        = 32.5 × 4.25 ÷ 8
        = 17.27 GB

// budget on RX 7900 XTX (24GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.60 GB    (runtime, cuda, allocator)
safety    = 3.60 GB    (15% of 24GB)
budget    = vram − safety − kv − overhead
          = 24 − 3.60 − 0.05 − 1.60
          = 18.75 GB

// fit decision
17.27 ≤ 18.75  → FITS
headroom  = 1.48 GB of weights budget left
Qwen3 30B A3B30.5BQ4_K_M
17
fits
▸ show the math
// weights Q4_K_M for Qwen3 30B A3B (30.5B params)
weights = params × bits ÷ 8
        = 30.5 × 4.5 ÷ 8
        = 17.16 GB

// budget on RX 7900 XTX (24GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.60 GB    (runtime, cuda, allocator)
safety    = 3.60 GB    (15% of 24GB)
budget    = vram − safety − kv − overhead
          = 24 − 3.60 − 0.05 − 1.60
          = 18.75 GB

// fit decision
17.16 ≤ 18.75  → FITS
headroom  = 1.59 GB of weights budget left
Qwen 3.6 27B27BQ4_K_M
15
fits
▸ show the math
// weights Q4_K_M for Qwen 3.6 27B (27B params)
weights = params × bits ÷ 8
        = 27 × 4.5 ÷ 8
        = 15.19 GB

// budget on RX 7900 XTX (24GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.60 GB    (runtime, cuda, allocator)
safety    = 3.60 GB    (15% of 24GB)
budget    = vram − safety − kv − overhead
          = 24 − 3.60 − 0.05 − 1.60
          = 18.75 GB

// fit decision
15.19 ≤ 18.75  → FITS
headroom  = 3.56 GB of weights budget left
Gemma 4 26B A4B26BQ5_K_M
18
fits
▸ show the math
// weights Q5_K_M for Gemma 4 26B A4B (26B params)
weights = params × bits ÷ 8
        = 26 × 5.5 ÷ 8
        = 17.88 GB

// budget on RX 7900 XTX (24GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.60 GB    (runtime, cuda, allocator)
safety    = 3.60 GB    (15% of 24GB)
budget    = vram − safety − kv − overhead
          = 24 − 3.60 − 0.05 − 1.60
          = 18.75 GB

// fit decision
17.88 ≤ 18.75  → FITS
headroom  = 0.87 GB of weights budget left
Mistral Small 324BQ5_K_M
17
fits
▸ show the math
// weights Q5_K_M for Mistral Small 3 (24B params)
weights = params × bits ÷ 8
        = 24 × 5.5 ÷ 8
        = 16.50 GB

// budget on RX 7900 XTX (24GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.60 GB    (runtime, cuda, allocator)
safety    = 3.60 GB    (15% of 24GB)
budget    = vram − safety − kv − overhead
          = 24 − 3.60 − 0.05 − 1.60
          = 18.75 GB

// fit decision
16.50 ≤ 18.75  → FITS
headroom  = 2.25 GB of weights budget left
gpt-oss 20B20.9BQ6_K
17
fits
▸ show the math
// weights Q6_K for gpt-oss 20B (20.9B params)
weights = params × bits ÷ 8
        = 20.9 × 6.56 ÷ 8
        = 17.14 GB

// budget on RX 7900 XTX (24GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.60 GB    (runtime, cuda, allocator)
safety    = 3.60 GB    (15% of 24GB)
budget    = vram − safety − kv − overhead
          = 24 − 3.60 − 0.05 − 1.60
          = 18.75 GB

// fit decision
17.14 ≤ 18.75  → FITS
headroom  = 1.61 GB of weights budget left
StarCoder2 15B15BQ8_0
16
fits
▸ show the math
// weights Q8_0 for StarCoder2 15B (15B params)
weights = params × bits ÷ 8
        = 15 × 8.5 ÷ 8
        = 15.94 GB

// budget on RX 7900 XTX (24GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.60 GB    (runtime, cuda, allocator)
safety    = 3.60 GB    (15% of 24GB)
budget    = vram − safety − kv − overhead
          = 24 − 3.60 − 0.05 − 1.60
          = 18.75 GB

// fit decision
15.94 ≤ 18.75  → FITS
headroom  = 2.81 GB of weights budget left
Phi-414.7BQ8_0
16
fits
▸ show the math
// weights Q8_0 for Phi-4 (14.7B params)
weights = params × bits ÷ 8
        = 14.7 × 8.5 ÷ 8
        = 15.62 GB

// budget on RX 7900 XTX (24GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.60 GB    (runtime, cuda, allocator)
safety    = 3.60 GB    (15% of 24GB)
budget    = vram − safety − kv − overhead
          = 24 − 3.60 − 0.05 − 1.60
          = 18.75 GB

// fit decision
15.62 ≤ 18.75  → FITS
headroom  = 3.13 GB of weights budget left
Qwen 3.5 9B9BFP16/BF16
18
fits
▸ show the math
// weights FP16/BF16 for Qwen 3.5 9B (9B params)
weights = params × bits ÷ 8
        = 9 × 16 ÷ 8
        = 18.00 GB

// budget on RX 7900 XTX (24GB) at ctx 8K, conc 1, 15% safety
kv_cache  = 0.05 GB    (1× at ctx 8K)
overhead  = 1.60 GB    (runtime, cuda, allocator)
safety    = 3.60 GB    (15% of 24GB)
budget    = vram − safety − kv − overhead
          = 24 − 3.60 − 0.05 − 1.60
          = 18.75 GB

// fit decision
18.00 ≤ 18.75  → FITS
headroom  = 0.75 GB of weights budget left

Compare to…

$ ./vrambudget --compare

Discussion.

$ gh discussion list

// sign in with github to leave a comment. threads live in the repo's discussions tab.