~/gpu/rtx-3090 vs rtx-4080

RTX 3090 vs RTX 4080

Head-to-head for local LLM inference. The honest comparison: VRAM, bandwidth, compute, and which of the 30 catalog models actually fit on each.

The specs.

$ diff specs rtx-3090 rtx-4080
Stat                       rtx-3090      rtx-4080      Δ
VRAM                       24 GB         16 GB         -33%
Memory bandwidth           936 GB/s      717 GB/s      -23%
FP16 compute               142 TFLOPS    390 TFLOPS    +175%
Weights budget at 8K ctx   18 GB         12 GB         -33%
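The Δ column reads as the rtx-4080 value relative to the rtx-3090 value. A minimal sketch of that arithmetic follows. The 75% weights-budget fraction is an assumption inferred from the table (18/24 and 12/16 are both 0.75), not something the page states, and the function names are hypothetical.

```python
def pct_delta(baseline: float, other: float) -> int:
    """Percentage change from baseline to other, rounded to a whole percent."""
    return round((other - baseline) / baseline * 100)

# Assumption: the weights budget at 8K context is modeled as 75% of VRAM,
# because that ratio reproduces both table rows (18 of 24 GB, 12 of 16 GB).
WEIGHTS_FRACTION = 0.75

def weights_budget_gb(vram_gb: float) -> float:
    """VRAM left for model weights under the assumed fixed fraction."""
    return vram_gb * WEIGHTS_FRACTION

# (rtx-3090, rtx-4080) values copied from the table above.
specs = {
    "VRAM (GB)": (24, 16),
    "Memory bandwidth (GB/s)": (936, 717),
    "FP16 compute (TFLOPS)": (142, 390),
    "Weights budget at 8K ctx (GB)": (weights_budget_gb(24), weights_budget_gb(16)),
}

for stat, (rtx_3090, rtx_4080) in specs.items():
    print(f"{stat}: {pct_delta(rtx_3090, rtx_4080):+d}%")
```

The real budget presumably depends on KV-cache size and runtime overhead at the chosen context length, so treat the fixed fraction as a convenience for reproducing the table rather than a general rule.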

Model fit difference.

$ models that change with the card
Fits on both: 16 of 30
Only on rtx-3090: 5
Only on rtx-4080: 0
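The three counts above come from partitioning the catalog by whether each model fits each card's weights budget. A rough sketch of that partition, assuming each model is represented by the size of its smallest usable quantization; the model names and sizes below are hypothetical placeholders, not catalog entries.

```python
def partition_by_fit(smallest_weights_gb: dict[str, float],
                     budget_a_gb: float,
                     budget_b_gb: float) -> dict[str, list[str]]:
    """Group models by which of two weight budgets they fit under."""
    groups = {"both": [], "only_a": [], "only_b": [], "neither": []}
    for model, size_gb in smallest_weights_gb.items():
        fits_a = size_gb <= budget_a_gb
        fits_b = size_gb <= budget_b_gb
        if fits_a and fits_b:
            groups["both"].append(model)
        elif fits_a:
            groups["only_a"].append(model)
        elif fits_b:
            groups["only_b"].append(model)
        else:
            groups["neither"].append(model)
    return groups

# Hypothetical sizes in GB; budgets are the 8K-context rows from the spec table.
example = {"small-model": 5.0, "mid-model": 15.0, "large-model": 40.0}
groups = partition_by_fit(example, budget_a_gb=18.0, budget_b_gb=12.0)
print({name: len(members) for name, members in groups.items()})
```

Because the rtx-4080 budget is strictly smaller, anything that fits on it also fits on the rtx-3090, which is why the "Only on rtx-4080" count is 0.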

// showing 12 of 30 models; differing fits first

Model   rtx-3090          rtx-4080
…       fits AWQ 4-BIT    over Q4_K_M
…       fits AWQ 4-BIT    over Q4_K_M
…       fits Q4_K_M       over Q4_K_M
…       fits Q3_K_M       over Q4_K_M
…       fits Q3_K_M       over Q4_K_M
…       fits FP16/BF16    fits FP16/BF16
…       fits FP16/BF16    fits FP16/BF16
…       fits FP16/BF16    fits Q8_0
…       over Q4_K_M       over Q4_K_M
…       over Q4_K_M       over Q4_K_M
…       fits FP16/BF16    fits Q8_0
…       over Q4_K_M       over Q4_K_M
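Each cell in the table pairs a fit status with a quantization level. One plausible way to produce such a cell is to walk a ladder from highest to lowest precision and report the first level whose weights fit the budget. The sketch below assumes that approach; the bits-per-weight figures are rough approximations for these formats, the ladder omits AWQ 4-bit, and none of it is confirmed as the page's actual method.

```python
from typing import Optional

# Approximate bits per weight, highest precision first (rough figures only).
QUANT_LADDER = [
    ("FP16/BF16", 16.0),
    ("Q8_0", 8.5),
    ("Q4_K_M", 4.85),
    ("Q3_K_M", 3.9),
]

def quant_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weight size in GB for a model with the given parameter count."""
    return params_billions * bits_per_weight / 8

def best_quant(params_billions: float, budget_gb: float) -> Optional[str]:
    """Highest-precision quantization that fits the budget, or None."""
    for name, bits in QUANT_LADDER:
        if quant_size_gb(params_billions, bits) <= budget_gb:
            return name
    return None

# Hypothetical parameter counts (in billions), checked against both budgets.
for params in (7, 32, 70):
    print(params, best_quant(params, 18.0), best_quant(params, 12.0))
```

Under these assumptions a 7B model lands on FP16/BF16 with an 18 GB budget and Q8_0 with 12 GB, the same pattern as the "fits FP16/BF16" and "fits Q8_0" rows.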

Which one wins for…

$ ./recommend --by-workload
More VRAM headroom

RTX 3090 has 8 GB more.

Faster decode (bandwidth)

RTX 3090, by 31% (936 GB/s vs 717 GB/s).

Faster prefill (compute)

RTX 4080, by 175% (390 vs 142 FP16 TFLOPS).

Catalog models that fit

RTX 3090: 21 fit · RTX 4080: 16.
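The decode and prefill verdicts follow from two common back-of-the-envelope models: decode is treated as memory-bandwidth-bound (each generated token streams all weights from VRAM once), and prefill as compute-bound (about 2 FLOPs per parameter per prompt token). The sketch below applies both under those assumptions; the 8 GB weight size and 7B parameter count are hypothetical, and real throughput will fall short of these peak-rate bounds.

```python
def decode_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed: one full pass over the weights per token."""
    return bandwidth_gb_s / weights_gb

def prefill_seconds(params: float, prompt_tokens: int, tflops: float) -> float:
    """Lower bound on prefill time at peak compute (~2 FLOPs/param/token)."""
    return 2 * params * prompt_tokens / (tflops * 1e12)

WEIGHTS_GB = 8.0      # hypothetical quantized model size
PARAMS = 7e9          # hypothetical parameter count
PROMPT_TOKENS = 8192  # matches the 8K context used in the spec table

# (card, memory bandwidth in GB/s, FP16 TFLOPS) from the spec table.
cards = [("rtx-3090", 936, 142), ("rtx-4080", 717, 390)]

for card, bandwidth, tflops in cards:
    tps = decode_tokens_per_s(bandwidth, WEIGHTS_GB)
    secs = prefill_seconds(PARAMS, PROMPT_TOKENS, tflops)
    print(f"{card}: decode <= {tps:.0f} tok/s, prefill >= {secs:.2f} s")
```

In this model the decode ratio between the cards depends only on bandwidth and the prefill ratio only on TFLOPS, so the 31% and 175% gaps hold regardless of which model size is plugged in.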

