~/gpu/rtx-4080 vs rtx-4090

RTX 4080vsRTX 4090

Head-to-head for local LLM inference. The honest comparison: VRAM, bandwidth, compute, and which of the 30 catalog models actually fit on each.

The specs.

$ diff specs rtx-4080 rtx-4090

Stat

rtx-4080

rtx-4090

VRAM

16 GB

24 GB

+50%

Memory bandwidth

717 GB/s

1,008 GB/s

+41%

FP16 compute

390 TFLOPS

330 TFLOPS

-15%

Weights budget at 8K ctx

12 GB

18 GB

+50%

Model fit difference.

$ models that change with the card

Fits on both

16of 30

Only on rtx-4080

Only on rtx-4090

// showing 12 of 30 models; differing fits first

Model

rtx-4080

rtx-4090

Qwen 2.5 32B32.5B

overQ4_K_M

fitsAWQ 4-BIT

Qwen 2.5 Coder 32B32.5B

overQ4_K_M

fitsAWQ 4-BIT

Qwen3 30B A3B30.5B

overQ4_K_M

fitsQ4_K_M

Qwen 3.6 35B A3B35B

overQ4_K_M

fitsQ3_K_M

Yi 34B34B

overQ4_K_M

fitsQ3_K_M

Llama 3.2 1B1.23B

fitsFP16/BF16

Llama 3.2 3B3.21B

fitsFP16/BF16

Llama 3.1 8B8B

fitsQ8_0

fitsFP16/BF16

Llama 3.3 70B70.6B

overQ4_K_M

Llama 3.1 405B405B

overQ4_K_M

Qwen 2.5 7B7B

fitsQ8_0

fitsFP16/BF16

Qwen 2.5 72B72.7B

overQ4_K_M

Which one wins for…

$ ./recommend --by-workload

More VRAM headroom

RTX 4090 has 8 GB more.

Faster decode (bandwidth)

RTX 4090 by +41%.

Faster prefill (compute)

RTX 4080 by +18% TFLOPS.

Catalog models that fit

RTX 4090: 21 fit · RTX 4080: 16.

Discussion.

$ gh discussion list

// sign in with github to leave a comment. threads live in the repo's discussions tab.

RTX 4080 manufacturerRTX 4080vsRTX 4090 manufacturerRTX 4090

The specs.

Model fit difference.

Which one wins for…

Drill into either card.

Discussion.

RTX 4080vsRTX 4090