~/gpu/gaudi-3 vs h100

Gaudi 3 manufacturerGaudi 3vsH100 80GB manufacturerH100 80GB

Head-to-head for local LLM inference. The honest comparison: VRAM, bandwidth, compute, and which of the 30 catalog models actually fit on each.

The specs.

$ diff specs gaudi-3 h100
Stat
gaudi-3
h100
Δ
VRAM
128 GB
80 GB
-38%
Memory bandwidth
3,700 GB/s
3,350 GB/s
-9%
FP16 compute
1835 TFLOPS
989 TFLOPS
-46%
Weights budget at 8K ctx
102 GB
63 GB
-38%

Model fit difference.

$ models that change with the card
Fits on both
27of 30
Only on gaudi-3
0
Only on h100
0

// showing 12 of 30 models; differing fits first

Model
gaudi-3
h100
fitsFP16/BF16
fitsFP16/BF16
fitsFP16/BF16
fitsFP16/BF16
fitsFP16/BF16
fitsFP16/BF16
fitsQ8_0
fitsQ6_K
overQ4_K_M
overQ4_K_M
fitsFP16/BF16
fitsFP16/BF16
fitsFP16/BF16
fitsQ8_0
fitsFP16/BF16
fitsQ8_0
fitsQ8_0
fitsQ6_K
fitsFP16/BF16
fitsFP16/BF16
fitsFP16/BF16
fitsFP16/BF16
fitsFP16/BF16
fitsFP16/BF16

Which one wins for…

$ ./recommend --by-workload
More VRAM headroom

Gaudi 3 has 48 GB more.

Faster decode (bandwidth)

Gaudi 3 by +10%.

Faster prefill (compute)

Gaudi 3 by +86% TFLOPS.

Catalog models that fit

Tied: 27 of 30 fit on each.

Drill into either card.

$ ./vrambudget --gpu

Discussion.

$ gh discussion list

// sign in with github to leave a comment. threads live in the repo's discussions tab.