~/gpu/gaudi-3 vs h100

Gaudi 3vsH100 80GB

Head-to-head for local LLM inference. The honest comparison: VRAM, bandwidth, compute, and which of the 30 catalog models actually fit on each.

The specs.

$ diff specs gaudi-3 h100

Stat

gaudi-3

h100

VRAM

128 GB

80 GB

-38%

Memory bandwidth

3,700 GB/s

3,350 GB/s

-9%

FP16 compute

1835 TFLOPS

989 TFLOPS

-46%

Weights budget at 8K ctx

102 GB

63 GB

-38%

Model fit difference.

$ models that change with the card

Fits on both

27of 30

Only on gaudi-3

Only on h100

// showing 12 of 30 models; differing fits first

Model

gaudi-3

h100

fitsFP16/BF16

fitsFP16/BF16

fitsFP16/BF16

fitsQ8_0

fitsQ6_K

Llama 3.1 405B405B

overQ4_K_M

Qwen 2.5 7B7B

fitsFP16/BF16

Qwen 2.5 32B32.5B

fitsFP16/BF16

fitsQ8_0

Qwen 2.5 Coder 32B32.5B

fitsFP16/BF16

fitsQ8_0

Qwen 2.5 72B72.7B

fitsQ8_0

fitsQ6_K

Qwen3 30B A3B30.5B

fitsFP16/BF16

Qwen 3.5 9B9B

fitsFP16/BF16

Qwen 3.6 27B27B

fitsFP16/BF16

Which one wins for…

$ ./recommend --by-workload

More VRAM headroom

Gaudi 3 has 48 GB more.

Faster decode (bandwidth)

Gaudi 3 by +10%.

Faster prefill (compute)

Gaudi 3 by +86% TFLOPS.

Catalog models that fit

Tied: 27 of 30 fit on each.

Discussion.

$ gh discussion list

// sign in with github to leave a comment. threads live in the repo's discussions tab.

Gaudi 3 manufacturerGaudi 3vsH100 80GB manufacturerH100 80GB

The specs.

Model fit difference.

Which one wins for…

Drill into either card.

Discussion.

Gaudi 3vsH100 80GB