
Llama 3.1 405B

The big one. At FP16 the weights alone take 810GB, which puts it in DGX or multi-GPU territory. Q3-Q4 quants fit on 2x H100 NVL or an M3 Ultra with 512GB.
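The 810GB figure is just parameter count times bytes per weight. A minimal sketch of that arithmetic follows; the function name `weight_size_gb` is hypothetical, and the 4-bit and 3-bit rows use nominal bit widths as an assumption. Real quant formats store extra scale metadata, so actual files come out somewhat larger.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in decimal GB: params * bits / 8."""
    return params_billion * bits_per_weight / 8


# Nominal bit widths (assumption): real Q4/Q3 formats add per-block
# scale data, so their effective bits per weight are higher than this.
NOMINAL_BITS = {"FP16": 16, "4-bit (nominal)": 4, "3-bit (nominal)": 3}

for name, bits in NOMINAL_BITS.items():
    print(f"{name}: {weight_size_gb(405, bits):.1f} GB")
```

The FP16 row reproduces the 810GB listed on this page. The quantized rows are lower bounds, not measured file sizes.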

Parameters: 405B
Family: Meta
Context: 128K tokens
FP16 weights: 810GB
// where you can run it
llama3.1:405b · LM Studio · vLLM · MLX · oMLX
// hugging face stats (cached daily)
230K downloads · 594 likes · license: llama3.1 · updated 1 year ago

What you need to run this.

$ ./vrambudget --model llama-3-1-405b --by quant

// budgets shown at ctx 8K, concurrency 1, 15% safety headroom. Tune these in the calculator.
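To make the three knobs concrete, here is a rough sketch of how a budget like this could be put together: weights plus KV cache, scaled by the headroom. This is not the calculator's actual formula. The function names are hypothetical, "8K" is assumed to mean 8192 tokens, the headroom is assumed to be multiplicative, and the architecture numbers (126 layers, 8 KV heads, head dimension 128, FP16 cache) are assumed from Meta's published model configuration and should be checked against the model's `config.json`.

```python
def kv_cache_gb(ctx_tokens: int, concurrency: int, layers: int = 126,
                kv_heads: int = 8, head_dim: int = 128,
                bytes_per_elem: int = 2) -> float:
    """KV cache size in decimal GB.

    Each token stores one key and one value vector per layer,
    hence the factor of 2. Defaults are assumed values for
    Llama 3.1 405B with an FP16 cache.
    """
    bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return bytes_per_token * ctx_tokens * concurrency / 1e9


def vram_budget_gb(weights_gb: float, ctx_tokens: int = 8192,
                   concurrency: int = 1, headroom: float = 0.15) -> float:
    """Weights plus KV cache, padded by a multiplicative safety headroom (assumption)."""
    return (weights_gb + kv_cache_gb(ctx_tokens, concurrency)) * (1 + headroom)


if __name__ == "__main__":
    # FP16 weights from this page: 810GB.
    print(f"KV cache at 8K ctx: {kv_cache_gb(8192, 1):.2f} GB")
    print(f"FP16 budget: {vram_budget_gb(810):.0f} GB")
```

At this context length the cache is small next to the weights, so the quant level dominates the budget. Raising concurrency or context scales only the cache term.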

Alternatives at this size.

$ grep --params similar catalog.json
