Llama 3.1 405B

The big one. 810 GB of weights at FP16 puts it in DGX or multi-GPU territory. Q3-Q4 quants fit on 2x H100 NVL or an M3 Ultra with 512 GB of unified memory.
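The FP16 figure is just parameter count times bytes per weight. A minimal sketch of that arithmetic follows, assuming nominal bit widths for each quant (16, 8, 4, 3). Real quantized files carry extra scale and metadata overhead, so treat the quantized numbers as lower bounds. The function name `weight_gb` is hypothetical and is not part of any tool named on this page.

```python
# Nominal weight footprint for a 405B-parameter model.
# Assumption: bits-per-weight values are nominal; actual quant formats
# usually come out somewhat larger.

PARAMS = 405e9  # parameter count listed on this page


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return the weight footprint in decimal GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4), ("Q3", 3)]:
        print(f"{name}: {weight_gb(PARAMS, bits):.1f} GB")
```

At 16 bits per weight this reproduces the 810 GB quoted above.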

Parameters

405B

Family

What you need to run this.

$ ./vrambudget --model llama-3-1-405b --by quant

// budgets shown at 8K context, concurrency 1, 15% safety headroom.
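One plausible way to arrive at a budget under those settings is weights plus KV cache, scaled by the headroom. The sketch below is an assumption about the method, not the calculator's actual formula. It takes "8K" as 8192 tokens, applies the 15% headroom multiplicatively, keeps the KV cache in FP16, and uses the published Llama 3.1 405B shape (126 layers, 8 KV heads, head dimension 128), none of which appears on this page. `ModelShape`, `kv_cache_bytes`, and `budget_gb` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    params: float      # total parameter count
    n_layers: int      # transformer layers
    n_kv_heads: int    # key/value heads (grouped-query attention)
    head_dim: int      # dimension per attention head


# Assumption: shape values from the published Llama 3.1 405B config.
LLAMA_405B = ModelShape(params=405e9, n_layers=126, n_kv_heads=8, head_dim=128)


def kv_cache_bytes(m: ModelShape, ctx_tokens: int, concurrency: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """KV cache size: keys and values for every layer, KV head, and token."""
    per_token = 2 * m.n_layers * m.n_kv_heads * m.head_dim * bytes_per_elem
    return per_token * ctx_tokens * concurrency


def budget_gb(m: ModelShape, bits_per_weight: float, ctx_tokens: int = 8192,
              concurrency: int = 1, headroom: float = 0.15) -> float:
    """Weights plus KV cache, padded by the safety headroom, in decimal GB."""
    weights = m.params * bits_per_weight / 8
    total = weights + kv_cache_bytes(m, ctx_tokens, concurrency)
    return total * (1 + headroom) / 1e9


if __name__ == "__main__":
    for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4), ("Q3", 3)]:
        print(f"{name}: {budget_gb(LLAMA_405B, bits):.1f} GB")
```

Under these assumptions the KV cache at 8K context is only about 4 GB, so the weights dominate the budget and the choice of quant matters far more than context length at concurrency 1.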

