The big one. At FP16 it needs 810 GB, which puts it in DGX or multi-GPU territory. Q3-Q4 quants fit on 2x H100 NVL or an M3 Ultra with 512 GB.
// budgets shown at ctx 8K, concurrency 1, 15% safety headroom.
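The arithmetic behind these figures can be sketched roughly as follows. This is a minimal estimate under stated assumptions, not the calculator's actual method: the parameter count is inferred from 810 GB at 2 bytes per weight, the helper names `weights_gb` and `fits` are hypothetical, the 15% headroom is assumed to be reserved from total device memory, and only weights are counted (no KV cache or activations). Real quant formats usually spend somewhat more than their nominal bits per weight, so treat the 3-bit and 4-bit numbers as lower bounds.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(required_gb: float, capacity_gb: float, headroom: float = 0.15) -> bool:
    """True if required_gb fits after reserving `headroom` of capacity.

    Assumption: headroom is taken as a fraction of total capacity.
    """
    return required_gb <= capacity_gb * (1.0 - headroom)


# Inferred, not stated in the text: 810 GB at 2 bytes per weight
# implies roughly 405e9 parameters.
N_PARAMS = 810e9 / 2

# Nominal bit widths; actual quantized files are typically a bit larger.
for label, bits in [("FP16", 16), ("4-bit", 4), ("3-bit", 3)]:
    need = weights_gb(N_PARAMS, bits)
    print(f"{label}: {need:.1f} GB, fits in 512 GB: {fits(need, 512)}")
```

Passing a different `capacity_gb` or `headroom` shows how quickly the margin shrinks once context length and concurrency add memory on top of the weights.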