671B MoE, 37B active. GPT-4 class at a fraction of the inference cost. Multi-node serving at FP16.
// budgets shown at ctx 8K, concurrency 1, 15% safety headroom.
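
A rough sketch of how a budget like this might be computed. It is not the calculator's actual formula. It assumes that all 671B parameters stay resident (an MoE has to keep every expert loaded even though only 37B are active per token), that FP16 costs 2 bytes per parameter, and that headroom is applied as a flat multiplier. The function names, the `kv_bytes_per_token` parameter, and the 80 GB accelerator size are hypothetical. The KV-cache term defaults to zero because the per-token cache size is not given here.

```python
import math


def memory_budget_gb(
    total_params: float,
    bytes_per_param: float = 2.0,     # FP16
    kv_bytes_per_token: float = 0.0,  # hypothetical; model-specific, not given in the source
    ctx_tokens: int = 8192,           # ctx 8K
    concurrency: int = 1,
    headroom: float = 0.15,           # 15% safety headroom, assumed to be a flat multiplier
) -> float:
    """Estimate the serving memory budget in decimal GB."""
    weights = total_params * bytes_per_param
    kv_cache = kv_bytes_per_token * ctx_tokens * concurrency
    return (weights + kv_cache) * (1.0 + headroom) / 1e9


def gpus_needed(budget_gb: float, gpu_gb: float = 80.0) -> int:
    """Smallest accelerator count whose combined memory covers the budget."""
    return math.ceil(budget_gb / gpu_gb)


budget = memory_budget_gb(671e9)
print(f"{budget:.1f} GB, {gpus_needed(budget)} x 80 GB GPUs")
```

Under these assumptions the weights alone come to about 1.34 TB before headroom, which is more than one 8 x 80 GB node can hold. That is consistent with the multi-node note above.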