DeepSeek R1

DeepSeek V3 with reasoning-focused post-training. The release showed that chain-of-thought reasoning trained through reinforcement learning works at scale.

Parameters: 671B
Family: DeepSeek
Context: 128K tokens
FP16 weights: 1342GB
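The FP16 weight figure follows from the parameter count: 671B parameters at 2 bytes each. A minimal sketch of that arithmetic is below. The function name `weight_size_gb` is hypothetical, and the use of decimal gigabytes (1 GB = 1e9 bytes) is an assumption chosen because it reproduces the 1342GB shown on this page.

```python
# Rough weight-size estimate: parameter count x bytes per parameter.
# Assumption: decimal gigabytes (1 GB = 1e9 bytes), which matches the
# 1342GB figure listed for FP16 weights.

def weight_size_gb(params_billions: float, bits_per_param: float) -> float:
    """Return the raw weight footprint in decimal GB (weights only)."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


fp16_gb = weight_size_gb(671, 16)
print(f"FP16 weights: {fp16_gb:.0f}GB")
```

This covers the weights alone. Activations, KV cache, and runtime overhead come on top of it.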

// hugging face stats (cached daily)

4.2M downloads · 13K likes · license: MIT · updated 1 year ago

What you need to run this.

$ ./vrambudget --model deepseek-r1 --by quant

// budgets shown at 8K context, concurrency 1, 15% safety headroom.
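One way to read those three settings is as inputs to a simple budget formula: weights plus per-sequence KV cache times concurrency, scaled up by the headroom. The sketch below is an illustration under that assumption and is not the `vrambudget` tool's actual formula. The function `vram_budget_gb`, the `BITS_PER_PARAM` table, and the nominal bit widths are all hypothetical, and the KV-cache size is left as a caller-supplied value because it depends on the model's attention layout and the context length.

```python
# Hypothetical budget sketch; NOT the vrambudget tool's real formula.
# Assumption: "15% safety headroom" means multiplying the total by 1.15.

# Nominal bit widths per quantization label. Real quant formats carry
# extra metadata (scales, zero points), so effective widths are higher.
BITS_PER_PARAM = {"fp16": 16, "int8": 8, "int4": 4}


def vram_budget_gb(
    params_billions: float,
    quant: str,
    kv_cache_gb_per_seq: float,
    concurrency: int = 1,
    headroom: float = 0.15,
) -> float:
    """Estimate VRAM in decimal GB: (weights + KV cache) plus headroom."""
    weights_gb = params_billions * BITS_PER_PARAM[quant] / 8
    kv_gb = kv_cache_gb_per_seq * concurrency
    return (weights_gb + kv_gb) * (1 + headroom)


if __name__ == "__main__":
    for quant in BITS_PER_PARAM:
        # kv_cache_gb_per_seq=0.0 is a placeholder; supply a measured
        # value for the context length you plan to serve (8K here).
        budget = vram_budget_gb(671, quant, kv_cache_gb_per_seq=0.0)
        print(f"{quant}: {budget:.0f}GB")
```

Raising `concurrency` grows only the KV-cache term, which is why the weights dominate the budget at concurrency 1.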
