Single-binary CLI with a built-in model registry. Curl-install, run a model in one command, hit it via REST on port 11434. Built on llama.cpp; supports GGUF and safetensors. The default starting point for almost everyone running local LLMs.
Supported platforms: macOS, Linux, Windows, Docker
Pull from ollama.com/library by tag (e.g. `llama3.1:8b`, `qwen2.5:32b`). One command, one model, no manual download dance.
Default REST API on :11434. Official Python + JS SDKs (`pip install ollama`, `npm i ollama`).
`ollama launch <integration>` for Claude Code, Codex, Copilot CLI, OpenCode, OpenClaw, and more.
llama.cpp backend reads quantized GGUF and full-precision safetensors. Quants from Q4_K_M up to FP16.
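A minimal sketch of calling the REST API on port 11434 from Python, using only the standard library. It assumes the server is running on `localhost`, that the model tag `llama3.1:8b` has already been pulled, and that the `/api/generate` endpoint takes a JSON body with `model`, `prompt`, and `stream` fields and returns the text under a `response` key. The helper names `build_generate_request` and `generate` are illustrative, not part of any SDK.

```python
import json
import urllib.request

# Port 11434 is the default from the text; the localhost host is an assumption.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for /api/generate (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text. Needs a running server."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # With stream=False the reply is a single JSON object; the text is
    # assumed to live under the "response" key.
    return body["response"]
```

With a server up, `print(generate("llama3.1:8b", "Why is the sky blue?"))` would print the completion. Splitting request construction from sending keeps the payload easy to inspect without a live server; the official `ollama` Python package wraps the same endpoints if you would rather not build requests by hand.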