~/runtime/ollama

Ollama

The easiest way to run open models locally.

License

MIT

Platform

macOS · Linux · Windows · Docker

Model formats

GGUF · safetensors

API

REST :11434 · OpenAI-compatible adapters

What it is.

$ ./vrambudget --runtime ollama

Single-binary CLI with a built-in model registry. Curl-install, run a model in one command, hit it via REST on port 11434. Backed by llama.cpp under the hood; supports GGUF and safetensors. The default starting point for almost everyone running local LLMs.

Install.

$ pkg install ollama

curl -fsSL https://ollama.com/install.sh | sh

irm https://ollama.com/install.ps1 | iex

Supported platforms: macOS, Linux, Windows, Docker

Features.

$ cat features.md

Model registry

Pull from ollama.com/library by tag (e.g. `llama3.1:8b`, `qwen2.5:32b`). One command, one model, no manual download dance.

REST + SDKs

Default REST API on :11434. Official Python + JS SDKs (`pip install ollama`, `npm i ollama`).

Integrations

`ollama launch <integration>` for Claude Code, Codex, Copilot CLI, OpenCode, OpenClaw, and more.

GGUF + safetensors

llama.cpp backend reads quantized GGUF and full-precision safetensors. Quants from Q4_K_M up to FP16.

Best for

▸First-time local LLM users
▸Quick model swapping via `ollama run <name>`
▸Cross-platform setups (same workflow on Mac, Linux, Windows)
▸Integrations: Claude Code, OpenClaw, Codex, Copilot CLI all read Ollama on :11434

Caveats

▸Single-tier in-memory KV cache; mid-session context shifts cause recomputation
▸No paged attention; throughput trails vLLM under heavy concurrency
▸GUI is minimal; pair with Open WebUI or one of the community clients for chat

Discussion.

$ gh discussion list

// sign in with github to leave a comment. threads live in the repo's discussions tab.

Ollama

What it is.

Install.

Features.

Links.

Compare to…

Discussion.