~/runtime/ollama
Ollama brand

Ollama

The easiest way to run open models locally.

License
MIT
Platform
macOS · Linux · Windows · Docker
Model formats
GGUF · safetensors
API
REST :11434 · OpenAI-compatible adapters

What it is.

$ ./vrambudget --runtime ollama

Single-binary CLI with a built-in model registry. Curl-install, run a model in one command, hit it via REST on port 11434. Backed by llama.cpp under the hood; supports GGUF and safetensors. The default starting point for almost everyone running local LLMs.

Install.

$ pkg install ollama
curl -fsSL https://ollama.com/install.sh | sh
irm https://ollama.com/install.ps1 | iex

Supported platforms: macOS, Linux, Windows, Docker

Features.

$ cat features.md
Model registry

Pull from ollama.com/library by tag (e.g. `llama3.1:8b`, `qwen2.5:32b`). One command, one model, no manual download dance.

REST + SDKs

Default REST API on :11434. Official Python + JS SDKs (`pip install ollama`, `npm i ollama`).

Integrations

`ollama launch <integration>` for Claude Code, Codex, Copilot CLI, OpenCode, OpenClaw, and more.

GGUF + safetensors

llama.cpp backend reads quantized GGUF and full-precision safetensors. Quants from Q4_K_M up to FP16.

Best for
  • First-time local LLM users
  • Quick model swapping via `ollama run <name>`
  • Cross-platform setups (same workflow on Mac, Linux, Windows)
  • Integrations: Claude Code, OpenClaw, Codex, Copilot CLI all read Ollama on :11434
Caveats
  • Single-tier in-memory KV cache; mid-session context shifts cause recomputation
  • No paged attention; throughput trails vLLM under heavy concurrency
  • GUI is minimal; pair with Open WebUI or one of the community clients for chat

Links.

$ ls -1 ./external
↗ homepagehttps://ollama.com↗ githubhttps://github.com/ollama/ollama↗ docshttps://github.com/ollama/ollama/blob/main/docs/README.md

Compare to…

$ ./vrambudget --compare-runtimes

Discussion.

$ gh discussion list

// sign in with github to leave a comment. threads live in the repo's discussions tab.