@meta
  v: 1
  route: /runtime/omlx
  generated: 2026-06-10T09:17:52.809Z

@intent
  purpose:    Install, configure, and serve LLMs with oMLX.
  audience:   self-hoster, ai-engineer, mac-user, devops-engineer
  capability: install, serve_local_llms, compare_runtimes, open_external_docs

@state
  slug: omlx
  name: oMLX
  family: apple-silicon
  type: server
  license: Apache 2.0
  primary_platform: macOS Sequoia · M1 / M2 / M3 / M4 / M5
  platforms[1]: macOS 15+ (Apple Silicon)
  model_formats[1]: MLX
  api_compatibility[4]: OpenAI /v1/chat/completions, Anthropic /v1/messages, Embeddings, Rerank
  install_command: brew tap jundot/omlx https://github.com/jundot/omlx && brew install omlx
  install_secondary: Or download the signed .dmg from omlx.ai (menu-bar app)
  homepage_url: https://omlx.ai
  github_url: https://github.com/jundot/omlx
  docs_url: https://github.com/jundot/omlx#readme
  feature_count: 6
  feature_labels[6]: Paged SSD KV cache, Continuous batching, OpenAI + Anthropic API, Multi-model serving, Tool calling + MCP, Native menu-bar app
  best_for[4]: "Agentic coding on a Mac: Claude Code, OpenClaw, Cursor with sub-5s TTFT after the first turn", Anyone with a Mac running long agent sessions where context shifts often, Multi-model serving (LLM + VLM + embedding + reranker simultaneously), Drop-in replacement for cloud APIs while staying private
  caveats[3]: macOS Sequoia (15+) and Apple Silicon only; no Linux or Intel Mac path, MLX-format models only; bring your own from huggingface.co/mlx-community, "Reuses LM Studio model directories if you already have them, but does NOT auto-import from llama.cpp / GGUF"

@actions
  - id: open_homepage
    method: GET
    href: https://omlx.ai
  - id: open_github
    method: GET
    href: https://github.com/jundot/omlx
  - id: open_docs
    method: GET
    href: https://github.com/jundot/omlx#readme
  - id: view_index
    method: GET
    href: /runtime/
  - id: view_calculator
    method: GET
    href: /#calculator

@context
  > Native macOS inference server built on MLX. Solves the biggest pain point in agentic local inference: when the KV cache is invalidated mid-session (which happens constantly with coding agents), oMLX restores cached prefix blocks from SSD in milliseconds instead of recomputing them from scratch. Time to first token (TTFT) drops from 30-90s to under 5s after the first turn. Drop-in for Claude Code, OpenClaw, Cursor. Native menu-bar app, not Electron. Apache 2.0.

@nav
  self:      /runtime/omlx
  parents:   [/, /runtime/]
  peers:     [/runtime/ollama, /runtime/lm-studio, /runtime/vllm, /runtime/mlx]
  drilldown: /#calculator
