## REST API
`llmfit serve` publishes the same hardware-aware fit analysis used by the TUI and CLI. This is the right surface when you want schedulers, agents, dashboards, or internal tools to consume recommendations programmatically.
```shell
llmfit serve --host 0.0.0.0 --port 8787
```
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Liveness probe for node-local API health. |
| GET | /api/v1/system | Detected node identity and hardware profile. |
| GET | /api/v1/models | Filtered and sorted model-fit rows for the current node. |
| GET | /api/v1/models/top | Top runnable models with conservative top-k semantics. |
| GET | /api/v1/models/{name} | Path-constrained search for a model family. |
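The `/health` endpoint above can serve as a simple placement gate. A minimal sketch, assuming the default bind address from the `llmfit serve` example (the `check_node` helper name is illustrative; adjust the base URL per node):

```shell
# check_node: print "healthy" when the node-local API answers its
# liveness probe, "unreachable" otherwise.
check_node() {
  base="${1:-http://127.0.0.1:8787}"   # assumed default; override per node
  if curl -fsS --max-time 2 "$base/health" >/dev/null 2>&1; then
    echo healthy
  else
    echo unreachable
  fi
}

# Usage: check_node http://node-a:8787
```

A scheduler can run this before asking `/api/v1/models/top` for candidates and skip nodes that report `unreachable`.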
### Recommended query parameters

- `limit` / `n`: Cap the number of rows returned. Use small values for dashboards and top-k schedulers.
- `min_fit`: Use `good` for production-style defaults and `marginal` when exploring stretch options.
- `use_case`: Target coding, chat, reasoning, embedding, or other workload-specific categories.
- `runtime`: Restrict results when your environment standardizes on a specific runtime such as MLX or llama.cpp.
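These parameters are plain `key=value` query-string pairs. A small helper, sketched here with a hypothetical `models_url` function, keeps them consistent across scripts:

```shell
# models_url: compose a /api/v1/models URL from the common parameters.
# All argument values are illustrative; pass whatever your workload needs.
models_url() {
  base="$1"; limit="$2"; min_fit="$3"; use_case="$4"
  echo "${base}/api/v1/models?limit=${limit}&min_fit=${min_fit}&use_case=${use_case}"
}

# Usage: curl "$(models_url http://127.0.0.1:8787 20 good coding)"
```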
### Integration patterns

- Point monitoring at `/health`, read the node's identity and hardware profile from `/api/v1/system`, and feed schedulers from `/api/v1/models/top?min_fit=good&limit=5`.
- Let an internal agent query the node-local API before selecting a model or provisioning a workload on a machine.
- Expose the top recommendations in a small operations UI so teams stop asking ad hoc fit questions in chat.
```shell
curl http://127.0.0.1:8787/health
curl http://127.0.0.1:8787/api/v1/system
curl "http://127.0.0.1:8787/api/v1/models?limit=20&min_fit=marginal&sort=score"
curl "http://127.0.0.1:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"
curl "http://127.0.0.1:8787/api/v1/models/Mistral?runtime=any"
```
For conservative production placement, prefer `min_fit=good`, `include_too_tight=false`, `sort=score`, and a small `limit` value.
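Those defaults can be captured once so every placement script issues the same conservative query. A sketch with a hypothetical `conservative_top_url` helper:

```shell
# conservative_top_url: encode the recommended production defaults
# (min_fit=good, include_too_tight=false, sort=score, small limit).
conservative_top_url() {
  base="$1"; limit="${2:-3}"   # a default limit of 3 is an assumption
  echo "${base}/api/v1/models/top?min_fit=good&include_too_tight=false&sort=score&limit=${limit}"
}

# Usage: curl "$(conservative_top_url http://127.0.0.1:8787 5)"
```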