## REST API
`llmfit serve` publishes the same hardware-aware fit analysis used by the TUI and CLI. This is the right surface when you want schedulers, agents, dashboards, or internal tools to consume recommendations programmatically.
```shell
llmfit serve --host 0.0.0.0 --port 8787
```
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Liveness probe for node-local API health. |
| GET | /api/v1/system | Detected node identity and hardware profile. |
| GET | /api/v1/models | Filtered and sorted model-fit rows for the current node. |
| GET | /api/v1/models/top | Top runnable models with conservative top-k semantics. |
| GET | /api/v1/models/{name} | Path-constrained search for a model family. |
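The `/health` endpoint above can serve as a simple placement gate. A minimal sketch, assuming the default bind address from the `llmfit serve` example (the `check_node` helper name is illustrative; adjust the base URL per node):

```shell
# check_node: print "healthy" when the node-local API answers its
# liveness probe, "unreachable" otherwise.
check_node() {
  base="${1:-http://127.0.0.1:8787}"   # assumed default; override per node
  if curl -fsS --max-time 2 "$base/health" >/dev/null 2>&1; then
    echo healthy
  else
    echo unreachable
  fi
}

# Usage: check_node http://node-a:8787
```

A scheduler can run this before asking `/api/v1/models/top` for candidates and skip nodes that report `unreachable`.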
### Recommended query parameters

- `limit` / `n`: Cap the number of rows returned. Use small values for dashboards and top-k schedulers.
- `min_fit`: Use `good` for production-style defaults and `marginal` when exploring stretch options.
- `use_case`: Target coding, chat, reasoning, embedding, or other workload-specific categories.
- `runtime`: Restrict results when your environment standardizes on a specific runtime such as MLX or llama.cpp.
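These parameters are plain `key=value` query-string pairs. A small helper, sketched here with a hypothetical `models_url` function, keeps them consistent across scripts:

```shell
# models_url: compose a /api/v1/models URL from the common parameters.
# All argument values are illustrative; pass whatever your workload needs.
models_url() {
  base="$1"; limit="$2"; min_fit="$3"; use_case="$4"
  echo "${base}/api/v1/models?limit=${limit}&min_fit=${min_fit}&use_case=${use_case}"
}

# Usage: curl "$(models_url http://127.0.0.1:8787 20 good coding)"
```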
### Integration patterns

- Point monitoring at `/health`, read the node's identity and hardware profile from `/api/v1/system`, and feed schedulers from `/api/v1/models/top?min_fit=good&limit=5`.
- Let an internal agent query the node-local API before selecting a model or provisioning a workload on a machine.
- Expose the top recommendations in a small operations UI so teams stop asking ad hoc fit questions in chat.
```shell
curl http://127.0.0.1:8787/health
curl http://127.0.0.1:8787/api/v1/system
curl "http://127.0.0.1:8787/api/v1/models?limit=20&min_fit=marginal&sort=score"
curl "http://127.0.0.1:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"
curl "http://127.0.0.1:8787/api/v1/models/Mistral?runtime=any"
```
For conservative production placement, prefer `min_fit=good`, `include_too_tight=false`, `sort=score`, and a small `limit` value.
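Those defaults can be captured once so every placement script issues the same conservative query. A sketch with a hypothetical `conservative_top_url` helper:

```shell
# conservative_top_url: encode the recommended production defaults
# (min_fit=good, include_too_tight=false, sort=score, small limit).
conservative_top_url() {
  base="$1"; limit="${2:-3}"   # a default limit of 3 is an assumption
  echo "${base}/api/v1/models/top?min_fit=good&include_too_tight=false&sort=score&limit=${limit}"
}

# Usage: curl "$(conservative_top_url http://127.0.0.1:8787 5)"
```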