LLMFit

Documentation

How LLMFit thinks about local model fit.

LLMFit is a hardware-aware advisor. It does not replace inference runtimes like Ollama or llama.cpp. It decides which models, quantizations, and run paths make sense for a machine before you commit to a download or deployment.

569 catalog entries embedded in the current shipped database
142 provider names represented in the catalog metadata
2 filterable capability flags currently exposed in the catalog: tool use and vision

Core model

What the engine evaluates.

Hardware shape

Total RAM, available RAM, CPU cores, GPU count, VRAM, backend, unified memory, and provider installation state.

Model shape

Parameter count, context length, release date, MoE behavior, quantization options, and provider availability.
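The two shapes above can be pictured as plain data records. This is an illustrative sketch only: the struct and field names are assumptions that mirror the attributes listed, not LLMFit's actual internal types. The size helper uses the common rule of thumb that quantized weights take roughly parameters × bits-per-weight / 8 bytes.

```rust
// Hypothetical data shapes mirroring the attributes listed above.
#[derive(Debug)]
struct HardwareShape {
    total_ram_gb: f64,
    available_ram_gb: f64,
    cpu_cores: u32,
    gpu_count: u32,
    vram_gb: f64,
    backend: String, // e.g. "cuda", "metal", "cpu"
    unified_memory: bool,
    providers_installed: Vec<String>,
}

#[derive(Debug)]
struct ModelShape {
    parameter_count_b: f64, // billions of parameters
    context_length: u32,
    release_year: u32,
    is_moe: bool, // mixture-of-experts routing
    quantizations: Vec<String>,
    providers: Vec<String>,
}

// Rule of thumb: weight footprint in GB ≈ params (B) × bits-per-weight / 8.
fn quantized_size_gb(params_b: f64, bits_per_weight: f64) -> f64 {
    params_b * bits_per_weight / 8.0
}

fn main() {
    let hw = HardwareShape {
        total_ram_gb: 32.0,
        available_ram_gb: 24.0,
        cpu_cores: 8,
        gpu_count: 1,
        vram_gb: 16.0,
        backend: "metal".into(),
        unified_memory: true,
        providers_installed: vec!["ollama".into()],
    };
    let model = ModelShape {
        parameter_count_b: 8.0,
        context_length: 131_072,
        release_year: 2024,
        is_moe: false,
        quantizations: vec!["Q4_K_M".into(), "Q8_0".into()],
        providers: vec!["ollama".into(), "llama.cpp".into()],
    };
    println!("{hw:?}");
    // An 8B model at ~4 bits per weight needs roughly 4 GB for weights alone.
    println!("{:.1} GB", quantized_size_gb(model.parameter_count_b, 4.0));
}
```

The weight estimate ignores KV-cache and runtime overhead, which grow with context length; a real advisor has to budget for both.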

Fit dimensions

Fit, speed, context, and quality scores are combined into a single recommendation ranking, rather than leaning on any one benchmark.
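One way to picture combining the four dimensions is a weighted sum over normalized scores. The weights below are hypothetical, chosen only to show the shape of the idea; LLMFit's real weighting and normalization are internal.

```rust
// Illustrative only: hypothetical weights, not LLMFit's actual scoring.
#[derive(Debug, Clone)]
struct DimensionScores {
    fit: f64,     // each dimension normalized to 0.0..=1.0
    speed: f64,
    context: f64,
    quality: f64,
}

// Fit dominates here because a model that does not fit is unusable.
fn combined_score(s: &DimensionScores) -> f64 {
    0.4 * s.fit + 0.2 * s.speed + 0.2 * s.context + 0.2 * s.quality
}

// Sort candidates by combined score, highest first.
fn rank(mut candidates: Vec<(String, DimensionScores)>) -> Vec<String> {
    candidates.sort_by(|a, b| {
        combined_score(&b.1)
            .partial_cmp(&combined_score(&a.1))
            .unwrap()
    });
    candidates.into_iter().map(|(name, _)| name).collect()
}

fn main() {
    let ranked = rank(vec![
        ("small-q4".into(), DimensionScores { fit: 0.9, speed: 0.8, context: 0.6, quality: 0.5 }),
        ("large-q4".into(), DimensionScores { fit: 0.3, speed: 0.4, context: 0.9, quality: 0.9 }),
    ]);
    println!("{ranked:?}"); // the well-fitting smaller model ranks first
}
```

Note how a higher-quality model that barely fits can still rank below a smaller one; that is the point of ranking across dimensions instead of a single benchmark.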

Run path

GPU, CPU offload, CPU-only, and unified memory scenarios are treated differently because practical usability differs sharply between them.
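A minimal sketch of how the four scenarios might be distinguished, assuming a simple memory comparison. The enum, function, and thresholds are hypothetical illustrations, not LLMFit's actual decision logic.

```rust
// Hypothetical run-path selection; real logic is more nuanced.
#[derive(Debug, PartialEq)]
enum RunPath {
    Gpu,           // weights fully resident in VRAM
    CpuOffload,    // weights split between VRAM and system RAM
    CpuOnly,       // no usable GPU acceleration
    UnifiedMemory, // shared CPU/GPU memory pool (e.g. Apple silicon)
}

fn choose_run_path(model_gb: f64, vram_gb: f64, ram_gb: f64, unified: bool) -> RunPath {
    if unified {
        RunPath::UnifiedMemory
    } else if model_gb <= vram_gb {
        RunPath::Gpu
    } else if vram_gb > 0.0 && model_gb <= vram_gb + ram_gb {
        RunPath::CpuOffload
    } else {
        RunPath::CpuOnly
    }
}

fn main() {
    // 4.5 GB of weights against a 16 GB discrete GPU: full GPU residency.
    println!("{:?}", choose_run_path(4.5, 16.0, 32.0, false));
}
```

The same model can be "fast" on one path and barely interactive on another, which is why each path is scored separately rather than treated as a single runnable/not-runnable bit.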

Fit levels

Operational labels, not marketing labels.

Perfect

The model fits comfortably and is positioned as a high-confidence local choice for the current hardware.

Good

Runnable with sensible resource usage. Good is often the right production default for conservative placement.

Marginal

Technically runnable, but with tighter memory pressure or a weaker expected throughput profile.

Too tight

Included for analysis, but usually excluded from top-ranked scheduling results because it should not be treated as a safe deployment target.
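The four labels above can be thought of as bands over a memory-pressure ratio. The cutoffs below are hypothetical, picked only to illustrate the banding; LLMFit's real thresholds differ and consider more than memory.

```rust
// Illustrative fit-label banding; ratio thresholds are assumptions.
#[derive(Debug, PartialEq)]
enum FitLevel {
    Perfect,
    Good,
    Marginal,
    TooTight,
}

fn classify(required_gb: f64, available_gb: f64) -> FitLevel {
    let ratio = required_gb / available_gb;
    match ratio {
        r if r <= 0.50 => FitLevel::Perfect,  // comfortable headroom
        r if r <= 0.75 => FitLevel::Good,     // sensible resource usage
        r if r <= 0.95 => FitLevel::Marginal, // tight memory pressure
        _ => FitLevel::TooTight,              // not a safe deployment target
    }
}

fn main() {
    // 6 GB needed against 24 GB available sits well inside the Perfect band.
    println!("{:?}", classify(6.0, 24.0));
}
```

Banding on a ratio rather than an absolute gap means the labels scale the same way on a 16 GB laptop and a 512 GB server.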

Interfaces

One engine, multiple operational surfaces.

Surface  | Best for                                                                 | Typical entrypoint
TUI      | Interactive exploration, filtering, model comparison, plan mode         | llmfit
CLI      | Scripting, JSON export, repeatable audits, batch checks                 | llmfit recommend --json
REST API | Schedulers, agent integrations, internal platforms, node-local services | llmfit serve --host 0.0.0.0 --port 8787
Desktop  | macOS users who want the same logic with a graphical wrapper            | cargo tauri build

Runtime coverage

What LLMFit works with.

Repository layout

What ships in this codebase.

llmfit-core/

Hardware detection, model catalogs, fit scoring, planning logic, and provider integration.

llmfit-tui/

The terminal UI, classic CLI commands, and REST API entrypoints.

llmfit-desktop/

The Tauri-based macOS desktop wrapper for users who want the same engine with a GUI shell.

site/

The product site and documentation, which can be served independently of the runtime tooling.

Next step

Choose the path that matches your evaluation stage.