model-fit — Find the Local LLMs That Actually Run on Your Hardware

Client: Open Source, Personal Project

Timeline: 2 Weeks

Role: Creator & Maintainer

Year: 2026

Participants: 1 Developer

Hardware-Aware LLM Compatibility Engine

model-fit is an open-source command-line tool, published on npm, that tells you which local LLMs will actually run on your machine before you spend time downloading them. It reads your CPU, GPU and RAM, then computes the memory budget for every model (weights + KV cache sized to your context length + overhead) and classifies it as full-GPU, hybrid, CPU-only, or won't-run, with an estimated tokens/sec and the reasoning behind every number. No opaque scores, just transparent VRAM math.

model-fit hardware detection and model recommendations

model-fit memory breakdown and tokens-per-second estimate
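The four-way classification can be illustrated with a minimal sketch. This is not model-fit's actual code: the type names, the `classifyFit` function and the simple thresholds (no memory reserved for the operating system, hybrid meaning "fits in VRAM plus RAM combined") are assumptions made for illustration.

```typescript
// Hypothetical categories mirroring the four classes named above.
type Fit = "full-gpu" | "hybrid" | "cpu-only" | "wont-run";

// Memory a model needs, split into the three terms of the budget.
interface MemoryBudget {
  weightsBytes: number;
  kvCacheBytes: number;
  overheadBytes: number;
}

// Detected hardware; vramBytes is 0 when there is no usable GPU.
interface Hardware {
  vramBytes: number;
  ramBytes: number;
}

// Assumed rule: compare the total budget against VRAM, then VRAM + RAM, then RAM.
function classifyFit(budget: MemoryBudget, hw: Hardware): Fit {
  const required =
    budget.weightsBytes + budget.kvCacheBytes + budget.overheadBytes;

  if (hw.vramBytes > 0 && required <= hw.vramBytes) return "full-gpu";
  if (hw.vramBytes > 0 && required <= hw.vramBytes + hw.ramBytes) return "hybrid";
  if (required <= hw.ramBytes) return "cpu-only";
  return "wont-run";
}

const GIB = 1024 ** 3;

// Example budget: 5 GiB weights + 1 GiB KV cache + 0.5 GiB overhead = 6.5 GiB.
const exampleBudget: MemoryBudget = {
  weightsBytes: 5 * GIB,
  kvCacheBytes: 1 * GIB,
  overheadBytes: 0.5 * GIB,
};
```

With this budget, a machine with 8 GiB of VRAM would be classified as full-GPU, while one with 4 GiB of VRAM and 16 GiB of RAM would fall into hybrid. A real tool would likely also hold back some memory for the OS and other processes.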

The Challenge

Anyone running local LLMs hits the same wall: models that crash with out-of-memory errors or crawl at 2 tokens/sec. Existing tools hand out vague compatibility scores that ignore the single biggest variable: the KV cache, which grows linearly with context length and at long context can exceed the model weights themselves. Worse, naive estimators assume one key/value head per attention head, so they overestimate the cache and wrongly flag modern GQA architectures like Llama 3 and Qwen2.5 as "won't run" at long context. The challenge was to model real memory behaviour accurately across heterogeneous hardware (Windows, macOS, Linux) and a constantly changing model catalog, while keeping the tool fast and fully usable offline.
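To make the GQA point concrete, here is a small worked example using the standard KV cache size formula (2 tensors × layers × KV heads × head dimension × context tokens × bytes per element). The function and type names are hypothetical, and a 16-bit cache (2 bytes per element) is assumed. The Llama 3 8B figures (32 layers, 32 attention heads, 8 KV heads, head dimension 128) come from the published model configuration.

```typescript
// Attention shape of a transformer model (names are illustrative).
interface AttentionConfig {
  layers: number;
  attentionHeads: number;
  kvHeads: number; // fewer than attentionHeads when the model uses GQA
  headDim: number;
}

// KV cache size: one K and one V tensor per layer, per KV head, per token.
function kvCacheBytes(
  cfg: AttentionConfig,
  contextTokens: number,
  bytesPerElement: number = 2, // assumed fp16 cache
): number {
  return (
    2 * cfg.layers * cfg.kvHeads * cfg.headDim * contextTokens * bytesPerElement
  );
}

const llama3_8b: AttentionConfig = {
  layers: 32,
  attentionHeads: 32,
  kvHeads: 8,
  headDim: 128,
};

const GiB = 1024 ** 3;
const contextTokens = 8192;

// GQA-aware estimate: uses the real number of KV heads.
const gqaGiB = kvCacheBytes(llama3_8b, contextTokens) / GiB; // 1 GiB

// Naive estimate: pretends every attention head has its own K/V.
const naiveGiB =
  kvCacheBytes({ ...llama3_8b, kvHeads: llama3_8b.attentionHeads }, contextTokens) / GiB; // 4 GiB
```

At an 8,192-token context the naive estimate is four times the GQA-aware one, and the gap grows in absolute terms as the context lengthens. On a card with limited VRAM, that difference can be enough to turn a model that fits into a false "won't run".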

The Solution

I built model-fit as a zero-config Node.js CLI that runs directly via npx model-fit. It detects hardware through the systeminformation library, then derives every figure from first principles: weights are params × bytes-per-weight for the chosen quantization; the KV cache is sized to the actual context window and the model's real attention architecture (GQA-aware); and generation speed is approximated as memory bandwidth ÷ active weights, which accounts for MoE models that only read their active experts. The catalog merges three sources: a curated offline seed list, the user's local Ollama models with real on-disk sizes, and the Hugging Face Hub GGUF catalog ranked by popularity. All three are de-duplicated and cached for 24 hours, so the tool stays fast and works offline. Commands include detect, recommend (with category filters such as coding, reasoning and vision) and check, which prints a full memory breakdown for a single model. It ships as an MIT-licensed package.
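The weight and speed formulas above can be sketched as follows. This is an illustration under stated assumptions rather than the tool's real implementation: the function names are hypothetical, the bytes-per-weight table covers only three GGUF formats (Q8_0 stores 34 bytes per block of 32 weights, Q4_0 stores 18), and the bandwidth and parameter counts in the examples are made-up round numbers.

```typescript
// Approximate storage cost per weight for a few GGUF formats.
const BYTES_PER_WEIGHT = {
  f16: 2,
  q8_0: 34 / 32, // 8.5 bits per weight
  q4_0: 18 / 32, // 4.5 bits per weight
} as const;

type Quant = keyof typeof BYTES_PER_WEIGHT;

// weights = params × bytes-per-weight for the chosen quantization.
function weightBytes(params: number, quant: Quant): number {
  return params * BYTES_PER_WEIGHT[quant];
}

// Generation is treated as memory-bound: each token reads the active
// weights once, so tokens/sec ≈ bandwidth ÷ bytes of active weights.
// This is an upper-bound style estimate that ignores compute and KV reads.
function estimateTokensPerSec(
  bandwidthBytesPerSec: number,
  activeParams: number,
  quant: Quant,
): number {
  return bandwidthBytesPerSec / weightBytes(activeParams, quant);
}

// Dense example: a hypothetical 8B-parameter model at Q4_0 on 450 GB/s memory.
const denseBytes = weightBytes(8e9, "q4_0"); // 4.5e9 bytes
const denseSpeed = estimateTokensPerSec(450e9, 8e9, "q4_0"); // 100 tokens/sec

// MoE example (hypothetical sizes): all 47B parameters must be stored,
// but only about 13B are read per token, so speed follows the active count.
const moeStoredBytes = weightBytes(47e9, "q4_0");
const moeSpeed = estimateTokensPerSec(450e9, 13e9, "q4_0");
const moeSpeedIfDense = estimateTokensPerSec(450e9, 47e9, "q4_0");
```

The design point is that memory footprint and speed use different parameter counts for MoE models: the footprint uses the total, the speed estimate uses only the active experts. Treating an MoE model as dense would underestimate its tokens/sec by roughly the total-to-active ratio.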

Other remarkable projects

Visual Blueprint AI — Prompt Generation

Reverse-engineer any creative image into a structured prompt-ready blueprint with layout logic and style analysis.

Next.js, AI, Prompt Engineering

Kontakly — AI-Powered B2B Lead Generation

AI-powered B2B sales automation platform that finds, qualifies, and contacts leads automatically with personalized email sequences and automated follow-ups.

Python, Next.js, AI/ML, Sales Automation

Veritas — Judiciary AI Assistant

AI-powered judiciary assistant for Cameroon with a document library, identification services, and an LLM chatbot trained on the Cameroon Constitution.

Next.js, TypeScript, LLM, AI/ML, Mobile App

Technologies Used

TypeScript, Node.js, Commander.js, systeminformation, Hugging Face Hub API, Ollama, llama.cpp / GGUF, CLI

Project Links

Project Demo, GitHub Repository