C++ 61%
C 16.8%
Makefile 5.9%
Cuda 4.6%
Shell 4%
Other 7.7%

Davide Eynard 7fca8b2a27 Updated to v0.10.3 (#991 )	2026-06-02 19:03:37 +01:00
.github	Docs Updates (#949 )	2026-04-30 12:01:13 +01:00
.llamafile_plugin	Update llama.cpp submodule to dbe9c0c (+ embed real web UI) (#983 )	2026-05-29 00:36:43 +01:00
build	llamafile reloaded (v0.10.0) (#867 )	2026-03-19 11:13:53 +00:00
diffusionfile	Fix uncaught SIGSEGV when GPU init fails, restore CPU fallback (#988 ) (#989 )	2026-06-02 13:16:31 +01:00
docs	Fix uncaught SIGSEGV when GPU init fails, restore CPU fallback (#988 ) (#989 )	2026-06-02 13:16:31 +01:00
llama.cpp@dbe9c0c8ce	Update llama.cpp submodule to dbe9c0c (+ embed real web UI) (#983 )	2026-05-29 00:36:43 +01:00
llama.cpp.patches	Fix uncaught SIGSEGV when GPU init fails, restore CPU fallback (#988 ) (#989 )	2026-06-02 13:16:31 +01:00
llamafile	Updated to v0.10.3 (#991 )	2026-06-02 19:03:37 +01:00
localscore	disable for ascii 0	2025-04-09 16:11:42 -07:00
models	github: add ci (#454 )	2024-05-29 00:24:34 -07:00
scripts	Migrate docs from MkDocs/GitHub Pages to GitBook (#946 )	2026-04-16 10:32:30 -05:00
stable-diffusion.cpp@baf7eda1e4	Modernise Diffusionfile Support (#970 )	2026-05-28 12:12:53 +01:00
stable-diffusion.cpp.patches	Modernise Diffusionfile Support (#970 )	2026-05-28 12:12:53 +01:00
tests	Fix uncaught SIGSEGV when GPU init fails, restore CPU fallback (#988 ) (#989 )	2026-06-02 13:16:31 +01:00
third_party	llamafile reloaded (v0.10.0) (#867 )	2026-03-19 11:13:53 +00:00
tools	llamafile reloaded (v0.10.0) (#867 )	2026-03-19 11:13:53 +00:00
whisper.cpp@2eeeba56e9	llamafile reloaded (v0.10.0) (#867 )	2026-03-19 11:13:53 +00:00
whisper.cpp.patches	llamafile reloaded (v0.10.0) (#867 )	2026-03-19 11:13:53 +00:00
whisperfile	Fix uncaught SIGSEGV when GPU init fails, restore CPU fallback (#988 ) (#989 )	2026-06-02 13:16:31 +01:00
.gitbook-branch-readme.md	Migrate docs from MkDocs/GitHub Pages to GitBook (#946 )	2026-04-16 10:32:30 -05:00
.gitbook.yaml	docs: rename example_llamfiles to pre-built-llamafiles for better seo (#972 )	2026-05-20 17:26:44 +01:00
.gitignore	Migrate docs from MkDocs/GitHub Pages to GitBook (#946 )	2026-04-16 10:32:30 -05:00
.gitmodules	llamafile reloaded (v0.10.0) (#867 )	2026-03-19 11:13:53 +00:00
CONTRIBUTING.md	Docs Updates (#949 )	2026-04-30 12:01:13 +01:00
cosmocc-override.cmake	llamafile reloaded (v0.10.0) (#867 )	2026-03-19 11:13:53 +00:00
LICENSE	Add known issue for Windows	2023-11-19 17:57:43 -08:00
Makefile	Update release scripts (#990 )	2026-06-02 13:04:44 +01:00
README.md	docs: rename example_llamfiles to pre-built-llamafiles for better seo (#972 )	2026-05-20 17:26:44 +01:00
README_0.10.0.md	docs: rename example_llamfiles to pre-built-llamafiles for better seo (#972 )	2026-05-20 17:26:44 +01:00
RELEASE.md	llamafile reloaded (v0.10.0) (#867 )	2026-03-19 11:13:53 +00:00

README.md

llamafile

[line drawing of llama animal head in front of slightly open manilla folder filled with files]

llamafile lets you distribute and run LLMs with a single file.

llamafile is a Mozilla Builders project (see its announcement blog post), now revamped by Mozilla.ai.

Our goal is to make open LLMs much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most operating systems and CPU architectures, with no installation.

llamafile also includes whisperfile, a single-file speech-to-text tool built on whisper.cpp and the same Cosmopolitan packaging. It supports transcription and translation of audio files across all the same platforms, with no installation required.

v0.10.*

Starting with 0.10.0, llamafile uses a new build system designed to keep our code aligned with the latest versions of llama.cpp. As a result, these versions support more recent models and features, but they may lack some of the features you were used to (check out this doc for a high-level description of what has been done). If you prefer the "classic experience", previous versions remain available from our releases page. Our pre-built llamafiles always show which version of the server they are bundled with (0.9.* example, 0.10.* example), so you always know which version of the software you are downloading.

We want to hear from you! Whether you are a new user or a long-time fan, please share what you find most valuable about llamafile and what would make it more useful for you. Read more via the blog and add your voice to the discussion here.

Quick Start

Download and run your first llamafile in minutes:

# Download an example model (Qwen3.5 0.8B)
curl -LO https://huggingface.co/mozilla-ai/llamafile_0.10/resolve/main/Qwen3.5-0.8B-Q8_0.llamafile

# Make it executable (macOS/Linux/BSD)
chmod +x Qwen3.5-0.8B-Q8_0.llamafile

# Run it
./Qwen3.5-0.8B-Q8_0.llamafile

We chose this model because it is the smallest one we have built a llamafile for, so it is the most likely to work out of the box for you. If you have powerful hardware and/or GPUs, feel free to choose larger and more expressive models, which should provide more accurate responses.
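If you expect to repeat these steps, they can be wrapped in a small shell function. The following is only a sketch: `prepare_llamafile` is a hypothetical helper name, not something shipped with llamafile, and it assumes the file name is the last path component of the download URL.

```shell
# Hypothetical helper: download a llamafile (unless it is already present),
# make it executable, and print the local file name.
prepare_llamafile() {
  url=$1
  file=${url##*/}   # assumption: file name is the last path component of the URL

  # Skip the download if a file with that name already exists.
  if [ ! -f "$file" ]; then
    # -f makes curl fail on HTTP errors instead of saving an error page.
    curl -fLO "$url" || return 1
  fi

  chmod +x "$file" || return 1
  printf '%s\n' "$file"
}
```

It could then be used as `file=$(prepare_llamafile <url>) && ./"$file"`, where `<url>` is the download link of the llamafile you picked.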

Windows users: Rename the file to add the .exe extension before running.

Note - Windows can only run executables under 4GB, so any llamafile above 4GB won't work there. In that case, download the llamafile binary on its own and run it with external weights/models (GGUF).
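To check ahead of time whether a given llamafile is affected by this limit, a size test along these lines may help. This is a sketch under stated assumptions: `fits_windows_limit` and `WINDOWS_EXE_LIMIT` are hypothetical names, and "4GB" is read here as 2^32 bytes.

```shell
# Assumption: the 4GB limit is interpreted as 4 GiB (2^32 bytes).
WINDOWS_EXE_LIMIT=4294967296

# Hypothetical helper.
# Returns 0 if the file is below the limit, 1 if it is not,
# and 2 if the file does not exist.
fits_windows_limit() {
  [ -f "$1" ] || return 2
  # wc -c prints the size in bytes; tr strips the padding some systems add.
  size=$(wc -c < "$1" | tr -d ' ')
  [ "$size" -lt "$WINDOWS_EXE_LIMIT" ]
}
```

For example, `fits_windows_limit Qwen3.5-0.8B-Q8_0.llamafile && echo "can run on Windows once renamed to .exe"`. If the check fails for your file, fall back to the standalone llamafile binary with external GGUF weights as described above.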

Documentation

Check the full documentation at docs.mozilla.ai/llamafile, or jump directly into one of the following subsections:

Licensing

While the llamafile project is Apache 2.0-licensed, our changes to llama.cpp and whisper.cpp are licensed under MIT (just like the projects themselves) so as to remain compatible and upstreamable in the future, should that be desired.

The llamafile logo on this page was generated with the assistance of DALL·E 3.