github__hanxiao__omni-macos

mirror of https://github.com/hanxiao/omni-macos.git synced 2026-06-19 00:21:03 -06:00

Native macOS semantic search over your local files - text, images, audio, video in one vector space, on-device on Apple silicon. https://hanxiao.io/omni

Swift 90.8%
Python 3.8%
HTML 3.3%
TypeScript 1.2%
Shell 0.8%


github-actions[bot] cf37eab77e chore: release v0.3.2 [skip ci]		2026-06-14 11:26:35 -07:00
.github/workflows	ci: bump actions/checkout to v5 (Node 24 runners from June 16)	2026-06-11 18:58:01 -07:00
App	settings: move the file-type disable dialog to the Indexing tab where the toggles live	2026-06-13 23:52:41 -07:00
Fixtures	Default to Nano variant + make the Nano model bit-exact vs the reference	2026-06-02 22:49:19 -07:00
Scripts	build: preserve the dev team when auto-regenerating the xcodeproj	2026-06-11 08:21:45 -07:00
site/omni	site: video block - true 16:9 aspect, no divider lines	2026-06-10 09:16:05 -07:00
Sources	bench: add mediamem - prove streamed video indexing memory is bounded, not file-size-bound (#9 )	2026-06-14 00:25:40 -07:00
Tests/OmniKitTests	crawl: per-kind file-size cap - stop silently skipping multi-GB video/audio (#9 )	2026-06-14 00:12:13 -07:00
Tools	feat: profiling v2 - 300-file dataset, interactive version-vs-throughput chart	2026-06-09 17:16:15 -07:00
worker	fix(worker): accept profiling-v2 uploads, store the run's actual dataset version	2026-06-09 17:36:57 -07:00
.gitignore	repo: untrack site-assets scratch screenshots (8.6MB, duplicates of published site assets)	2026-06-09 20:48:41 -07:00
CLAUDE.md	site: keep Features at 4 cards (lean); note the constraint in CLAUDE.md	2026-06-08 13:45:47 -07:00
DESIGN_REVIEW.md	HTTP serving, batch-N media + bucketing, settings/UI overhaul	2026-06-05 08:58:56 -07:00
LICENSE	docs: rewrite README, add Apache 2.0 license, externalize Team ID	2026-06-06 18:41:46 -07:00
Makefile	make test: fix test harness (tokenizers artifact + ad-hoc sign bundle)	2026-06-06 19:30:24 -07:00
Package.resolved	HTTP serving, batch-N media + bucketing, settings/UI overhaul	2026-06-05 08:58:56 -07:00
Package.swift	HTTP serving, batch-N media + bucketing, settings/UI overhaul	2026-06-05 08:58:56 -07:00
project.yml	chore: release v0.3.2 [skip ci]	2026-06-14 11:26:35 -07:00
README.md	readme: move Build from source below Architecture, before License	2026-06-11 23:38:27 -07:00

README.md

Omni

Semantic search over your local files, running entirely on-device.

Download for macOS →

Omni indexes your files and lets you search them by meaning instead of filename. A text query finds matching documents, code, PDFs, images, audio, and video together, because everything is embedded into one shared vector space. The model runs in-process on Apple GPUs via a native MLX-Swift port of jina-embeddings-v5-omni, in two sizes - Nano (~1.9 GB) and Small (~3.1 GB). No Python, no server, no cloud: the model downloads once, then indexing and search run with no network at all. Airgap the Mac and Omni keeps working.

▶️ Watch the 37-second demo

Install

Download the latest DMG from hanxiao.io/omni (or from GitHub Releases), open it, and drag Omni onto Applications. Builds are notarized, so they open without a Gatekeeper prompt.

On first launch Omni downloads the model once (Nano ~1.9 GB or Small ~3.1 GB). That is the only time it touches the network: after that, both indexing and search run on-device with nothing leaving your Mac, so you can pull the plug and run it fully airgapped. Point it at folders to index (Documents, Downloads, Desktop, or any folder you pick), press Index, then search.

Requires an Apple silicon Mac on macOS 14 or later.

Architecture

Sources/OmniKit/   engine + indexer (SPM library)
App/               SwiftUI macOS app (project.yml -> Omni.xcodeproj via XcodeGen)
Tools/             reference fixture generator
Tests/             numeric parity + end-to-end search tests

Embedding

jina-embeddings-v5-omni ported to MLX-Swift: a Qwen3 text tower, a Qwen3-VL vision tower (also used for video frames and scanned-PDF pages), and a Whisper-style audio tower. WeightStore loads the HF safetensors and merges the retrieval LoRA into the backbone; encoders pool the last token and L2-normalize. All modalities land in one shared space, so text finds images and audio finds text.
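Two steps in that paragraph can be shown on plain Swift arrays: folding a LoRA adapter into a base weight, and pooling the last token before L2-normalizing. This is a minimal CPU sketch, not the MLX-Swift engine. It assumes the standard LoRA update W' = W + scale · B·A (with scale = alpha / rank), right-padded sequences, and row-major [tokens][dim] hidden states. The helper names `mergeLoRA`, `lastTokenPool`, and `l2Normalize` are hypothetical and do not come from WeightStore or the encoders.

```swift
/// Hypothetical LoRA merge, assuming the standard formulation
/// W' = W + scale * (B x A), where W is [out][in], B is [out][rank], A is [rank][in].
func mergeLoRA(weight w: [[Float]], a: [[Float]], b: [[Float]], scale: Float) -> [[Float]] {
    let rank = a.count
    precondition(b.allSatisfy { $0.count == rank }, "B columns must equal A rows")
    var merged = w
    for i in 0..<w.count {
        for j in 0..<w[i].count {
            // (B x A)[i][j], summed over the low-rank dimension.
            var delta: Float = 0
            for k in 0..<rank { delta += b[i][k] * a[k][j] }
            merged[i][j] += scale * delta
        }
    }
    return merged
}

/// Picks the hidden state of the last real (non-padding) token.
/// Assumes a 1/0 attention mask over [tokens][dim] hidden states.
func lastTokenPool(_ hiddenStates: [[Float]], attentionMask: [Int]) -> [Float] {
    guard let last = attentionMask.lastIndex(where: { $0 != 0 }) else {
        preconditionFailure("sequence has no real tokens")
    }
    return hiddenStates[last]
}

/// Scales a vector to unit length so a dot product equals cosine similarity.
func l2Normalize(_ v: [Float]) -> [Float] {
    let norm = v.reduce(0) { $0 + $1 * $1 }.squareRoot()
    guard norm > 0 else { return v }
    return v.map { $0 / norm }
}
```

Because every vector leaves the encoder with unit length, a later search can score with a plain dot product and the result is already a cosine.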

This is not a stock checkpoint runner. The towers are reworked for throughput - fused norm/activation/rope kernels, fused bias matmuls, shape-aware compile policy, tuned attention I/O precision, cross-file GPU batching with double-buffered readout - while staying parity-gated against the Python reference (cosine >= 0.999, exact token match). Same vectors as the original model, much faster than running it stock.

Indexing

Crawl -> extract -> chunk -> embed -> store. Indexing is incremental: a file is reprocessed only when its mtime or size has changed since the last pass. A concurrent decode stage (text extraction, image patchify, audio mel) feeds one serialized GPU embed stage; text chunks and images batch across files, audio batches clips under a frame budget. Live updates from the file watcher go through the same batched path as a full pass. MLX calls are serialized through a priority gate and the batch size adapts while you type, so search stays responsive during indexing.
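The change test and the audio batching rule can be sketched in a few lines. This is an illustration under assumptions, not Omni's indexer: `FileStamp`, `needsReindex`, and `batchClips` are hypothetical names. The greedy packing policy is also an assumption: clips are taken in order, and a clip larger than the whole budget gets a batch of its own rather than being skipped.

```swift
import Foundation

/// Hypothetical record of what the index last saw for a file.
struct FileStamp: Equatable {
    let mtime: Date
    let size: Int64
}

/// True when a file is new, or its mtime or size changed since the last pass.
func needsReindex(current: FileStamp, stored: FileStamp?) -> Bool {
    guard let stored = stored else { return true }
    return current != stored
}

/// Greedily packs audio clips (given as frame counts) into batches whose total
/// frame count does not exceed `frameBudget`. Returns clip indices per batch.
/// An oversized clip is placed alone in its own batch.
func batchClips(frameCounts: [Int], frameBudget: Int) -> [[Int]] {
    var batches: [[Int]] = []
    var current: [Int] = []
    var used = 0
    for (index, frames) in frameCounts.enumerated() {
        // Close the open batch if this clip would push it over budget.
        if !current.isEmpty && used + frames > frameBudget {
            batches.append(current)
            current = []
            used = 0
        }
        current.append(index)
        used += frames
    }
    if !current.isEmpty { batches.append(current) }
    return batches
}
```

For example, with a budget of 1000 frames, clips of 300, 300, 500, 2000, and 100 frames pack as [0, 1], [2], [3], [4].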

Identical bytes never embed twice: a content hash maps copies, moves, and touched-but-unmodified files (a git checkout, a re-save) to their already-stored vectors. A touch storm (many files getting new mtimes with no content change) that used to re-embed everything now completes in well under a second.
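Content-hash deduplication works like a cache keyed by the file's bytes. In the sketch below, the embedder only runs on a miss. The README does not say which hash Omni uses. FNV-1a 64-bit is swapped in here only because it fits in a few lines, and a real store would more likely use a cryptographic hash such as SHA-256 to make collisions negligible. `VectorCache` and the `embed` closure, which returns a stored row id, are hypothetical.

```swift
/// FNV-1a 64-bit: a simple stand-in hash for illustration only.
func fnv1a64(_ bytes: [UInt8]) -> UInt64 {
    var hash: UInt64 = 0xcbf2_9ce4_8422_2325      // FNV-64 offset basis
    for byte in bytes {
        hash ^= UInt64(byte)
        hash = hash &* 0x0000_0100_0000_01b3      // FNV-64 prime, wrapping multiply
    }
    return hash
}

/// Hypothetical dedup cache: content hash -> already-stored vector row.
struct VectorCache {
    private var rowByHash: [UInt64: Int] = [:]
    private(set) var embedCalls = 0

    /// Returns the stored row for `bytes`, calling `embed` only on a cache miss.
    mutating func row(for bytes: [UInt8], embed: ([UInt8]) -> Int) -> Int {
        let key = fnv1a64(bytes)
        if let row = rowByHash[key] { return row }   // copy, move, or touch: reuse
        embedCalls += 1
        let row = embed(bytes)
        rowByHash[key] = row
        return row
    }
}
```

A copied or touched file hashes to the same key, so it resolves to the existing row without a GPU pass.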

Storing

SQLite is the durable store: file metadata plus bf16 vectors (2 bytes per dimension, negligible recall loss on normalized embeddings). The resident form adapts to the memory budget in Settings > Performance: a full bf16 matrix when it fits, and past that a 4-bit quantized scan replica with the exact bf16 copy kept in a file-backed mapping the OS can page out. Old indexes load unchanged in either mode.
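bf16 keeps the top 16 bits of an IEEE-754 Float: the sign, all 8 exponent bits, and 7 mantissa bits. That is why it costs 2 bytes per dimension. It keeps Float's range, and each value lands within 2^-8 relative error of the original. Below is a minimal conversion sketch using round-to-nearest-even. The little-endian byte layout in `encodeBF16`/`decodeBF16` is an assumption, since the README does not describe how vectors are serialized into SQLite.

```swift
/// Converts a Float to bfloat16 bits with round-to-nearest-even.
/// NaN is kept as a quiet NaN instead of being rounded.
func toBF16(_ value: Float) -> UInt16 {
    let bits = value.bitPattern
    if value.isNaN { return UInt16(truncatingIfNeeded: bits >> 16) | 0x0040 }
    let lsb = (bits >> 16) & 1                  // parity of the kept half, for ties
    let rounded = bits &+ 0x7FFF &+ lsb
    return UInt16(truncatingIfNeeded: rounded >> 16)
}

/// Widens bf16 bits back to Float by zero-filling the dropped mantissa bits.
func fromBF16(_ bits: UInt16) -> Float {
    Float(bitPattern: UInt32(bits) << 16)
}

/// Serializes a vector as 2 bytes per dimension (little-endian, assumed layout).
func encodeBF16(_ vector: [Float]) -> [UInt8] {
    var bytes: [UInt8] = []
    bytes.reserveCapacity(vector.count * 2)
    for x in vector {
        let h = toBF16(x)
        bytes.append(UInt8(truncatingIfNeeded: h))        // low byte
        bytes.append(UInt8(truncatingIfNeeded: h >> 8))   // high byte
    }
    return bytes
}

/// Inverse of `encodeBF16`.
func decodeBF16(_ bytes: [UInt8]) -> [Float] {
    stride(from: 0, to: bytes.count - 1, by: 2).map { i in
        fromBF16(UInt16(bytes[i]) | UInt16(bytes[i + 1]) << 8)
    }
}
```

On unit-length embeddings this per-component error is small enough that ranking barely moves, which matches the "negligible recall loss" claim above.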

Search

Exact cosine when the index fits the budget: one GPU matmul of the query against the resident matrix (a base prefix plus a small delta of recent rows, scored in one evaluation). At scale, a two-stage funnel: a coarse scan over the quantized replica selects top candidates on the GPU, which are rescored exactly in bf16 before ranking - final scores are exact either way, and recall is gated against the full-precision baseline. Results reduce to the best chunk per file, filtered by kind, folder, extension, and recency. When no indexing is running, a search takes a few milliseconds.
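The two-stage funnel can be traced on the CPU with plain arrays. The sketch below first ranks rows by a cheap score on 4-bit codes, then rescores only the shortlist with exact vectors and keeps the best chunk per file. The quantizer is an assumption, not Omni's format: one symmetric scale per vector with codes in -7...7, left unpacked as Int8 for readability. `Chunk`, `Quantized4`, and `search` are hypothetical names, and the exact vectors stand in for the bf16 copy. Omni runs these steps as GPU work in MLX.

```swift
/// One indexed chunk with its exact vector (stand-in for the bf16 copy).
struct Chunk {
    let fileID: Int
    let vector: [Float]
}

/// Hypothetical 4-bit symmetric quantization: one scale per vector.
struct Quantized4 {
    let codes: [Int8]
    let scale: Float

    init(_ v: [Float]) {
        let maxAbs = v.map { abs($0) }.max() ?? 0
        let s: Float = maxAbs > 0 ? maxAbs / 7 : 1
        scale = s
        codes = v.map { x in Int8(max(-7, min(7, (x / s).rounded()))) }
    }
}

func dot(_ a: [Float], _ b: [Float]) -> Float {
    zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
}

/// Approximate dot product against the quantized row.
func coarseScore(_ query: [Float], _ row: Quantized4) -> Float {
    var s: Float = 0
    for (x, code) in zip(query, row.codes) { s += x * Float(code) }
    return s * row.scale
}

/// Stage 1 keeps `candidates` rows by coarse score; stage 2 rescores them
/// exactly and reduces to the best chunk per file, highest score first.
func search(query: [Float], chunks: [Chunk], replica: [Quantized4],
            candidates: Int, topK: Int) -> [(fileID: Int, score: Float)] {
    let coarse = replica.map { coarseScore(query, $0) }
    let shortlist = replica.indices.sorted { coarse[$0] > coarse[$1] }.prefix(candidates)
    var best: [Int: Float] = [:]
    for row in shortlist {
        let exact = dot(query, chunks[row].vector)          // exact final score
        let file = chunks[row].fileID
        if exact > best[file, default: -.infinity] { best[file] = exact }
    }
    let ranked = best.map { (fileID: $0.key, score: $0.value) }
        .sorted { $0.score > $1.score }
    return Array(ranked.prefix(topK))
}
```

Only the shortlist touches exact vectors, so the cost of the precise pass is set by `candidates` rather than by index size. The returned scores are still exact cosines.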

Build from source

brew install xcodegen
export OMNI_TEAM_ID=XXXXXXXXXX   # your 10-char Apple Team ID (see below)
xcodegen generate
open Omni.xcodeproj              # then Cmd+R

You need:

Apple silicon Mac, macOS 14+.
Xcode 26 with the Metal Toolchain (xcodebuild -downloadComponent MetalToolchain). MLX-Swift compiles Metal shaders; a plain SwiftPM command-line build cannot, so build through Xcode or xcodebuild.
The model directory (model.safetensors, tokenizer.json, config.json, adapters/retrieval/) from jinaai/jina-embeddings-v5-omni-small-mlx (or the -nano- variant). The app finds it via $OMNI_MODEL_DIR, ~/Library/Application Support/Omni/, or the HuggingFace cache, and otherwise asks you to pick the folder.
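The lookup order above can be expressed as a candidate list that is checked for the required files. This is a minimal sketch under stated assumptions. It treats ~/Library/Application Support/Omni/ itself as the model folder. It scans the default Hugging Face hub cache (~/.cache/huggingface/hub, layout models--{org}--{name}/snapshots/{revision}) and ignores HF_HOME-style overrides. The -nano- repo name is inferred from the text. `resolveModelDir` and its helpers are hypothetical, not the app's code.

```swift
import Foundation

/// Entries the README lists as required in a model directory.
let requiredEntries = ["model.safetensors", "tokenizer.json", "config.json", "adapters/retrieval"]

func isModelDir(_ url: URL, fileManager: FileManager = .default) -> Bool {
    requiredEntries.allSatisfy {
        fileManager.fileExists(atPath: url.appendingPathComponent($0).path)
    }
}

/// Candidate directories in the README's order: env var, Application Support, HF cache.
func candidateModelDirs(environment: [String: String], home: URL) -> [URL] {
    var dirs: [URL] = []
    if let override = environment["OMNI_MODEL_DIR"], !override.isEmpty {
        dirs.append(URL(fileURLWithPath: override))
    }
    dirs.append(home.appendingPathComponent("Library/Application Support/Omni"))
    // Assumed default hub cache layout: models--{org}--{name}/snapshots/{revision}
    let hub = home.appendingPathComponent(".cache/huggingface/hub")
    for repo in ["models--jinaai--jina-embeddings-v5-omni-small-mlx",
                 "models--jinaai--jina-embeddings-v5-omni-nano-mlx"] {
        let snapshots = hub.appendingPathComponent(repo).appendingPathComponent("snapshots")
        let revisions = (try? FileManager.default.contentsOfDirectory(atPath: snapshots.path)) ?? []
        dirs += revisions.sorted().map { snapshots.appendingPathComponent($0) }
    }
    return dirs
}

/// First complete candidate, or nil (the app would then ask the user to pick a folder).
func resolveModelDir(environment: [String: String] = ProcessInfo.processInfo.environment,
                     home: URL = FileManager.default.homeDirectoryForCurrentUser) -> URL? {
    candidateModelDirs(environment: environment, home: home).first { isModelDir($0) }
}
```

Taking the environment and home directory as parameters keeps the lookup testable without touching the real user folders.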

Why an Apple Developer account is needed

Omni reads files in your Documents, Downloads, and Desktop, which macOS gates behind TCC permission. The app is code-signed (not ad-hoc) so the system ties that permission to a stable signature and remembers your grant across rebuilds instead of re-prompting every time. Signing requires a Team ID, which is why OMNI_TEAM_ID is set above.

Build and run locally: a free Apple ID is enough. Add it in Xcode (Settings > Accounts), use the personal team it creates, and put that team's ID in OMNI_TEAM_ID.
Distribute a notarized DMG like the Releases here: this needs the paid Apple Developer Program ($99/yr) for a Developer ID Application certificate and Apple's notary service. The release pipeline (.github/workflows/release.yml) uses it; you don't need it just to run Omni yourself.

The repository contains no Apple credentials. The Team ID comes from OMNI_TEAM_ID locally and from the APPLE_TEAM_ID GitHub secret in CI; the signing certificate, notary password, and deploy tokens are all GitHub Actions secrets.

Verify the engine

The MLX-Swift encoder is checked numerically against Python reference fixtures: text must match to cosine >= 0.999 with identical token ids; image, video, and audio towers match the upstream model.py to cosine ~1.0 on identical preprocessed inputs.

uv run python Tools/gen_fixtures.py          # regenerate fixtures (needs mlx + tokenizers)
cp -R <model snapshot> /private/tmp/omni-model
make test                                    # compiles shaders, asserts the cosines
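The text parity criterion reduces to two checks: the token ids must be identical, and the cosine against the reference vector must be at least 0.999. The sketch below expresses that gate in plain Swift. `textParityPasses` is a hypothetical name, and the real assertions live in Tests/ and run through make test.

```swift
/// Cosine similarity between two equal-length vectors.
func cosine(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "dimension mismatch")
    var dot: Float = 0, na: Float = 0, nb: Float = 0
    for (x, y) in zip(a, b) {
        dot += x * y
        na += x * x
        nb += y * y
    }
    return dot / (na.squareRoot() * nb.squareRoot())
}

/// Hypothetical text-parity gate: exact token match and cosine >= threshold.
func textParityPasses(swiftTokens: [Int], referenceTokens: [Int],
                      swiftVector: [Float], referenceVector: [Float],
                      threshold: Float = 0.999) -> Bool {
    swiftTokens == referenceTokens && cosine(swiftVector, referenceVector) >= threshold
}
```

Checking tokens first means a tokenizer mismatch fails outright, even when the resulting vectors happen to land close together.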

License

Apache 2.0. The model weights are covered by the upstream Jina license (CC-BY-NC-4.0), not this repository.