GitNexus V2
Zero-Server, Graph-Based Code Intelligence Engine
Works fully in-browser through WebAssembly: the DB engine, the embeddings model, and AST parsing all run inside the browser.
https://github.com/user-attachments/assets/2fb7c522-20d1-48f6-9583-36c3969aa4dc
Try it: https://gitnexus.vercel.app
Being client-side, it costs me nothing to deploy, so you can use it for free :-) (would love a ⭐ though)
Like DeepWiki, but deeper. 😉
DeepWiki helps you understand code. GitNexus lets you analyze it—because a knowledge graph tracks every dependency, call chain, and relationship.
That's the difference between:
- "What does this function do?" → understanding
- "What breaks if I change this function?" → analysis
Some quick tech jargon:
- Enhanced Search: BM25 + Semantic + 1-hop graph expansion via Cypher
- Full WASM Stack: Tree-sitter parsing + KuzuDB graph database, all in-browser
- Repo Map: Complete code knowledge graph with CALLS, IMPORTS, EXTENDS relations
- Vector Index: HNSW embeddings for semantic similarity search
- Cypher Queries: Relational analysis for accurate context retrieval
- Grounded AI: Every answer cites [[file:line]] as proof
What you can do:
| Capability | Description |
|---|---|
| Codebase-wide audits | Find layer violations, forbidden dependencies |
| Blast radius analysis | See every function affected by a change |
| Dead code detection | Identify orphaned nodes with zero incoming calls |
| Dependency tracing | Follow import chains across the entire codebase |
| AI analyses with citations | Ask questions, analyze, get answers with [[file:line]] proof |
100% client-side. Your code never leaves your browser.
Supports: TypeScript, JavaScript, Python (Go, Java, C in progress)
🔍 The Problem with AI Coding Tools
Tools like Cursor, Claude Code, Cline, Roo Code, and Windsurf are powerful—but they share a fundamental limitation: they don't truly know your codebase structure.
| Tool | Context Strategy | The Gap |
|---|---|---|
| Cursor | Files in tabs + embeddings | No call graph. Can't trace "what calls this?" |
| Claude Code | File search + grep | Text-based. Misses semantic connections |
| Cline/Roo Code | Repo map + tree-sitter | Static structure. No runtime dependencies tracked |
| Windsurf | Cascade context | Limited dependency depth |
What happens:
- AI edits UserService.validate()
- Doesn't know 47 functions depend on its return type
- Breaking changes ship 💥
The Solution: Graph Coverage
A knowledge graph tracks actual relationships, not just file contents:
graph LR
EDIT[AI wants to edit UserService.validate] --> QUERY[Graph Query: What depends on this?]
QUERY --> DEPS["47 callers across 12 files"]
DEPS --> SAFE[AI sees full blast radius first]
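As a rough illustration of the "What depends on this?" query in the diagram, blast radius can be computed as a reverse breadth-first traversal over CALLS edges. This is a minimal sketch under stated assumptions, not GitNexus's implementation: the `CallEdge` shape, the `blastRadius` name, and the sample function names are all hypothetical.

```typescript
// Hypothetical edge shape: `caller` invokes `callee`.
type CallEdge = { caller: string; callee: string };

// Collect every function that directly or transitively calls `target`.
function blastRadius(edges: CallEdge[], target: string): Set<string> {
  // Reverse adjacency: callee -> list of direct callers.
  const callersOf = new Map<string, string[]>();
  for (const { caller, callee } of edges) {
    const list = callersOf.get(callee) ?? [];
    list.push(caller);
    callersOf.set(callee, list);
  }

  const affected = new Set<string>();
  const queue: string[] = [target];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const caller of callersOf.get(current) ?? []) {
      // Skip the target itself (recursion) and anything already visited.
      if (caller !== target && !affected.has(caller)) {
        affected.add(caller);
        queue.push(caller);
      }
    }
  }
  return affected;
}

// Made-up sample graph for illustration only.
const sampleCallEdges: CallEdge[] = [
  { caller: "login", callee: "UserService.validate" },
  { caller: "signup", callee: "UserService.validate" },
  { caller: "authRoute", callee: "login" },
  { caller: "renderFooter", callee: "formatDate" },
];
console.log(Array.from(blastRadius(sampleCallEdges, "UserService.validate")).sort());
// → ["authRoute", "login", "signup"]
```

The traversal follows edges backwards, so indirect callers such as `authRoute` are included even though they never call the edited function directly.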
Current state: GitNexus is a standalone tool—a better DeepWiki that's 100% client-side with graph-powered analysis.
Future goal (MCP): Expose GitNexus as an MCP server so tools like Cursor and Claude Code can query it for accurate context. They ask "what calls X?", GitNexus returns the actual call graph. No more guessing.
🚀 Quick Start
git clone <repository-url>
cd gitnexus
npm install
npm run dev
Open http://localhost:5173, drag & drop a ZIP of your codebase, and start exploring.
🏗️ Indexing Architecture
Two-phase indexing: Knowledge Graph (blocking) → Embeddings (background).
Phase 1-5: Knowledge Graph Creation
flowchart TD
subgraph P1["Phase 1: Extract (0-15%)"]
E1[Decompress ZIP] --> E2[Collect file paths]
end
subgraph P2["Phase 2: Structure (15-30%)"]
S1[Build folder tree] --> S2[Create CONTAINS edges]
end
subgraph P3["Phase 3: Parse (30-70%)"]
PA1[Load Tree-sitter WASM] --> PA2[Generate ASTs]
PA2 --> PA3[Extract symbols]
PA3 --> PA4[Populate Symbol Table]
end
subgraph P4["Phase 4: Imports (70-82%)"]
I1[Find import statements] --> I2[Resolve paths]
I2 --> I3[Create IMPORTS edges]
end
subgraph P5["Phase 5: Calls + Heritage (82-100%)"]
C1[Find function calls] --> C2[Resolve via Symbol Table]
C2 --> C3[Create CALLS edges]
C3 --> H1[Find extends/implements]
H1 --> H2[Create EXTENDS/IMPLEMENTS edges]
end
P1 --> P2 --> P3 --> P4 --> P5
P5 --> DB[(KuzuDB WASM)]
DB --> READY[Graph Ready!]
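The percentage ranges in the phase labels suggest that each phase reports its own progress, which is then mapped onto a single overall progress bar. A minimal sketch of that mapping, assuming a hypothetical `overallProgress` helper and phase keys (only the numeric ranges come from the diagram):

```typescript
// Progress ranges copied from the phase labels in the diagram above.
const PHASE_RANGES = {
  extract: [0, 15],
  structure: [15, 30],
  parse: [30, 70],
  imports: [70, 82],
  calls: [82, 100],
} as const;

type IndexPhase = keyof typeof PHASE_RANGES;

// Map progress within one phase (0..1) to an overall pipeline percentage.
function overallProgress(phase: IndexPhase, fraction: number): number {
  const [start, end] = PHASE_RANGES[phase];
  const clamped = Math.min(1, Math.max(0, fraction)); // tolerate out-of-range input
  return start + (end - start) * clamped;
}

console.log(overallProgress("parse", 0.5)); // → 50
```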
Symbol Table: Dual HashMap
Resolution strategy for function calls:
flowchart TD
CALL[Found call: validateUser] --> CHECK1{In Import Map?}
CHECK1 -->|Yes| FOUND1[Use imported definition]
CHECK1 -->|No| CHECK2{In Current File?}
CHECK2 -->|Yes| FOUND2[Use local definition]
CHECK2 -->|No| CHECK3{Global Search}
CHECK3 -->|Found| FOUND3[Use first match]
CHECK3 -->|Not Found| SKIP[Skip - unresolved]
FOUND1 --> EDGE[Create CALLS edge]
FOUND2 --> EDGE
FOUND3 --> EDGE
Data structure:
File-Scoped: Map<FilePath, Map<SymbolName, NodeID>>
Global: Map<SymbolName, SymbolDefinition[]>
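The two maps and the resolution order from the flowchart could look roughly like this in TypeScript. It is a sketch, not the actual GitNexus code: the `SymbolTable` class, its method names, and the `importMap` parameter (imported name to source file path) are assumptions made for illustration.

```typescript
type SymbolDefinition = { nodeId: string; filePath: string };

class SymbolTable {
  // File-scoped: filePath -> (symbolName -> nodeId)
  private byFile = new Map<string, Map<string, string>>();
  // Global: symbolName -> every definition with that name
  private globalIndex = new Map<string, SymbolDefinition[]>();

  add(filePath: string, symbolName: string, nodeId: string): void {
    let fileSymbols = this.byFile.get(filePath);
    if (!fileSymbols) {
      fileSymbols = new Map<string, string>();
      this.byFile.set(filePath, fileSymbols);
    }
    fileSymbols.set(symbolName, nodeId);

    const defs = this.globalIndex.get(symbolName) ?? [];
    defs.push({ nodeId, filePath });
    this.globalIndex.set(symbolName, defs);
  }

  // Resolution order from the flowchart: import map, current file, global search.
  resolve(symbolName: string, currentFile: string, importMap: Map<string, string>): string | undefined {
    const importedFrom = importMap.get(symbolName);
    if (importedFrom !== undefined) {
      const imported = this.byFile.get(importedFrom)?.get(symbolName);
      if (imported !== undefined) return imported;
    }
    const local = this.byFile.get(currentFile)?.get(symbolName);
    if (local !== undefined) return local;
    // Global fallback takes the first match, so same-named symbols can be mis-resolved.
    return this.globalIndex.get(symbolName)?.[0]?.nodeId;
  }
}

// Made-up sample data for illustration only.
const demoSymbols = new SymbolTable();
demoSymbols.add("src/legacy.ts", "validateUser", "fn:legacy:validateUser");
demoSymbols.add("src/auth.ts", "validateUser", "fn:auth:validateUser");
const appImports = new Map([["validateUser", "src/auth.ts"]]);
console.log(demoSymbols.resolve("validateUser", "src/app.ts", appImports));
// → "fn:auth:validateUser"
```

Both lookups are constant-time on average. The file-scoped map gives precise answers when the import or the defining file is known, and the global map is only a best-effort fallback.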
Phase 6+: Background Embeddings
flowchart LR
subgraph BG["Background (Non-blocking)"]
M1[Load snowflake-arctic-embed-xs] --> M2[Initialize WebGPU/WASM]
M2 --> E1[Batch embed nodes]
E1 --> E2[INSERT into CodeEmbedding table]
E2 --> V1[Create HNSW Vector Index]
V1 --> B1[Build BM25 Index]
end
BG --> AI[AI Search Ready!]
User can explore the graph during embedding. AI features unlock when complete.
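A background embedding loop along the lines of the diagram might be structured as below. This is only a sketch: `embedBatch` and `insertRows` are injected stand-ins for the model call and the INSERT into the CodeEmbedding table, and the batch size of 16 is an arbitrary assumption, not a value from GitNexus.

```typescript
// Split items into fixed-size batches (the last batch may be smaller).
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

type EmbeddingRow = { nodeId: string; embedding: number[] };

// Embed nodes batch by batch and report progress after each batch.
// `embedBatch` and `insertRows` are hypothetical callbacks, not real GitNexus APIs.
async function embedInBackground(
  nodes: Array<{ id: string; text: string }>,
  embedBatch: (texts: string[]) => Promise<number[][]>,
  insertRows: (rows: EmbeddingRow[]) => Promise<void>,
  onProgress: (fraction: number) => void,
  batchSize = 16,
): Promise<void> {
  let done = 0;
  for (const batch of toBatches(nodes, batchSize)) {
    const vectors = await embedBatch(batch.map((n) => n.text));
    await insertRows(batch.map((n, i) => ({ nodeId: n.id, embedding: vectors[i] })));
    done += batch.length;
    onProgress(done / nodes.length);
  }
}

console.log(toBatches([1, 2, 3, 4, 5], 2)); // → [[1, 2], [3, 4], [5]]
```

Because each batch is awaited separately, the event loop gets a chance to run between batches, which is what keeps the graph explorable while embedding continues.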
📊 Graph Schema
Node Types
| Label | Description | Properties |
|---|---|---|
| Folder | Directory | name, filePath |
| File | Source file | name, filePath, language |
| Function | Function def | name, filePath, startLine, endLine, isExported |
| Class | Class def | name, filePath, startLine, endLine |
| Interface | Interface def | name, filePath, startLine, endLine |
| Method | Class method | name, filePath, startLine, endLine |
| CodeElement | Generic symbol | name, filePath |
Relationship Table: CodeRelation
Single edge table with type property:
| Type | From | To | Description |
|---|---|---|---|
| CONTAINS | Folder | File/Folder | Directory structure |
| DEFINES | File | Function/Class/etc | Code definitions |
| IMPORTS | File | File | Module dependencies |
| CALLS | Function/Method | Function/Method | Call graph |
| EXTENDS | Class | Class | Inheritance |
| IMPLEMENTS | Class | Interface | Interface implementation |
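For readers who think in types, the schema above can be mirrored as TypeScript unions plus a small endpoint check. This is a sketch, not the project's actual type definitions: `RELATION_ENDPOINTS` and `isValidEdge` are hypothetical names, and expanding the "etc" in the DEFINES row to Interface, Method, and CodeElement is an assumption.

```typescript
type NodeLabel = "Folder" | "File" | "Function" | "Class" | "Interface" | "Method" | "CodeElement";
type RelationType = "CONTAINS" | "DEFINES" | "IMPORTS" | "CALLS" | "EXTENDS" | "IMPLEMENTS";

// Allowed endpoints per relation type, transcribed from the relationship table.
const RELATION_ENDPOINTS: Record<RelationType, { from: NodeLabel[]; to: NodeLabel[] }> = {
  CONTAINS: { from: ["Folder"], to: ["File", "Folder"] },
  // "etc" in the table is expanded here as an assumption.
  DEFINES: { from: ["File"], to: ["Function", "Class", "Interface", "Method", "CodeElement"] },
  IMPORTS: { from: ["File"], to: ["File"] },
  CALLS: { from: ["Function", "Method"], to: ["Function", "Method"] },
  EXTENDS: { from: ["Class"], to: ["Class"] },
  IMPLEMENTS: { from: ["Class"], to: ["Interface"] },
};

// Check that an edge of the given type may connect the two node labels.
function isValidEdge(type: RelationType, fromLabel: NodeLabel, toLabel: NodeLabel): boolean {
  const endpoints = RELATION_ENDPOINTS[type];
  return endpoints.from.indexOf(fromLabel) !== -1 && endpoints.to.indexOf(toLabel) !== -1;
}

console.log(isValidEdge("CALLS", "Method", "Function")); // → true
console.log(isValidEdge("EXTENDS", "Class", "Interface")); // → false
```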
🛠️ Agent Tools Architecture
The LangChain ReAct agent has 5 tools for code exploration. These tools use the graph built during indexing.
Tool 1: search — Hybrid Search with Graph Context
Combines BM25 (keyword) + Semantic (vector) + 1-hop expansion:
flowchart TD
Q[Query: auth middleware] --> BM25[BM25 Keyword Search]
Q --> SEM[Semantic Vector Search]
BM25 --> RRF[Reciprocal Rank Fusion]
SEM --> RRF
RRF --> TOP[Top K Results]
TOP --> HOP[1-Hop Graph Expansion]
HOP --> OUT["Each result includes:
• ID, file, score
• Incoming connections (who calls this)
• Outgoing connections (what this calls)"]
How 1-hop works:
MATCH (n {id: $nodeId})
OPTIONAL MATCH (n)-[r1:CodeRelation]->(dst)
OPTIONAL MATCH (src)-[r2:CodeRelation]->(n)
RETURN collect(dst.name), collect(src.name)
The agent sees not just what matches, but what connects to it.
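The Reciprocal Rank Fusion step that merges the BM25 and semantic rankings can be sketched as follows. Each result scores 1 / (k + rank) in every list it appears in, and the scores are summed. The function name is hypothetical and k = 60 is a commonly used default, not a value confirmed for GitNexus.

```typescript
// Fuse several ranked ID lists into one list ordered by combined RRF score.
function reciprocalRankFusion(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Ranks are 1-based: the top result contributes 1 / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  const fused: Array<{ id: string; score: number }> = [];
  scores.forEach((score, id) => fused.push({ id, score }));
  return fused.sort((a, b) => b.score - a.score);
}

// Made-up rankings for illustration only.
const bm25Ranking = ["authMiddleware", "login", "parseToken"];
const semanticRanking = ["verifySession", "authMiddleware", "login"];
const fusedResults = reciprocalRankFusion([bm25Ranking, semanticRanking]);
console.log(fusedResults.map((r) => r.id));
// → ["authMiddleware", "login", "verifySession", "parseToken"]
```

RRF only uses ranks, so BM25 scores and vector distances never need to be normalized against each other. Results that appear in both lists naturally rise to the top.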
Tool 2: cypher — Raw Graph Queries with Auto-Embedding
Execute Cypher directly. If the query contains the {{QUERY_VECTOR}} placeholder, the tool embeds the accompanying query text and substitutes the resulting vector:
flowchart LR
CQ[Cypher with placeholder] --> CHECK{Contains QUERY_VECTOR?}
CHECK -->|Yes| EMBED[Embed query text]
EMBED --> REPLACE[Replace placeholder with vector]
CHECK -->|No| EXEC
REPLACE --> EXEC[Execute Cypher]
EXEC --> RES[Return Results]
Example with auto-embedding:
CALL QUERY_VECTOR_INDEX('CodeEmbedding', 'idx', {{QUERY_VECTOR}}, 10)
YIELD node, distance
WHERE distance < 0.4
MATCH (caller:Function)-[:CodeRelation {type: 'CALLS'}]->(n:Function {id: node.nodeId})
RETURN caller.name, n.name
The agent provides query: "authentication" → system embeds it → injects the vector.
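The substitution step itself is plain string replacement once the vector is available. A minimal sketch, assuming the embedding has already been computed elsewhere: `injectQueryVector` is a hypothetical helper, and whether the database needs an explicit cast around the list literal is not covered here.

```typescript
const QUERY_VECTOR_PLACEHOLDER = "{{QUERY_VECTOR}}";

// Replace every placeholder occurrence with a Cypher list literal of the vector.
function injectQueryVector(cypher: string, vector: number[]): string {
  if (cypher.indexOf(QUERY_VECTOR_PLACEHOLDER) === -1) return cypher; // nothing to substitute
  if (vector.length === 0 || vector.some((v) => !isFinite(v))) {
    throw new Error("query vector must be non-empty and contain only finite numbers");
  }
  const literal = "[" + vector.join(", ") + "]";
  return cypher.split(QUERY_VECTOR_PLACEHOLDER).join(literal);
}

// Tiny 3-dim vector for illustration; the real model would produce 384 dimensions.
const preparedCypher = injectQueryVector(
  "CALL QUERY_VECTOR_INDEX('CodeEmbedding', 'idx', {{QUERY_VECTOR}}, 10) YIELD node, distance RETURN node, distance",
  [0.1, -0.2, 0.3],
);
console.log(preparedCypher);
// → CALL QUERY_VECTOR_INDEX('CodeEmbedding', 'idx', [0.1, -0.2, 0.3], 10) YIELD node, distance RETURN node, distance
```

Rejecting non-finite values before substitution avoids sending `NaN` or `Infinity` into the query text, where they would not parse as numbers.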
Tool 3: grep — Regex Pattern Matching
For exact strings, error codes, TODOs:
flowchart LR
PAT["Pattern: TODO|FIXME"] --> REGEX[Compile Regex]
REGEX --> SCAN[Scan all files]
SCAN --> MATCH[Match per line]
MATCH --> RES["file:line: content"]
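The scan in the diagram amounts to compiling the pattern once and testing it against every line of every file. A sketch under the assumption that file contents are held in an in-memory map of path to text; `grepFiles` and `GrepHit` are hypothetical names.

```typescript
type GrepHit = { file: string; line: number; content: string };

// Scan every file line by line and collect matching lines with 1-based line numbers.
function grepFiles(sources: Map<string, string>, pattern: string, caseInsensitive = false): GrepHit[] {
  // No `g` flag: a global regex keeps state in lastIndex between test() calls.
  const regex = new RegExp(pattern, caseInsensitive ? "i" : "");
  const hits: GrepHit[] = [];
  sources.forEach((text, file) => {
    text.split(/\r?\n/).forEach((content, index) => {
      if (regex.test(content)) hits.push({ file, line: index + 1, content: content.trim() });
    });
  });
  return hits;
}

// Made-up file contents for illustration only.
const sampleSources = new Map([
  ["src/a.ts", "const a = 1;\n// TODO: remove\n"],
  ["src/b.ts", "// FIXME later\nconst b = 2;"],
]);
const grepOutput = grepFiles(sampleSources, "TODO|FIXME").map((h) => `${h.file}:${h.line}: ${h.content}`);
console.log(grepOutput);
// → ["src/a.ts:2: // TODO: remove", "src/b.ts:1: // FIXME later"]
```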
Tool 4: read — Smart File Reader
Fuzzy path matching with suggestions:
flowchart TD
REQ[Request: src/utils.ts] --> EXACT{Exact match?}
EXACT -->|Yes| RET[Return content]
EXACT -->|No| FUZZY[Fuzzy match by segments]
FUZZY --> FOUND{Found?}
FOUND -->|Yes| RET
FOUND -->|No| SUGGEST[Suggest similar files]
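One plausible reading of "fuzzy match by segments" is to compare trailing path segments and accept a candidate only when it is the unique best match. The sketch below follows that interpretation, which is an assumption: the actual matching rules are not described here, and `readFileFuzzy`, `sharedSuffixSegments`, and the limit of three suggestions are hypothetical.

```typescript
type ReadResult =
  | { kind: "content"; path: string; content: string }
  | { kind: "suggestions"; candidates: string[] };

// Count how many trailing path segments two paths have in common.
function sharedSuffixSegments(left: string, right: string): number {
  const a = left.split("/").filter(Boolean).reverse();
  const b = right.split("/").filter(Boolean).reverse();
  let count = 0;
  while (count < a.length && count < b.length && a[count] === b[count]) count++;
  return count;
}

function readFileFuzzy(sources: Map<string, string>, requested: string): ReadResult {
  const exact = sources.get(requested);
  if (exact !== undefined) return { kind: "content", path: requested, content: exact };

  // Score every known path by shared trailing segments; keep only real overlaps.
  const scored: Array<{ path: string; score: number }> = [];
  sources.forEach((_content, path) => {
    const score = sharedSuffixSegments(path, requested);
    if (score > 0) scored.push({ path, score });
  });
  scored.sort((x, y) => y.score - x.score);

  if (scored.length > 0) {
    const best = scored[0];
    const unique = scored.length === 1 || scored[1].score < best.score;
    if (unique) return { kind: "content", path: best.path, content: sources.get(best.path)! };
  }
  // Ambiguous or no match: return up to three candidates instead of guessing.
  return { kind: "suggestions", candidates: scored.slice(0, 3).map((s) => s.path) };
}

// Made-up repository contents for illustration only.
const repoSources = new Map([
  ["app/src/utils.ts", "export const x = 1;"],
  ["test/utils.ts", "export const y = 2;"],
  ["src/index.ts", "export {};"],
]);
console.log(readFileFuzzy(repoSources, "src/utils.ts")); // resolves to app/src/utils.ts
console.log(readFileFuzzy(repoSources, "lib/utils.ts")); // ambiguous: suggests both utils.ts files
```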
Tool 5: highlight — Visual Graph Feedback
Emits a marker that the UI parses to highlight nodes:
[HIGHLIGHT_NODES:Function:src/auth.ts:validate,Class:src/user.ts:UserService]
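On the UI side, parsing such a marker could look like the sketch below. It assumes each entry has the form `Label:filePath:name`, as in the example above, and splits on the first and last colon so that colons inside the path would survive. `parseHighlightMarker` and `HighlightTarget` are hypothetical names.

```typescript
type HighlightTarget = { label: string; filePath: string; name: string };

// Extract highlight targets from the first HIGHLIGHT_NODES marker in the text.
function parseHighlightMarker(text: string): HighlightTarget[] {
  const match = /\[HIGHLIGHT_NODES:([^\]]*)\]/.exec(text);
  if (!match) return [];
  const targets: HighlightTarget[] = [];
  for (const entry of match[1].split(",")) {
    const first = entry.indexOf(":");
    const last = entry.lastIndexOf(":");
    if (first === -1 || first === last) continue; // skip malformed entries
    targets.push({
      label: entry.slice(0, first).trim(),
      filePath: entry.slice(first + 1, last).trim(),
      name: entry.slice(last + 1).trim(),
    });
  }
  return targets;
}

const parsedTargets = parseHighlightMarker(
  "See [HIGHLIGHT_NODES:Function:src/auth.ts:validate,Class:src/user.ts:UserService] for details.",
);
console.log(parsedTargets.map((t) => `${t.label} ${t.name} in ${t.filePath}`));
// → ["Function validate in src/auth.ts", "Class UserService in src/user.ts"]
```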
💡 Key Discovery: Unified Vector + Graph
Most Graph RAG systems use separate databases—vector DB for semantic search, graph DB for traversal.
KuzuDB supports native vector indexing (HNSW), so we do both in one Cypher query:
// Semantic search + graph traversal in ONE query
CALL QUERY_VECTOR_INDEX('CodeEmbedding', 'code_embedding_idx', $queryVector, 20)
YIELD node AS emb, distance
WITH emb, distance WHERE distance < 0.4
MATCH (n:Function {id: emb.nodeId})<-[:CodeRelation {type: 'CALLS'}]-(caller:Function)
RETURN n.name, caller.name, distance
ORDER BY distance
Why this matters:
- 🎯 Single query execution — No round-trips between systems
- 📊 Built-in relevance ranking — Distance IS the score
- ⚡ No separate vector DB — One database, one query language
- 🌳 LLM-friendly — Agent writes one Cypher, gets semantic + structural results
🔬 Deep Dive: Copy-on-Write Memory Issue
I hit an interesting problem while storing embeddings that is worth documenting.
Setup: Store 384-dim embeddings alongside code nodes.
MATCH (n:CodeNode {id: $id}) SET n.embedding = $vec
Problem: Worked for ~20 nodes, exploded at ~1000:
Buffer manager exception: Unable to allocate memory!
Root cause: Copy-on-Write. Each UPDATE copies the entire record (~2KB of code content). 1000 updates = massive memory duplication in WASM.
flowchart LR
subgraph COW["Copy-on-Write Effect"]
OLD[Old: 2KB] --> NEW[New: 3.5KB]
end
COW -->|"× 1000 nodes"| BOOM[💥 Buffer Exhausted]
Fix: Separate CodeEmbedding table with INSERT only:
flowchart TD
subgraph Old["❌ Single Table"]
CN1[CodeNode with embedding<br/>UPDATE triggers COW]
end
subgraph New["✅ Separate Table"]
CN2[CodeNode<br/>id, name, content]
CE[CodeEmbedding<br/>nodeId, embedding<br/>INSERT only]
end
Old -->|"Memory explosion"| FAIL
New -->|"Works at scale"| WIN
Lesson: In-memory WASM DBs have hard limits. Profile at scale, not just on the happy path.
⚡ V2 Technical Improvements
Sigma.js + WebGL
- V1: D3.js, choked at ~3k nodes
- V2: Sigma.js + GPU rendering, smooth at 10k+
Dual HashMap Symbol Table
- V1: Trie (prefix tree) - clever but slow
- V2: File-scoped + Global hashmaps - ~2x speedup
LRU AST Cache
- Tree-sitter ASTs live in WASM memory
- LRU cache (50 slots) with tree.delete() for cleanup
- Memory stays bounded even for huge codebases
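The idea behind the AST cache can be sketched with a `Map`, which preserves insertion order, plus an eviction callback that frees the WASM-side tree. This is an illustrative sketch rather than the project's code: the `LruCache` class is hypothetical, and the fake tree objects below stand in for real Tree-sitter trees.

```typescript
class LruCache<K, V> {
  // Map preserves insertion order, so the first key is the least recently used.
  private entries = new Map<K, V>();
  private capacity: number;
  private onEvict: (value: V, key: K) => void;

  constructor(capacity: number, onEvict: (value: V, key: K) => void) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
    this.onEvict = onEvict;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      const previous = this.entries.get(key) as V;
      this.entries.delete(key);
      if (previous !== value) this.onEvict(previous, key); // release the replaced value
    } else if (this.entries.size >= this.capacity) {
      const oldestKey = this.entries.keys().next().value as K;
      const oldest = this.entries.get(oldestKey) as V;
      this.entries.delete(oldestKey);
      this.onEvict(oldest, oldestKey); // e.g. free WASM memory
    }
    this.entries.set(key, value);
  }

  get size(): number {
    return this.entries.size;
  }
}

// Fake trees that record when they are released (stand-ins for parsed ASTs).
const releasedTrees: string[] = [];
const fakeTree = (path: string) => ({ delete: () => { releasedTrees.push(path); } });
const astCache = new LruCache<string, { delete(): void }>(2, (tree) => tree.delete());
astCache.set("a.ts", fakeTree("a.ts"));
astCache.set("b.ts", fakeTree("b.ts"));
astCache.get("a.ts"); // a.ts becomes most recently used
astCache.set("c.ts", fakeTree("c.ts")); // evicts b.ts
console.log(releasedTrees); // → ["b.ts"]
```

Tying cleanup to eviction means memory held outside the JavaScript heap is released deterministically instead of waiting on garbage collection, which cannot see WASM allocations.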
ForceAtlas2 in Web Worker
- Layout algorithm runs off main thread
- UI stays responsive during graph positioning
🚧 Roadmap
Actively Building
- MCP Support - Model Context Protocol for tool extensibility
- External DB Support - Connect to Neo4j (hosted or Docker)
- Blast Radius Analysis Tool - Dedicated UI for impact analysis
- Multi-Worker Pool - Parallel parsing across Web Workers
- Ollama Support - Local LLM integration
- CSV Export - Export node/relationship tables
🎯 The Vision: Browser-Based MCP Server
Goal: Expose GitNexus as a local MCP server directly from the browser.
This would let AI coding tools like Cursor, Claude Code, Windsurf, etc. connect to your running GitNexus instance and use its knowledge graph for:
- 🔍 Reliable context gathering — AI gets actual dependencies, not grep guesses
- 💥 Blast radius detection — Before making changes, query what would break
- 🔐 Codebase-wide audits — Find violations, dead code, circular dependencies
- 🧠 Grounded answers — Every response backed by graph traversal, not hallucination
graph LR
subgraph Browser["GitNexus (Browser)"]
KG[Knowledge Graph]
MCP[MCP Server]
end
subgraph Tools["AI Coding Tools"]
CURSOR[Cursor]
CLAUDE[Claude Code]
WIND[Windsurf]
end
KG --> MCP
MCP <-->|localhost| CURSOR
MCP <-->|localhost| CLAUDE
MCP <-->|localhost| WIND
Why this matters: Current AI coding tools are blind to real dependencies. They use grep or embeddings—better than nothing, but not enough to prevent breaking changes. A knowledge graph MCP would give them the accurate, structural context they need.
Recently Completed ✅
- Graph RAG Agent with 5 tools (search, cypher, grep, read, highlight)
- Browser embeddings (snowflake-arctic-embed-xs, 22M params)
- Vector index with HNSW in KuzuDB
- Hybrid search (BM25 + semantic + RRF)
- Streaming AI chat with tool visibility
- Grounded citations ([[file:line]] format)
- Multiple LLM providers (OpenAI, Azure, Gemini, Anthropic)
🛠 Tech Stack
| Layer | Technology |
|---|---|
| Frontend | React 18, TypeScript, Vite, Tailwind v4 |
| Visualization | Sigma.js, Graphology, ForceAtlas2 (WebGL) |
| Parsing | Tree-sitter WASM (TS, JS, Python) |
| Database | KuzuDB WASM (graph + vector HNSW) |
| Embeddings | transformers.js, snowflake-arctic-embed-xs (22M) |
| AI | LangChain ReAct agent, streaming |
| Concurrency | Web Workers + Comlink |
🔐 Security & Privacy
- All processing happens in your browser
- No code uploaded to any server
- API keys stored in localStorage only
- Open source—audit the code yourself
📝 License
MIT License
🙏 Acknowledgments
- Tree-sitter - AST parsing
- KuzuDB - Embedded graph database with vector support
- Sigma.js - WebGL graph rendering
- transformers.js - Browser ML
- LangChain - Agent orchestration