
Graphify

Graphify is an open-source tool and skill designed to convert arbitrary folders—containing code, documentation, papers, images, or videos—into a queryable [[Knowledge Graph]]. It functions as a specialized skill for AI programming assistants, enabling them to understand codebase structures and design decisions with significantly reduced token consumption^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md].

Core Problem & Solution

Traditional AI programming assistants often face challenges when working with large codebases: they must re-read entire file histories, consuming massive amounts of tokens, and they struggle to connect heterogeneous materials like code, documentation, and media^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md].

Graphify addresses this by extracting structure, establishing relationships, and persisting the result as a knowledge graph. Subsequent AI queries run against this compact graph rather than the raw source files, which reportedly cuts per-query token consumption by a factor of up to 71.5 in specific benchmarks^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md].

Key Features

| Feature | Description |
| --- | --- |
| Multi-modal Input | Supports code (25 languages via tree-sitter), PDFs, Markdown, screenshots, whiteboard photos, videos, and audio^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md]. |
| Local Processing | Code parsing via tree-sitter and media transcription via faster-whisper are performed locally to minimize LLM API costs^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md]. |
| Leiden Clustering | Uses graph topology for community discovery (identifying related nodes), independent of vector databases^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md]. |
| Confidence Labels | Tags relationships as EXTRACTED, INFERRED, or AMBIGUOUS to distinguish between factual parsing and AI guesswork^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md]. |
| Incremental Updates | SHA256 caching ensures that only changed files are re-processed^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md]. |
| MCP Server | Can run as a Model Context Protocol server to expose tools like query_graph, get_node, and shortest_path^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md]. |
| Git Integration | Supports post-commit and post-checkout hooks to automatically rebuild the graph^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md]. |
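
The incremental-update idea (hash each file with SHA256 and re-process only files whose hash changed) can be sketched as follows. This is an illustration of the general technique, not Graphify's actual cache code: the function names `file_digest` and `changed_files` and the JSON cache layout are assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, cache_path: Path) -> list[Path]:
    """Return files under root whose digest differs from the cached digest.

    The cache is rewritten on every call, so the next call only reports
    files that changed after this one. (Hypothetical helper, not Graphify's API.)
    """
    old = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    new = {}
    changed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = str(path.relative_to(root))
        new[key] = file_digest(path)
        if old.get(key) != new[key]:
            changed.append(path)
    cache_path.write_text(json.dumps(new, indent=2))
    return changed
```

The cache file should live outside the scanned folder. Otherwise it would be hashed too, and it would be reported as changed after any run that rewrites it with different contents.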

Workflow: Three-Pass Scanning

Graphify processes input data through a three-pass pipeline^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md]:

  1. Pass 1 (AST Scan): Uses tree-sitter locally to parse code files, extracting classes, functions, imports, call graphs, docstrings, and design comments without using LLM tokens^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md].
  2. Pass 2 (Media Transcription): Uses faster-whisper locally to transcribe audio and video. Domain-aware prompts improve transcription accuracy^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md].
  3. Pass 3 (Semantic Extraction): Parallel LLM sub-agents (Claude/GPT) process documents, papers, images, and transcribed text to extract concepts, relationships, and design decisions^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md].
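
To make Pass 1 concrete, here is a rough sketch of a local, token-free structural scan. It deliberately swaps in Python's standard-library `ast` module for tree-sitter, so it handles Python source only. The function name `scan_python_source` and the shape of the returned dictionary are assumptions for illustration, not Graphify's internals.

```python
import ast


def scan_python_source(source: str) -> dict:
    """Collect class, function, import, and simple call names from Python source."""
    tree = ast.parse(source)
    found = {"classes": [], "functions": [], "imports": [], "calls": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            found["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found["functions"].append(node.name)
        elif isinstance(node, ast.Import):
            found["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as "from . import x" have no module name.
            found["imports"].append(node.module or "")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls like f(x); method calls like obj.f(x) are skipped.
            found["calls"].append(node.func.id)
    return found


SAMPLE = '''
import hashlib
from pathlib import Path

class Cache:
    def load(self):
        return read_all(Path("."))

def read_all(root):
    return sorted(root.iterdir())
'''

result = scan_python_source(SAMPLE)
```

Everything here runs locally on the parse tree, which is why this kind of pass costs no LLM tokens.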

The extracted data is merged into a NetworkX graph, clustered with Leiden community detection, and exported as an interactive HTML page, a JSON file, and a Markdown report^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md].
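
The merge step can be pictured as combining edge lists from the different passes while preserving the confidence label on each relationship. The sketch below uses plain dataclasses rather than NetworkX to stay dependency-free. The `Edge` type, the `merge_edges` helper, the node and relation names, and the ranking EXTRACTED > INFERRED > AMBIGUOUS for resolving duplicates are all assumptions for illustration; the source does not say how Graphify resolves conflicting labels.

```python
from dataclasses import dataclass

# Assumed ordering: parsed facts outrank model inferences, which outrank unclear ones.
RANK = {"EXTRACTED": 2, "INFERRED": 1, "AMBIGUOUS": 0}


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    confidence: str  # one of the keys in RANK


def merge_edges(*passes: list[Edge]) -> list[Edge]:
    """Merge per-pass edge lists, keeping the highest-ranked label per edge."""
    best: dict[tuple[str, str, str], Edge] = {}
    for edges in passes:
        for edge in edges:
            if edge.confidence not in RANK:
                raise ValueError(f"unknown confidence label: {edge.confidence}")
            key = (edge.source, edge.relation, edge.target)
            current = best.get(key)
            if current is None or RANK[edge.confidence] > RANK[current.confidence]:
                best[key] = edge
    return list(best.values())


# Hypothetical edges: one from a local AST pass, two from an LLM pass.
ast_pass = [Edge("DigestAuth", "calls", "Response", "EXTRACTED")]
llm_pass = [
    Edge("DigestAuth", "calls", "Response", "INFERRED"),
    Edge("DigestAuth", "documented_in", "auth_notes.md", "AMBIGUOUS"),
]
merged = merge_edges(ast_pass, llm_pass)
```

In this toy run the duplicated "calls" edge keeps its EXTRACTED label, so a downstream reader can still tell parsed facts from guesses.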

Usage & Installation

The package is available on PyPI (as graphifyy with two 'y's)^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md].

Installation

pip install graphifyy
graphify install

Common Commands

# Generate graph for current directory
/graphify .

# Deep inference mode
/graphify ./raw --mode deep

# Incremental update (only changed files)
/graphify ./raw --update

# Export to Obsidian vault
/graphify ./raw --obsidian

# Watch for file changes and auto-rebuild
/graphify ./raw --watch

# Query the graph
graphify query "show the auth flow"
graphify path "DigestAuth" "Response"
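
A path query between two named nodes is, conceptually, a shortest-path search over the graph. The sketch below shows that idea with a breadth-first search over a small directed adjacency map. It is not Graphify's implementation: the `shortest_path` function here is a standalone illustration, and every node in the toy graph other than DigestAuth and Response is invented for the example.

```python
from collections import deque
from typing import Optional


def shortest_path(
    adjacency: dict[str, list[str]], start: str, goal: str
) -> Optional[list[str]]:
    """Return a fewest-hops path from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    parents: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor in parents:
                continue  # already reached by a path that is at least as short
            parents[neighbor] = node
            if neighbor == goal:
                # Walk parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Toy directed graph with hypothetical intermediate nodes.
GRAPH = {
    "DigestAuth": ["AuthBase", "build_digest_header"],
    "AuthBase": ["Request"],
    "build_digest_header": ["Request"],
    "Request": ["Response"],
}

path = shortest_path(GRAPH, "DigestAuth", "Response")
```

Because edges are treated as directed here, the reverse query from Response to DigestAuth finds no path in this toy graph.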

Platform Support

Graphify provides "always-on" integration for several AI coding platforms^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md]:

  • Claude Code: Uses CLAUDE.md and PreToolUse hooks.
  • Cursor: Installs rules to .cursor/rules/graphify.mdc.
  • Aider / Hermes / Codex / Trae: Integrates via AGENTS.md or platform-specific skill files.
  • VS Code Copilot: Installs copilot-instructions.md.

Output Artifacts

The tool generates a graphify-out/ directory containing^[001-TODO__Graphify_-_AI编程助手知识图谱技能.md]:

  • graph.html: An interactive visualization (browser-based).
  • GRAPH_REPORT.md: A report containing "God nodes" (central concepts), "Surprising Connections", and suggested questions.
  • graph.json: The persistent graph data usable by the MCP server or other tools.
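
As an illustration of how persisted graph data could be consumed by other tools, the sketch below ranks nodes by degree, one simple way to approximate "central concepts". Two things are assumptions here and should be checked against a real graph.json: the node-link layout (a "nodes" list of objects with an "id", and a "links" list with "source" and "target"), and the use of degree as the centrality measure. The sample data is invented.

```python
import json
from collections import Counter


def top_degree_nodes(graph: dict, k: int = 3) -> list[tuple[str, int]]:
    """Return the k nodes with the most incident links, ties broken by name."""
    # Start every node at zero so isolated nodes are still counted.
    degree = Counter({node["id"]: 0 for node in graph["nodes"]})
    for link in graph["links"]:
        degree[link["source"]] += 1
        degree[link["target"]] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:k]


# Invented sample in the assumed node-link layout.
SAMPLE_JSON = """
{
  "nodes": [{"id": "DigestAuth"}, {"id": "Request"},
            {"id": "Response"}, {"id": "Session"}],
  "links": [{"source": "DigestAuth", "target": "Request"},
            {"source": "Request", "target": "Response"},
            {"source": "Session", "target": "Request"},
            {"source": "Session", "target": "Response"}]
}
"""

top = top_degree_nodes(json.loads(SAMPLE_JSON), k=2)
```

For a real file, the same function could be applied to the result of `json.loads` on the contents of graphify-out/graph.json, provided the keys match the assumed layout.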

Related Concepts

  • [[Tree-sitter]]: The underlying parsing engine used for code structure extraction.
  • MCP (Model Context Protocol): The protocol standard used to expose Graphify tools to AI agents.
  • [[Knowledge Graph]]: The fundamental data structure Graphify generates.

Sources

  • 001-TODO__Graphify_-_AI编程助手知识图谱技能.md