Graph-RAG

Clawdius uses Graph-RAG, a graph-based form of Retrieval-Augmented Generation (RAG), to give the LLM deep context about your codebase.

Overview

Graph-RAG combines two complementary approaches:

Structural understanding via AST (Abstract Syntax Tree) parsing with tree-sitter
Semantic search via vector embeddings stored in LanceDB

Together, these give the LLM precise knowledge of code structure and the ability to match code by meaning rather than by exact keywords.

tree-sitter parses source files into ASTs, which are stored in a SQLite database at .clawdius/graph/index.db.

Supported languages:

The AST index enables structural queries like "find all function definitions" or "list imports in this module."
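A structural query of this kind can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `ast` module in place of tree-sitter, an in-memory SQLite database rather than `.clawdius/graph/index.db`, and a hypothetical `nodes` table whose schema is invented for the example. The actual Clawdius schema is not described here.

```python
import ast
import sqlite3

# Sample source to index (stands in for a file in the workspace).
SOURCE = '''
import os
from pathlib import Path

def load(path):
    return Path(path).read_text()

class Indexer:
    def run(self):
        return os.getcwd()
'''

def extract_rows(file_name, source):
    """Walk the AST and emit one row per function definition or import."""
    rows = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            rows.append((file_name, "function", node.name, node.lineno))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                rows.append((file_name, "import", alias.name, node.lineno))
        elif isinstance(node, ast.ImportFrom):
            rows.append((file_name, "import", node.module or "", node.lineno))
    return rows

def build_index(rows):
    """Store the rows in a hypothetical `nodes` table (schema is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (file TEXT, kind TEXT, name TEXT, line INTEGER)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", rows)
    return conn

def names_of(conn, kind):
    """Structural query: all names of a given node kind, in source order."""
    cur = conn.execute("SELECT name FROM nodes WHERE kind = ? ORDER BY line", (kind,))
    return [row[0] for row in cur]

conn = build_index(extract_rows("example.py", SOURCE))
print(names_of(conn, "function"))  # "find all function definitions"
print(names_of(conn, "import"))    # "list imports in this module"
```

Because the parsed structure lives in ordinary tables, each structural question becomes a plain SQL filter instead of a text search over raw files.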

Code is embedded into vector representations and stored in LanceDB at .clawdius/graph/vectors.lance.

Semantic search allows queries like "find error handling patterns" or "locate database connection code" without exact keyword matching.
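The idea behind matching without keywords can be sketched with cosine similarity over embedding vectors. The vectors below are hand-made three-dimensional toys and the snippet names are hypothetical; a real setup would use embeddings from a model and a vector store such as LanceDB, neither of which is shown here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, store, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in store.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Toy embeddings (assumed values, for illustration only).
STORE = {
    "retry_on_failure": [0.9, 0.1, 0.0],
    "open_db_connection": [0.1, 0.9, 0.1],
    "render_template": [0.0, 0.1, 0.9],
}

# Pretend this is the embedding of "find error handling patterns".
query = [0.8, 0.2, 0.1]
print(top_k(query, STORE))
```

The query shares no words with `retry_on_failure`, yet it ranks first because the two vectors point in nearly the same direction.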

The petgraph crate powers the code relationship graph, tracking:

[storage]
database_path = ".clawdius/graph/index.db"
vector_path = ".clawdius/graph/vectors.lance"

Graph-RAG has two levels of activation:

Feature	Description
`vector-db`	Enable LanceDB vector search
`embeddings`	Enable local ML embeddings (tokenizers + HuggingFace)
`local-llm`	Full local inference (Candle + embeddings + vector-db)

Index your workspace:

# Requires vector-db feature
clawdius index .
clawdius index . --watch    # Watch for changes and re-index

Query the indexed workspace:

# Requires vector-db feature
clawdius context "how does authentication work"
clawdius context "error handling patterns" --max-tokens 8000
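One plausible way to think about a limit like `--max-tokens` is as a budget that ranked snippets are packed into. The sketch below is an assumption, not a description of Clawdius internals: it counts whitespace-separated words as a stand-in for real tokens, and the `pack_context` helper and the sample snippets are hypothetical.

```python
def pack_context(snippets, max_tokens):
    """Greedily keep the highest-scoring snippets that fit within the budget.

    `snippets` is a list of (score, text) pairs. Token cost is approximated
    by a whitespace word count, which is an assumption for illustration.
    """
    chosen, used = [], 0
    for _score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = len(text.split())
        if used + cost <= max_tokens:
            chosen.append(text)
            used += cost
    return chosen, used

# Hypothetical retrieval results: (relevance score, snippet text).
results = [
    (0.9, "fn login(user, pass)"),
    (0.5, "a b c d e f"),
    (0.7, "fn logout()"),
]

chosen, used = pack_context(results, max_tokens=5)
print(chosen, used)
```

A larger budget admits lower-ranked snippets as well, which is the trade-off a flag like this lets you tune.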

Graph-RAG queries are optimized for speed:

When you ask Clawdius a question about your codebase:

This produces more accurate, better-grounded responses than sending raw file contents.