Indexing
markymark maintains two index levels: a per-document index storing every symbol in a single file, and a per-realm index enabling cross-document lookups.
DocumentIndex — per-document symbols
DocumentIndex (in markymark-index/src/document/) stores all extracted symbols
for one document, using arena allocation (bumpalo) to keep per-symbol allocation overhead low.
It can be built through three construction paths:
| Path | Source | Used by |
|---|---|---|
| from_blob() | Zig Document Engine binary blob | LSP (real-time) |
| from_scan() | ScanBackend trait calls | Standalone scanning |
| from_ast() | Tree-sitter frontmatter + Zig scan backend | MCP batch indexing |
All three produce the same DocumentDependent structure — typed entry slices for
headings, wiki links, markdown links, tags, XML tags, code spans, tasks, embeds,
frontmatter, properties, and more.
DocumentIndex uses self_cell so arena-allocated references remain valid for
the lifetime of the index without unsafe lifetime gymnastics.
RealmIndex — cross-document lookups
RealmIndex (in markymark-index/src/realm/) aggregates document indexes for a
workspace. When a document is added or updated, it populates lookup tables:
| Table | Key | Purpose |
|---|---|---|
| slug_to_headings | Heading slug | Find documents containing a heading |
| block_to_location | Block ID | Resolve ^block-id references |
| tag_to_docs | Tag name | Find documents with a tag (lazy) |
| code_span_to_docs | Code span text | Cross-references to code symbols |
| stem_to_uris | File stem | Resolve wiki links by page name |
| key_path_to_docs | Key path | Structured document key lookup |
| date_to_docs | Journal date | Date-based document lookup (BTreeMap for range queries) |
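To make the table concrete, here is a minimal, std-only sketch of how adding a document might populate three of these tables. The names DocSummary, MiniRealm, docs_with_slug, uris_for_stem, and docs_between are hypothetical, and the sketch maps slugs straight to document URIs rather than to heading locations as slug_to_headings does. It mainly shows why date_to_docs is a BTreeMap: an ordered map turns a date range into a single range() scan.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical, simplified summary of one document's cross-document keys.
pub struct DocSummary {
    pub uri: String,
    pub slugs: Vec<String>,
    pub stem: String,
    /// Journal date as (year, month, day); tuples order lexicographically.
    pub date: Option<(i32, u32, u32)>,
}

/// Toy realm holding three of the lookup tables (names are illustrative).
#[derive(Default)]
pub struct MiniRealm {
    slug_to_docs: HashMap<String, HashSet<String>>,
    stem_to_uris: HashMap<String, HashSet<String>>,
    /// Sorted by date, so a date range is one `range` scan.
    date_to_docs: BTreeMap<(i32, u32, u32), HashSet<String>>,
}

/// Sorted view of a URI set, for deterministic output.
fn sorted(set: Option<&HashSet<String>>) -> Vec<&str> {
    let mut v: Vec<&str> = set
        .map(|s| s.iter().map(String::as_str).collect::<Vec<&str>>())
        .unwrap_or_default();
    v.sort();
    v
}

impl MiniRealm {
    pub fn add_document(&mut self, doc: &DocSummary) {
        for slug in &doc.slugs {
            self.slug_to_docs
                .entry(slug.clone())
                .or_default()
                .insert(doc.uri.clone());
        }
        // Stems are lowercased so wiki links can match case-insensitively.
        self.stem_to_uris
            .entry(doc.stem.to_lowercase())
            .or_default()
            .insert(doc.uri.clone());
        if let Some(date) = doc.date {
            self.date_to_docs.entry(date).or_default().insert(doc.uri.clone());
        }
    }

    pub fn docs_with_slug(&self, slug: &str) -> Vec<&str> {
        sorted(self.slug_to_docs.get(slug))
    }

    pub fn uris_for_stem(&self, stem: &str) -> Vec<&str> {
        sorted(self.stem_to_uris.get(&stem.to_lowercase()))
    }

    /// Journal documents dated within [from, to] inclusive, in date order.
    pub fn docs_between(&self, from: (i32, u32, u32), to: (i32, u32, u32)) -> Vec<&str> {
        self.date_to_docs
            .range(from..=to)
            .flat_map(|(_, uris)| sorted(Some(uris)))
            .collect()
    }
}
```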
String interning
Cross-document HashMap keys use string interning via lasso::Rodeo. A slug
like getting-started appearing in 10 documents is stored once and referenced
by a compact Spur token. The interner grows monotonically — it never
deallocates strings during the LSP session lifetime. For a 10K-document vault
with ~500K unique slugs, tags, and block IDs, the interner holds roughly 10 MB.
File stems used in stem_to_uris are lowercased before interning, enabling
case-insensitive wiki link resolution via O(1) lookup.
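As an illustration of the interning idea (not lasso's implementation), a grow-only interner can be sketched with a HashMap plus a Vec. Sym plays the role of lasso's Spur, and the method names loosely mirror Rodeo's get_or_intern, get, and resolve; intern_stem is a hypothetical helper showing the lowercase-before-interning step for file stems. For simplicity the sketch keeps two owned copies of each string, which a production interner avoids.

```rust
use std::collections::HashMap;

/// Compact token for an interned string (stands in for lasso's `Spur`).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Sym(u32);

/// Grow-only interner: strings are added but never freed.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, Sym>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing token for `s`, or stores `s` once and returns a new token.
    pub fn get_or_intern(&mut self, s: &str) -> Sym {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Sym(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), sym);
        sym
    }

    /// Looks up a string without interning it.
    pub fn get(&self, s: &str) -> Option<Sym> {
        self.ids.get(s).copied()
    }

    /// Maps a token back to its string.
    pub fn resolve(&self, sym: Sym) -> &str {
        &self.strings[sym.0 as usize]
    }

    pub fn len(&self) -> usize {
        self.strings.len()
    }
}

/// Hypothetical helper: stems are lowercased first, so "Getting-Started"
/// and "getting-started" share one token and one O(1) lookup.
pub fn intern_stem(interner: &mut Interner, stem: &str) -> Sym {
    interner.get_or_intern(&stem.to_lowercase())
}
```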
Lazy tag maintenance
The tag_to_docs index uses a lazy rebuild strategy to avoid patching overhead
during rapid edits. When update_document() detects a tag change, it sets a
tags_dirty flag instead of modifying the tag index immediately. The full tag
index is rebuilt from per-document contribution metadata the next time a
mutation needs it (ensure_tags_clean()). Read-only queries like tag_counts()
compute tag data directly from contributions when dirty, avoiding mutation
entirely.
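The dirty-flag pattern can be sketched as follows, assuming tag contributions are stored per document as plain string sets. TagIndex and docs_with_tag are hypothetical names; update_document, ensure_tags_clean, and tag_counts follow the roles described above but are not markymark's actual signatures.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical tag index that defers rebuilding behind a dirty flag.
#[derive(Default)]
pub struct TagIndex {
    /// Source of truth: the tags each document currently contributes.
    contributions: HashMap<String, HashSet<String>>,
    /// Derived index; only trustworthy while `tags_dirty` is false.
    tag_to_docs: HashMap<String, HashSet<String>>,
    tags_dirty: bool,
}

impl TagIndex {
    pub fn update_document(&mut self, uri: &str, tags: HashSet<String>) {
        if self.contributions.get(uri) != Some(&tags) {
            self.contributions.insert(uri.to_owned(), tags);
            // Mark stale instead of patching tag_to_docs on every edit.
            self.tags_dirty = true;
        }
    }

    pub fn is_dirty(&self) -> bool {
        self.tags_dirty
    }

    /// Rebuilds tag_to_docs from contributions, but only if it is stale.
    pub fn ensure_tags_clean(&mut self) {
        if !self.tags_dirty {
            return;
        }
        self.tag_to_docs.clear();
        for (uri, tags) in &self.contributions {
            for tag in tags {
                self.tag_to_docs
                    .entry(tag.clone())
                    .or_default()
                    .insert(uri.clone());
            }
        }
        self.tags_dirty = false;
    }

    /// Read-only: never rebuilds; scans contributions directly when dirty.
    pub fn tag_counts(&self) -> BTreeMap<String, usize> {
        let mut counts = BTreeMap::new();
        if self.tags_dirty {
            for tags in self.contributions.values() {
                for tag in tags {
                    *counts.entry(tag.clone()).or_insert(0) += 1;
                }
            }
        } else {
            for (tag, docs) in &self.tag_to_docs {
                counts.insert(tag.clone(), docs.len());
            }
        }
        counts
    }

    /// A caller with mutable access brings the index up to date before using it.
    pub fn docs_with_tag(&mut self, tag: &str) -> Vec<String> {
        self.ensure_tags_clean();
        let mut docs: Vec<String> = self
            .tag_to_docs
            .get(tag)
            .map(|d| d.iter().cloned().collect::<Vec<String>>())
            .unwrap_or_default();
        docs.sort();
        docs
    }
}
```

In this sketch tag_counts() takes &self and never rebuilds, while docs_with_tag() takes &mut self and pays for the rebuild once, however many edits preceded it.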
Cross-document resolution
The resolution module (markymark-index/src/resolution.rs) resolves link targets:
- Wiki links: resolve_wiki_link() finds documents by page name (stem matching), optionally resolving a heading anchor within the target document.
- Markdown links: resolve_markdown_link() tries path-relative resolution first, falling back to stem-only lookup.
- Block references: resolve_block_ref() looks up ^id across all documents.
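A minimal sketch of stem-based wiki link resolution with an optional heading anchor, assuming a map keyed by lowercased stem and a simple slug rule. Doc, WikiResolution, and slugify are hypothetical; markymark's real slug rules and return types may differ. The MissingHeading case corresponds to the broken heading anchors that diagnostics report.

```rust
use std::collections::HashMap;

/// Hypothetical per-document record: its URI plus the slugs of its headings.
pub struct Doc {
    pub uri: String,
    pub heading_slugs: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum WikiResolution<'a> {
    Document(&'a str),
    Heading(&'a str, String),
    /// The page exists but the "#anchor" does not.
    MissingHeading(&'a str),
    MissingDocument,
}

/// Hypothetical slug rule: lowercase, whitespace to '-', other punctuation dropped.
pub fn slugify(text: &str) -> String {
    text.trim()
        .to_lowercase()
        .chars()
        .filter_map(|c| {
            if c.is_alphanumeric() || c == '-' || c == '_' {
                Some(c)
            } else if c.is_whitespace() {
                Some('-')
            } else {
                None
            }
        })
        .collect()
}

/// Resolves "Page" or "Page#Heading" against a map keyed by lowercased stem.
pub fn resolve_wiki_link<'a>(
    stem_to_doc: &'a HashMap<String, Doc>,
    target: &str,
) -> WikiResolution<'a> {
    let (page, heading) = match target.split_once('#') {
        Some((page, heading)) => (page, Some(heading)),
        None => (target, None),
    };
    // Case-insensitive page lookup via the lowercased stem key.
    let Some(doc) = stem_to_doc.get(&page.trim().to_lowercase()) else {
        return WikiResolution::MissingDocument;
    };
    match heading {
        None => WikiResolution::Document(doc.uri.as_str()),
        Some(h) => {
            let slug = slugify(h);
            if doc.heading_slugs.contains(&slug) {
                WikiResolution::Heading(doc.uri.as_str(), slug)
            } else {
                WikiResolution::MissingHeading(doc.uri.as_str())
            }
        }
    }
}
```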
Path-relative resolution uses component-stack normalization rather than filesystem
canonicalize(), so it works without the target file existing on disk.
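Component-stack normalization can be sketched with std::path::Component: walk the joined path, let each .. pop the previous directory name, and never consult the disk. resolve_relative is a hypothetical helper, and the example paths assume Unix-style separators.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically resolves `link` against the directory of `from_doc` using a
/// component stack; unlike `std::fs::canonicalize`, nothing is read from disk.
pub fn resolve_relative(from_doc: &Path, link: &str) -> PathBuf {
    let base = from_doc.parent().unwrap_or_else(|| Path::new(""));
    let joined = base.join(link);
    let mut stack: Vec<Component> = Vec::new();
    for comp in joined.components() {
        match comp {
            // "." never changes the location.
            Component::CurDir => {}
            Component::ParentDir => {
                if matches!(stack.last(), Some(Component::Normal(_))) {
                    // ".." cancels the previous directory name.
                    stack.pop();
                } else if !matches!(stack.last(), Some(Component::RootDir | Component::Prefix(_))) {
                    // Relative path with nothing left to cancel: keep the "..".
                    stack.push(comp);
                }
                // At the root, ".." is ignored: you cannot go above "/".
            }
            other => stack.push(other),
        }
    }
    stack.iter().collect()
}
```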
Diagnostics
compute_diagnostics() in markymark-index/src/diagnostics.rs checks a document
against its realm and reports broken wiki links, broken markdown links, broken
heading anchors, and duplicate heading slugs. LSP and MCP both call this same
function rather than maintaining separate checks.
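To illustrate the shape of such a check, here is a hypothetical, much-reduced stand-in covering two of the four diagnostics: broken wiki links (page stem unknown to the realm) and duplicate heading slugs within one document. The Diagnostic enum and the diagnostics_sketch signature are assumptions, not the real compute_diagnostics() API.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
pub enum Diagnostic {
    BrokenWikiLink { target: String },
    DuplicateHeadingSlug { slug: String, count: usize },
}

/// Hypothetical stand-in: checks wiki link pages against known (lowercased)
/// stems and counts repeated heading slugs in the current document.
pub fn diagnostics_sketch(
    heading_slugs: &[&str],
    wiki_targets: &[&str],
    known_stems: &HashSet<String>,
) -> Vec<Diagnostic> {
    let mut out = Vec::new();

    for &target in wiki_targets {
        // Only the page part is checked here; "#heading" anchors are ignored.
        let page = target.split('#').next().unwrap_or("").trim().to_lowercase();
        if !known_stems.contains(&page) {
            out.push(Diagnostic::BrokenWikiLink { target: target.to_string() });
        }
    }

    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &slug in heading_slugs {
        *counts.entry(slug).or_insert(0) += 1;
    }
    let mut dups: Vec<(&str, usize)> = counts.into_iter().filter(|&(_, n)| n > 1).collect();
    dups.sort(); // deterministic order for reporting
    for (slug, count) in dups {
        out.push(Diagnostic::DuplicateHeadingSlug { slug: slug.to_string(), count });
    }
    out
}
```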
Index updates
Index updates are driven by LSP notifications, not filesystem watching:
- textDocument/didOpen: document parsed and added to the realm
- textDocument/didChange: Document Engine re-parses, index incrementally updated
- textDocument/didClose: document removed from the realm
A 75ms debounce coalesces rapid keystrokes into a single reparse cycle. For MCP,
documents are indexed in batch when a workspace root is added via add-root.
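The coalescing behaviour can be modelled as a trailing-edge debounce: every edit pushes the pending reparse to 75 ms after itself, and a reparse only fires once a gap of at least that length occurs. This pure function is a simplified model for reasoning about timing, assuming that is how the debounce behaves; debounce_fire_times is hypothetical, and the real LSP presumably drives the debounce from a timer rather than a list of timestamps.

```rust
/// Given ascending edit timestamps in milliseconds, returns the times at which
/// a trailing-edge debounce with window `delay_ms` would trigger a reparse.
pub fn debounce_fire_times(edits_ms: &[u64], delay_ms: u64) -> Vec<u64> {
    let mut fires = Vec::new();
    // Deadline of the currently scheduled reparse, if any.
    let mut pending: Option<u64> = None;
    for &t in edits_ms {
        if let Some(deadline) = pending {
            if t >= deadline {
                // The previous burst went quiet long enough: its reparse fired.
                fires.push(deadline);
            }
        }
        // Each edit (re)schedules the reparse `delay_ms` after itself.
        pending = Some(t + delay_ms);
    }
    if let Some(deadline) = pending {
        fires.push(deadline);
    }
    fires
}
```

For keystrokes at 0, 20, 50, 300, and 310 ms, the model fires only two reparses, at 125 ms and 385 ms.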
Incremental cross-document updates
When a document changes, update_document() diffs the old and new
DocContribution — a per-document snapshot of which interned keys (heading
slugs, block IDs, tags, code spans, file stem) that document contributed to
cross-document indexes.
- Fast path: If the contribution sets are identical (common for edits that
  don’t change document structure), all cross-document index operations are
  skipped; only the stored DocumentIndex is swapped.
- Slow path: When contributions differ, only the changed entries are
  patched. Added slugs are inserted into slug_to_headings, removed slugs are
  cleaned out, and so on for blocks, code spans, and stems. Tags use the lazy
  strategy described above.
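The fast/slow split can be sketched with set differences. In this hypothetical version, DocContribution holds u32 tokens in place of interner keys, only slugs, block IDs, and the stem are tracked, and every table maps a key to a set of document URIs; the real tables store richer locations. Tags are left out because they use the lazy strategy.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical snapshot of the interned keys one document contributes
/// (u32 stands in for an interner token such as lasso's Spur).
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct DocContribution {
    pub slugs: HashSet<u32>,
    pub blocks: HashSet<u32>,
    pub stem: Option<u32>,
}

#[derive(Debug, PartialEq)]
pub enum UpdatePath {
    Fast,
    Slow,
}

pub type KeyToDocs = HashMap<u32, HashSet<String>>;

#[derive(Default)]
pub struct CrossIndex {
    pub slug_to_docs: KeyToDocs,
    pub block_to_docs: KeyToDocs,
    pub stem_to_docs: KeyToDocs,
    contributions: HashMap<String, DocContribution>,
}

/// Applies only the difference between two key sets to one lookup table.
fn patch(map: &mut KeyToDocs, old: &HashSet<u32>, new: &HashSet<u32>, uri: &str) {
    for &key in old.difference(new) {
        let now_empty = match map.get_mut(&key) {
            Some(docs) => {
                docs.remove(uri);
                docs.is_empty()
            }
            None => false,
        };
        if now_empty {
            map.remove(&key); // no document contributes this key any more
        }
    }
    for &key in new.difference(old) {
        map.entry(key).or_default().insert(uri.to_owned());
    }
}

impl CrossIndex {
    pub fn update_document(&mut self, uri: &str, new: DocContribution) -> UpdatePath {
        let old = self.contributions.get(uri).cloned().unwrap_or_default();
        if old == new {
            // Fast path: no cross-document work (the real code would still
            // swap in the new DocumentIndex here).
            return UpdatePath::Fast;
        }
        // Slow path: patch only the keys that were added or removed.
        patch(&mut self.slug_to_docs, &old.slugs, &new.slugs, uri);
        patch(&mut self.block_to_docs, &old.blocks, &new.blocks, uri);
        let old_stem: HashSet<u32> = old.stem.into_iter().collect();
        let new_stem: HashSet<u32> = new.stem.into_iter().collect();
        patch(&mut self.stem_to_docs, &old_stem, &new_stem, uri);
        self.contributions.insert(uri.to_owned(), new);
        UpdatePath::Slow
    }
}
```

Because each difference only walks the old and new contribution sets of the edited document, the slow path's cost here depends on that document's size, not on how many other documents the realm holds.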