A Phase-Gated Workflow for Persistent Repository Mapping Across AI Sessions
Introduction
AI agents often struggle when entering an unfamiliar repository. The failure mode is not only missing context, but premature certainty: directory names are over-trusted, partial file reads are mistaken for system boundaries, and first-pass hypotheses silently harden into architectural claims. This makes repository onboarding, cross-session continuity, and architecture-aware modification unreliable.
We present nexus-mapper, an executable workflow for building a persistent .nexus-map/ knowledge base from a local repository. Instead of producing a one-shot free-form summary, the workflow follows a phase-gated PROBE protocol and emits a bounded set of reusable artifacts for later sessions. The implementation is publicly available at https://github.com/haai-coding/Nexus-skills.
Related Work
Several tools address repository-level code understanding for AI agents, but differ in persistence, provenance, and execution model.
Aider's Repo Map. Aider constructs a tree-sitter-based repository map containing file listings, symbol definitions, and call signatures, optimized for token budgets via graph ranking [1]. The map is regenerated per-session and sent alongside user prompts. Unlike nexus-mapper, Aider's repo map is ephemeral: it does not persist architectural context across sessions, does not include git forensics, and does not distinguish implemented from inferred structure. However, Aider's graph-ranking approach to symbol selection is a useful reference for future optimization of nexus-mapper's cold-start index.
Bloop. Bloop combines semantic code search (via embedded MiniLM vectors in Qdrant), regex-based text search (via Tantivy), and tree-sitter navigation for multi-language codebases [2]. It provides conversational code exploration powered by GPT-4. Bloop's focus is interactive code search rather than persistent architectural mapping; it does not emit bounded artifacts for cross-session reuse and does not include provenance labeling or degraded-mode reporting.
Meta-RAG. Vali Tawosi et al. [3] propose a multi-agent RAG framework that condenses codebases by an average of 79.8% into structured natural language summaries for bug localization. Evaluated on SWE-bench Lite, Meta-RAG achieves 84.67% file-level localization accuracy. While Meta-RAG demonstrates the value of structured code summarization, its summaries are generated for a single task (bug localization) and do not address cross-session persistence or provenance tracking. nexus-mapper complements retrieval-based approaches by producing durable artifacts that any downstream retrieval or reasoning system can consume.
Sourcegraph Cody. Cody uses embeddings of code snippets for semantic search across repositories, combined with LLM-based question answering [4]. Like Bloop, Cody targets interactive code exploration rather than persistent architectural artifacts.
Code Summarization and RAG. Recent empirical studies on RAG for code completion in industrial codebases [5] and surveys on repository-level code generation [6] confirm that structured retrieval of code context improves downstream task performance. However, these works focus on retrieval mechanisms rather than the quality and persistence of the contextual artifacts themselves. nexus-mapper occupies a complementary position: it produces the architectural artifacts that retrieval systems can index and that agents can load across sessions.
Positioning. nexus-mapper differs from these tools in three ways: (1) it produces persistent, bounded artifacts rather than ephemeral per-session context; (2) it includes explicit provenance and uncertainty labeling rather than presenting all extracted information as equally verified; (3) it reports degraded execution conditions rather than silently proceeding with partial coverage.
Problem Setting
Repository understanding for AI agents has two practical constraints.
First, evidence is incomplete at cold start. The agent may only see a file tree, a README, or a few entrypoints. Second, future sessions rarely inherit the full reasoning state of earlier sessions. As a result, repository understanding must be not only generated, but also persisted in a form that later sessions can reload safely.
The goal of nexus-mapper is therefore not generic summarization. Its goal is to produce durable, evidence-backed architectural context for future work.
Design Decisions
Why a phase-gated workflow instead of a single repository summary? The main failure mode at cold start is not lack of text generation capability, but premature closure. A staged workflow creates explicit checkpoints between evidence collection, hypothesis formation, challenge, and artifact emission.
Why persist artifacts instead of relying on session memory? Cross-session work is the normal case in practical engineering. If repository understanding is not written into bounded artifacts, the next session must reconstruct architecture from scratch and may drift from earlier conclusions.
Why include provenance and degraded-mode reporting? Repository understanding is never uniformly complete across languages, histories, and parser support. The workflow therefore treats uncertainty as first-class output rather than allowing unsupported or inferred regions to masquerade as verified structure.
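To make "uncertainty as first-class output" concrete, the following minimal sketch attaches a provenance label to every architectural claim. The three labels (implemented, planned, inferred) come from the protocol description in this paper; the `Claim` class, its fields, and the rule used in `is_verified` are illustrative assumptions, not the data model used by nexus-mapper.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Provenance(Enum):
    # Label names follow the paper; the one-line glosses are assumptions.
    IMPLEMENTED = "implemented"  # assumed: backed by extracted code structure
    PLANNED = "planned"          # assumed: stated in docs, not found in code
    INFERRED = "inferred"        # assumed: guessed from names or layout


@dataclass
class Claim:
    """One architectural statement plus pointers to its evidence (hypothetical shape)."""
    statement: str
    provenance: Provenance
    evidence: list[str] = field(default_factory=list)  # e.g. file paths

    def is_verified(self) -> bool:
        # Assumed rule: only implemented claims with evidence count as verified.
        return self.provenance is Provenance.IMPLEMENTED and bool(self.evidence)


def render(claim: Claim) -> str:
    """Render a claim so that its label travels with the text."""
    return f"[{claim.provenance.value}] {claim.statement}"
```

Because the label is part of the rendered text, a later session that reloads the artifact cannot read an inferred claim without also seeing that it is inferred.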
The PROBE Protocol
The PROBE protocol defines four sequential phases with an explicit gate condition at each phase transition. A phase cannot begin until the gate that precedes it is satisfied.
Phase 1: PERCEIVE (Evidence Collection)
Objective: Collect raw structural evidence from the repository without forming architectural hypotheses.
Activities:
- Execute multi-language AST extraction via extract_ast.py, producing raw/ast_nodes.json and raw/file_tree.txt
- When git history is available, execute git_detective.py for hotspot and co-change analysis, producing raw/git_stats.json
- Record parser availability, language coverage, and any truncation or degradation flags
- Filter generated directories, third-party assets, and .gitignore-excluded noise
Output: Raw evidence artifacts in .nexus-map/raw/ with explicit metadata about extraction coverage and limitations.
Gate condition G1-2: Raw evidence directory exists and contains at least ast_nodes.json and file_tree.txt. Parser availability and language coverage have been recorded. Any degradation (truncation, missing parsers, absent git history) has been flagged in metadata.
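Gate G1-2 can be checked mechanically. The sketch below is one possible check, not the check shipped with nexus-mapper: the two required file names come from the gate condition above, while the metadata keys (`parsers`, `languages`, `degradation`) and the function name are hypothetical.

```python
from pathlib import Path

# File names taken from the gate condition in the text.
REQUIRED_RAW = ("ast_nodes.json", "file_tree.txt")
# Hypothetical metadata keys; the real metadata layout may differ.
REQUIRED_META = ("parsers", "languages", "degradation")


def check_gate_g1_2(raw_dir: Path, metadata: dict) -> list:
    """Return a list of problems; an empty list means the gate is satisfied."""
    if not raw_dir.is_dir():
        return [f"missing raw evidence directory: {raw_dir}"]
    problems = []
    for name in REQUIRED_RAW:
        if not (raw_dir / name).is_file():
            problems.append(f"missing raw artifact: {name}")
    for key in REQUIRED_META:
        # An empty value is fine: "no degradation" is still a recorded fact.
        if key not in metadata:
            problems.append(f"metadata does not record: {key}")
    return problems
```

Returning the list of problems rather than a boolean keeps the reason for a blocked transition inspectable, which matches the protocol's emphasis on reporting rather than silently proceeding.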
Phase 2: RELATE (Hypothesis Formation)
Objective: Form architectural hypotheses from raw evidence, identifying subsystem boundaries, dependency patterns, and domain vocabulary.
Activities:
- Analyze AST node distributions to identify module clusters and ownership patterns
- Extract import/dependency relationships from AST data
- Identify domain-specific vocabulary from function names, class names, and file paths
- Form candidate subsystem boundaries based on directory structure and coupling evidence
- Generate dependency summaries and structural relationship maps
Output: Candidate architectural artifacts in .nexus-map/arch/ and .nexus-map/concepts/, each marked as inferred or implemented based on evidence strength.
Gate condition G2-3: Architectural hypotheses exist in structured form. Each hypothesis is labeled with its evidence source (AST-derived, git-derived, or inferred from naming patterns). No unlabeled hypotheses remain.
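A minimal sketch of the G2-3 labeling check follows. It assumes hypotheses are stored as dictionaries with an `id` and a `source` field, and it uses the tags `ast`, `git`, and `naming` for the three evidence sources named above; both the record shape and the tag spellings are assumptions made for illustration.

```python
# Hypothetical tags for the three evidence sources named in the gate condition.
ALLOWED_SOURCES = {"ast", "git", "naming"}


def unlabeled_hypotheses(hypotheses: list) -> list:
    """Return ids of hypotheses whose evidence source is missing or unknown."""
    return [
        h.get("id", "<no id>")
        for h in hypotheses
        if h.get("source") not in ALLOWED_SOURCES
    ]


def gate_g2_3_passes(hypotheses: list) -> bool:
    # The gate needs at least one hypothesis and no unlabeled ones.
    return bool(hypotheses) and not unlabeled_hypotheses(hypotheses)
```

For example, a list containing one hypothesis with `"source": "ast"` passes, while adding a second hypothesis with no `source` field blocks the transition to BOUND and names the offending id.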
Phase 3: BOUND (Challenge and Limit)
Objective: Challenge architectural hypotheses, identify evidence gaps, and establish explicit boundaries for what the map claims to cover.
Activities:
- Cross-reference subsystem boundaries against dependency evidence
- Identify regions where AST coverage is partial or absent
- Verify that inferred elements are not masquerading as implemented
- Generate test-surface notes documenting which areas have structural test coverage evidence
- Record explicit limitations: unsupported languages, missing git history, truncated nodes
Output: Refined architectural artifacts with explicit provenance headers distinguishing implemented, planned, and inferred elements. Degraded execution conditions documented in each artifact header.
Gate condition G3-4: All artifacts have provenance headers. No artifact presents inferred content as verified. Degraded conditions are documented. Evidence gaps are explicitly labeled rather than silently filled.
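The provenance requirement of G3-4 can be sketched as a scan over artifact headers. The paper does not specify the header syntax, so the `provenance: <label>` line assumed below is hypothetical, as are the function names and the choice to look only at the first five lines of each artifact.

```python
from __future__ import annotations

import re

LABELS = {"implemented", "planned", "inferred"}  # labels named in the text
# Hypothetical header syntax: a line of the form "provenance: <label>".
HEADER = re.compile(r"provenance:\s*(\w+)")


def provenance_of(text: str, header_lines: int = 5) -> str | None:
    """Return the provenance label found in the leading lines, if any."""
    for line in text.splitlines()[:header_lines]:
        match = HEADER.fullmatch(line.strip())
        if match and match.group(1) in LABELS:
            return match.group(1)
    return None


def artifacts_missing_provenance(artifacts: dict) -> list:
    """Map of artifact name -> text; return names lacking a valid header."""
    return sorted(
        name for name, text in artifacts.items() if provenance_of(text) is None
    )
```

An unknown label is treated the same as a missing one, so an artifact cannot pass the gate by inventing a stronger-sounding status.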
Phase 4: EMIT (Artifact Generation)
Objective: Generate the final .nexus-map/ knowledge base with consistent formatting and cross-references.
Activities:
- Generate INDEX.md cold-start routing summary from refined architectural artifacts
- Generate concepts/concept_model.json machine-readable concept graph
- Ensure cross-references between artifacts are consistent
- Apply consistent formatting and provenance headers across all outputs
- Verify all raw evidence artifacts are present and referenced
Output: Complete .nexus-map/ directory ready for cross-session reuse.
Completion criterion: All artifacts in .nexus-map/ have provenance headers, cross-references are consistent, and the INDEX.md provides accurate routing to all sub-artifacts.
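The routing part of the completion criterion can be approximated by comparing the links in INDEX.md with the artifacts that actually exist. This sketch assumes INDEX.md uses ordinary Markdown links with paths relative to .nexus-map/; that assumption, the function names, and the example file arch/systems.md are illustrative and not taken from the implementation.

```python
import re

# Captures the target of a Markdown link, stopping at ')', '#', or whitespace.
LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")


def broken_routes(index_text: str, existing: set) -> list:
    """Link targets in INDEX.md that do not correspond to an existing artifact."""
    targets = LINK.findall(index_text)
    return sorted({t for t in targets if "://" not in t and t not in existing})


def unrouted(index_text: str, existing: set) -> list:
    """Existing artifacts that INDEX.md never links to."""
    targets = set(LINK.findall(index_text))
    return sorted(existing - targets - {"INDEX.md"})
```

Both lists should be empty for the index to provide accurate routing to all sub-artifacts: the first catches stale links, the second catches artifacts a cold-start session would never discover.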
Gate Condition Summary
| Transition | Gate | Condition |
|---|---|---|
| G1-2 | Evidence sufficiency | Raw artifacts exist; coverage and degradation recorded |
| G2-3 | Hypothesis labeling | All hypotheses tagged with evidence source |
| G3-4 | Provenance verification | No unlabeled inference; gaps explicitly documented |
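The gate table above implies a simple control loop: run a phase, evaluate the gate that follows it, and stop at the first gate that reports problems. The sketch below shows that loop under stated assumptions: phases and gates are plain callables over a shared state dictionary, and EMIT's completion criterion is treated as a final gate. None of these names come from the nexus-mapper code.

```python
def run_phases(steps, state):
    """Run (name, run, gate) triples in order.

    run(state) mutates the shared state; gate(state) returns a list of
    problems, where an empty list means the transition is allowed.
    Returns (phases_executed, blocking_problems).
    """
    executed = []
    for name, run, gate in steps:
        run(state)
        executed.append(name)
        problems = gate(state)
        if problems:
            # Stop here: the next phase must not start on unverified input.
            return executed, problems
    return executed, []
```

A caller would pass four triples in PERCEIVE, RELATE, BOUND, EMIT order; if the RELATE gate reports an unlabeled hypothesis, BOUND and EMIT are never invoked and the returned problem list says why.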
Evidence
The workflow has been executed on four heterogeneous repositories spanning different domains, language mixes, and git maturity. The following table reports execution outcomes.
| Repository | Domain | Languages | Files | Parsed | Errors | Degradation | Git window | Authors | Hotspot |
|---|---|---|---|---|---|---|---|---|---|
| Workspace (OpenClaw) | AI Agent workspace | Python, JS, MD, JSON, Shell | 452 (15 .py) | 15/15 | 0 | 1 commit only | 90d / 1 commit | 1 | N/A (single commit) |
| Dynasty | Game engine | Python, GDScript | 60 | 60/60 | 0 | None | 90d / 77 commits | 1 | Full AST coverage |
| httpie/cli | HTTP client CLI | Python, JSON, YAML, Shell | ~180 (133 .py) | 133/133 | 0 | 27 file types, .fish/.ps1 unparsed | All-time / 1826 commits | 5 | setup.py (123x), cli.py (118x), core.py (110x) |
| ToVox | Voxel engine | Rust, TS, Python, C, C++, C#, JS, Bash | 145 | 145/145 | 0 | truncated=true (896 nodes), Bash module-only | 90d / 38 commits | 1 | Full AST + git co-change |
Key observations:
Parse reliability. Across all four repositories, every file submitted to AST parsing was parsed without error (353/353 files, the sum of the Parsed column: 15 + 60 + 133 + 145). The count should not be read as Python-only: the Dynasty and ToVox rows list GDScript, Rust, TypeScript, and other languages. On these four repositories the extraction phase ran reliably; the sample is too small to establish reliability in general.
Language coverage varies. The workspace contains 271 JavaScript files (mostly node_modules) and 91 Markdown files. httpie/cli includes .fish (Fish shell) and .ps1 (PowerShell) scripts that require language-specific parsers not yet implemented. The workflow correctly flags these as unparsed rather than silently ignoring them.
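The behaviour described here, skipping vendored noise while still reporting file types that have no parser, can be sketched as a small coverage report. The supported-extension set, the noise-directory set, and the example paths are hypothetical; they are not the lists used by extract_ast.py.

```python
from collections import Counter
from pathlib import PurePosixPath

SUPPORTED = {".py", ".js", ".ts", ".rs"}                 # hypothetical parser set
NOISE_DIRS = {"node_modules", ".git", "dist", "build"}   # hypothetical filter list


def coverage_report(paths):
    """Count files per extension, split into parsed and unparsed."""
    parsed, unparsed = Counter(), Counter()
    for path in map(PurePosixPath, paths):
        if NOISE_DIRS & set(path.parts[:-1]):
            continue  # filtered noise is excluded from both counts
        ext = path.suffix or "<none>"
        bucket = parsed if ext in SUPPORTED else unparsed
        bucket[ext] += 1
    return {"parsed": dict(parsed), "unparsed": dict(unparsed)}
```

The design point is that an unsupported extension lands in the `unparsed` bucket instead of being dropped, so the gap shows up in the report.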
Git maturity matters. The workspace has only 1 commit, making hotspot analysis impossible. httpie/cli has 1826 commits across 5 authors over 14 years, providing rich co-change signals. The workflow adapts: when git data is sparse, it reports N/A rather than fabricating statistics.
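The "report N/A rather than fabricate" rule can be illustrated with a change-count function that refuses to rank files when history is too thin. The input shape (one list of touched paths per commit) and the `MIN_COMMITS` threshold are assumptions for this sketch and do not describe the internals of git_detective.py.

```python
from collections import Counter

MIN_COMMITS = 2  # hypothetical threshold below which ranking is meaningless


def hotspots(commits, top=3):
    """commits: one list of touched file paths per commit.

    Returns [(path, change_count), ...] sorted by count, or None when the
    history is too sparse to support a ranking.
    """
    if len(commits) < MIN_COMMITS:
        return None  # caller reports "N/A" instead of a made-up ranking
    counts = Counter(path for files in commits for path in set(files))
    # Sort by descending count, then by path, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]
```

Returning `None` forces the caller to handle the sparse case explicitly, which is how a single-commit repository ends up with an N/A hotspot entry rather than a ranking derived from one data point.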
Degradation reporting works. ToVox surfaced truncation (896 nodes) and Bash module-only coverage. The workspace flagged insufficient git history. httpie/cli identified unparsed shell variants. In each case, the limitation was reported in the artifact metadata.
Subsystem detection. httpie/cli's AST analysis correctly identified 6 subsystem clusters (httpie/cli, httpie/plugins, httpie/output, httpie/output/ui, tests, tests/utils) matching the known project structure.
We deliberately treat this as execution evidence rather than a benchmark. The claim is that the workflow runs across heterogeneous codebases, emits inspectable artifacts with explicit provenance, and preserves its own boundary conditions. A comparative study against unconstrained repository summarization on downstream coding tasks remains future work.
Why This Is Different
The contribution of nexus-mapper is not simply repository understanding. Many tools can produce ad-hoc summaries. The distinctive contribution here is a reproducible protocol that turns repository understanding into durable, inspectable artifacts with explicit provenance, bounded scope, and support for degraded execution.
In this sense, nexus-mapper is better understood as a repository mapping workflow than as a summarization prompt.
Limitations
- The workflow requires local shell execution and Python support.
- Structural coverage differs across languages; unsupported or parser-unavailable regions must be marked explicitly.
- Git hotspot analysis is unavailable when repository history is absent or sparse.
- The current evidence demonstrates executability and artifact production across four repositories, not a quantified downstream productivity gain.
- The current evaluation is repository-level and descriptive; a comparative study against tools like Aider's repo-map or Meta-RAG on a shared benchmark (e.g., SWE-bench) remains future work.
Conclusion
nexus-mapper provides an executable workflow for persistent repository mapping across AI sessions. By combining phase-gated reasoning (the PROBE protocol), structural extraction, optional git forensics, and explicit provenance, it produces reusable architectural context that later sessions can load directly. Evidence from four heterogeneous repositories indicates that the workflow executes reliably (353/353 files parsed, 0 errors), adapts to varying git maturity, and explicitly reports degradation rather than hiding it. The central contribution is practical and methodological: replacing fragile first impressions with durable, evidence-backed repository artifacts.
References
[1] Aider. "Repo Map." https://aider.chat/2023/10/22/repomap.html
[2] Bloop. https://github.com/bloopai/bloop
[3] Vali Tawosi et al. "Meta-RAG on Large Codebases Using Code Summarization." arXiv:2508.02611, 2025. Presented at AGENT 2026 (ICSE 2026 workshop).
[4] Sourcegraph Cody. https://sourcegraph.com/cody
[5] RAG for Code Completion in Industrial Codebases. arXiv:2507.18515, 2025.
[6] Repository-Level Code Generation and Retrieval-Augmented Code Generation Survey. 2024.
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
---
name: nexus-mapper
description: >
  Executable workflow for building a persistent .nexus-map/ knowledge base from
  a local code repository. Uses a phase-gated PROBE protocol with multi-language
  AST extraction, optional git hotspot analysis, provenance labeling, and
  structured artifact emission.
allowed-tools: Bash(git *), Bash(python *), Bash(pip *)
---

# Nexus-Mapper Workflow

Builds a persistent repository knowledge base for AI cold-start recovery and architecture-aware development.

## Get the repository

```bash
git clone https://github.com/haai-coding/Nexus-skills.git
cd Nexus-skills
```

## Prerequisites

```bash
pip install -r skills/nexus-mapper/scripts/requirements.txt
```

## Run

```bash
python skills/nexus-mapper/scripts/extract_ast.py <repo_path> --file-tree-out <repo_path>/.nexus-map/raw/file_tree.txt > <repo_path>/.nexus-map/raw/ast_nodes.json
```

```bash
python skills/nexus-mapper/scripts/git_detective.py <repo_path> --days 90 > <repo_path>/.nexus-map/raw/git_stats.json
```

Then execute the PROBE protocol (PERCEIVE - RELATE - BOUND - EMIT) defined in skills/nexus-mapper/SKILL.md to generate the final .nexus-map/ knowledge base with provenance-marked artifacts.

## Output

- .nexus-map/INDEX.md: compact cold-start routing summary
- .nexus-map/arch/: systems, dependencies, and test-surface summaries
- .nexus-map/concepts/: domain glossary and machine-readable concept graph
- .nexus-map/hotspots/: git hotspot and coupling analysis when history exists
- .nexus-map/raw/: AST nodes, git statistics, and filtered file tree