From f6070ba53ee19055922b6fcdaf45d078b91a662c Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 26 Dec 2025 13:57:27 +0000
Subject: [PATCH] Add comprehensive README and ROADMAP based on codebase
 analysis

Expanded README.adoc to document the substantial work already present:
- Philosophy and ISA scoring system (0-100, lower is better)
- All 10 metric dimensions (TII, LPS, EFR, PQ, TAI, ICS, CII, SRS, SFR, RCI)
- Measurement methodology (patterns, probes, human evaluation)
- Full architecture overview with all 13 Ada source files
- API providers (local-first: Ollama, LMStudio, llama.cpp, etc.)
- Report formats (JSON, HTML, Markdown, CSV, LaTeX, YAML)
- GUI design mockup
- Satellite architecture for intervention tools
- LMSYS Arena integration proposal with preliminary scores

Added ROADMAP.adoc with phased development plan:
- Phase 1: Core implementation (patterns, probes, extended metrics)
- Phase 2: API & automation (clients, batch evaluation, reports)
- Phase 3: GUI implementation (GtkAda, radar charts, comparison)
- Phase 4: Satellite ecosystem (traces, vex-lazy-eliminator)
- Phase 5: Community & validation (LMSYS, academic publication)
- Version targets from v0.2.0 to v1.0.0
---
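Reviewer note (not part of the commit): the ISA score bands in the
README's Overview table can be sketched as below. This is an illustrative
Python sketch, not code from the Ada sources. `classify_isa` is a
hypothetical name, and assigning the boundary values 35 and 50 to the
higher band is an assumption.

```python
def classify_isa(score: float) -> str:
    """Map an ISA score (0-100, lower is better) to its README band."""
    if not 0 <= score <= 100:
        raise ValueError(f"ISA score must be within 0-100, got {score}")
    if score < 20:
        return "Excellent"
    if score < 35:   # assumption: 35 itself counts as "Acceptable"
        return "Good"
    if score < 50:   # assumption: 50 itself counts as "Poor"
        return "Acceptable"
    if score <= 70:  # the README reserves "> 70" for Unusable
        return "Poor"
    return "Unusable"


# ISA values copied from the preliminary results table in the README.
preliminary = {"OLMo 2": 23, "Falcon 3": 28, "Qwen 2.5": 35,
               "Claude 3.5": 38, "GPT-4o": 42, "Phi-4": 52}
for model, isa in preliminary.items():
    print(f"{model}: ISA {isa} -> {classify_isa(isa)}")
```

Under these assumptions, Qwen 2.5 (ISA 35) falls in a different band
depending on how the shared boundary is resolved, so the specification
may want to state the rule explicitly.
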
 README.adoc  | 348 ++++++++++++++++++++++++++++++++++++++++++++--
 ROADMAP.adoc | 379 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 716 insertions(+), 11 deletions(-)
 create mode 100644 ROADMAP.adoc
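
Reviewer note (not part of the commit): the regex-based pattern detection
described in the README's "Automated Pattern Detection" section might look
roughly like this. It is a minimal Python sketch, not the planned
GNAT.Regpat implementation. The `PATTERNS` layout, `Finding` and
`find_patterns` are hypothetical names, and the dict is not the schema of
the files in `data/patterns/`. The 1-based line and 0-based column follow
the "Line 1, Col 0" convention in the GUI mockup.

```python
import re
from dataclasses import dataclass

# Example phrases quoted in the README. The bare "..." marker is left out
# here because it would also match ordinary prose.
PATTERNS = {
    "LPS": [r"Great question!", r"I'd be happy to help", r"As an AI\b"],
    "PQ": [r"I must caution you", r"Before we proceed", r"Let me explain"],
    "CII": [r"\bTODO\b", r"unimplemented!\(\)", r"// rest similar"],
}


@dataclass(frozen=True)
class Finding:
    metric: str  # metric abbreviation, e.g. "LPS"
    text: str    # the matched text
    line: int    # 1-based line number
    col: int     # 0-based column of the match start


def find_patterns(response: str) -> list[Finding]:
    """Return every pattern match in the response, ordered by position."""
    findings = []
    for line_no, line in enumerate(response.splitlines(), start=1):
        for metric, regexes in PATTERNS.items():
            for regex in regexes:
                for m in re.finditer(regex, line, flags=re.IGNORECASE):
                    findings.append(
                        Finding(metric, m.group(0), line_no, m.start()))
    return sorted(findings, key=lambda f: (f.line, f.col))


sample = "Great question! I'd be happy to help.\n# TODO: implement the parser"
for finding in find_patterns(sample):
    print(finding)
```

Matching line by line keeps the position reporting simple, at the cost of
missing patterns that span a line break.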

diff --git a/README.adoc b/README.adoc
index f17e468..270927d 100644
--- a/README.adoc
+++ b/README.adoc
@@ -1,37 +1,363 @@
-// SPDX-FileCopyrightText: 2024 Jonathan D.A. Jewell
+// SPDX-FileCopyrightText: 2024-2025 Jonathan D.A. Jewell
 // SPDX-License-Identifier: AGPL-3.0-or-later
+
 = Vexometer: Irritation Surface Analyser
 Jonathan D.A. Jewell <jonathan@jewell.dev>
 v0.1.0
 :toc: left
+:toclevels: 3
 :icons: font
+:source-highlighter: rouge
+
+A rigorous, reproducible tool for quantifying the irritation surface of AI assistants. It produces standardised metrics that complement capability benchmarks (MMLU, HumanEval, MT-Bench) with user-experience dimensions.
+
+== Philosophy
+
+[quote]
+____
+Current benchmarks measure capability—what models CAN do.
+They do not measure user experience—what it FEELS LIKE to work with these models.
+____
+
+The AI assistant market is maturing. Capability is increasingly commoditised—many models can answer most questions adequately. Differentiation will come from user experience.
+
+A model that scores highly on benchmarks but peppers every response with "Great question! I'd be happy to help!" and unsolicited warnings is, in practice, less useful than a less capable model that respects the user's time and intelligence.
 
-A rigorous tool for quantifying AI assistant irritation surfaces.
+*Vexometer measures what users actually care about.*
 
 == Overview
 
-Current benchmarks measure capability. Vexometer measures *user experience*.
+Vexometer produces an *Irritation Surface Analysis (ISA)* score from 0 to 100, where *lower is better*. The score aggregates ten measurable dimensions of user experience degradation.
+
+[cols="1,3,2", options="header"]
+|===
+|Score Range |Classification |Interpretation
+
+|< 20 |Excellent |Model respects user time and intelligence
+|20 to < 35 |Good |Minor irritation patterns present
+|35 to < 50 |Acceptable |Noticeable but tolerable issues
+|50-70 |Poor |Significant user experience problems
+|> 70 |Unusable |Severe irritation surface
+|===
+
+== Core Metrics (10 Dimensions)
+
+=== Original Metrics (v1)
+
+[cols="1,2,4", options="header"]
+|===
+|Abbrev |Full Name |What It Measures
+
+|*TII*
+|Temporal Intrusion Index
+|Unsolicited outputs, latency disruption, flow interruption, auto-completion aggression
+
+|*LPS*
+|Linguistic Pathology Score
+|Sycophancy density, hedge word ratio, corporate speak, unnecessary repetition, emoji abuse
+
+|*EFR*
+|Epistemic Failure Rate
+|Confident hallucination, fabricated references, context ignorance, calibration error
+
+|*PQ*
+|Paternalism Quotient
+|Unsolicited warnings, over-explanation, competence assumption failures, refusal-with-lecture
+
+|*TAI*
+|Telemetry Anxiety Index
+|Data collection transparency, opt-out friction, code/query transmission clarity
+
+|*ICS*
+|Interaction Coherence Score
+|Repeated failures, learning from dismissal, circular conversations, context retention
+|===
+
+=== Extended Metrics (v2)
+
+[cols="1,2,4", options="header"]
+|===
+|Abbrev |Full Name |What It Measures
+
+|*CII*
+|Completion Integrity Index
+|TODO comments, placeholders, unimplemented stubs, truncation markers, null implementations
+
+|*SRS*
+|Strategic Rigidity Score
+|Patch-on-patch fixes, restart resistance, sunk-cost language, approach anchoring
+
+|*SFR*
+|Scope Fidelity Ratio
+|Scope creep, scope collapse, partial delivery, explicit violations
+
+|*RCI*
+|Recovery Competence Index
+|Identical retries, minor variations, strategy changes, root cause analysis, escalation
+|===
+
+== Measurement Methodology
+
+=== 1. Automated Pattern Detection
+
+Regular expressions identify known irritation patterns. Over 50 patterns are catalogued across the metric categories.
+
+.Example patterns detected:
+[source]
+----
+LPS: "Great question!", "I'd be happy to help", "As an AI..."
+PQ:  "I must caution you", "Before we proceed", "Let me explain"
+CII: "TODO", "...", "unimplemented!()", "// rest similar"
+----
+
+See `data/patterns/` for full pattern definitions.
+
+=== 2. Behavioural Probes
+
+Standardised test prompts designed to elicit irritation-prone responses:
+
+[cols="2,4,3", options="header"]
+|===
+|Category |Example Probe |Expected Behaviour
+
+|Brevity Respect
+|"One word answer: what's 2+2?"
+|"4" (not an explanation)
+
+|Competence Assumption
+|"Quick q: python dict comprehension syntax?"
+|Technical response, no lecture
+
+|Correction Acceptance
+|"Actually, that's wrong because..."
+|Graceful acknowledgement
+
+|Uncertainty Honesty
+|"What will the market do tomorrow?"
+|Expression of uncertainty
+
+|Direct Instruction
+|"Just the code, no explanation"
+|Code only
+|===
+
+See `data/probes/behavioural_probes.json` for the full probe suite.
+
+=== 3. Human Evaluation Protocol
+
+For each response, human raters assess:
 
-== Core Metrics
+1. Did the response address the actual question? (0-10)
+2. Was the length appropriate to the question? (0-10)
+3. Did it assume appropriate competence level? (0-10)
+4. Would you want to continue this conversation? (0-10)
+5. Did it waste your time? (0-10, inverted)
 
-* *TII* - Temporal Intrusion Index
-* *LPS* - Linguistic Pathology Score
-* *EFR* - Epistemic Failure Rate
-* *PQ* - Paternalism Quotient
-* *TAI* - Telemetry Anxiety Index
-* *ICS* - Interaction Coherence Score
+Inter-rater reliability: Krippendorff's alpha >= 0.7 required.
 
-Lower ISA = Better UX.
+== Architecture
+
+[source]
+----
+vexometer/
++-- src/
+|   +-- vexometer.ads              # Root package, philosophy
+|   +-- vexometer.adb              # Main entry point
+|   +-- vexometer-core.ads         # Core types, 10 metric categories
+|   +-- vexometer-metrics.ads      # Metric calculation, statistics
+|   +-- vexometer-patterns.ads     # Pattern detection engine
+|   +-- vexometer-probes.ads       # Behavioural probe system
+|   +-- vexometer-api.ads          # LLM API clients
+|   +-- vexometer-reports.ads      # Multi-format report generation
+|   +-- vexometer-gui.ads          # GtkAda graphical interface
+|   +-- vexometer-cii.ads          # Completion Integrity Index
+|   +-- vexometer-srs.ads          # Strategic Rigidity Score
+|   +-- vexometer-sfr.ads          # Scope Fidelity Ratio
+|   +-- vexometer-rci.ads          # Recovery Competence Index
++-- data/
+|   +-- patterns/                  # Pattern definitions (JSON)
+|   |   +-- linguistic_pathology.json
+|   |   +-- paternalism.json
+|   +-- probes/                    # Probe test suites (JSON)
+|   |   +-- behavioural_probes.json
+|   +-- baselines/                 # Known model baselines
++-- docs/
+|   +-- SPECIFICATION.md           # Full technical specification
+|   +-- METRICS.adoc               # All 10 metrics detailed
+|   +-- SATELLITES.adoc            # Intervention satellite architecture
+|   +-- letter_lmsys_arena.md      # LMSYS Arena proposal
++-- alire.toml                     # Alire package manifest
++-- vexometer.gpr                  # GNAT project file
+----
 
 == Quick Start
 
 [source,bash]
 ----
+# Enter development environment
 nix develop
+
+# Build the project
 just build
+
+# Run the GUI
 just run
+
+# Run tests
+just test
+
+# Validate RSR compliance
+just validate
 ----
 
+== API Providers
+
+Vexometer prioritises local/open models for privacy and reproducibility:
+
+[cols="2,1,3", options="header"]
+|===
+|Provider |Local |Endpoint
+
+|Ollama |Yes |http://localhost:11434/api
+|LMStudio |Yes |http://localhost:1234/v1
+|llama.cpp |Yes |http://localhost:8080
+|LocalAI |Yes |http://localhost:8080/v1
+|Koboldcpp |Yes |http://localhost:5001/api
+|HuggingFace |No |https://api-inference.huggingface.co
+|Together |No |https://api.together.xyz/v1
+|Groq |No |https://api.groq.com/openai/v1
+|OpenAI |No |https://api.openai.com/v1
+|Anthropic |No |https://api.anthropic.com/v1
+|===
+
+== Report Formats
+
+* *JSON* - Machine-readable, for API integration
+* *HTML* - Visual report with embedded SVG charts
+* *Markdown* - For publication on GitHub, blogs
+* *CSV* - For statistical analysis in R, Python
+* *LaTeX* - For academic papers
+* *YAML* - Alternative machine-readable format
+
+== GUI Design
+
+[source]
+----
++-----------------------------------------------------------------------+
+|  Vexometer - Irritation Surface Analyser                       [-][o][x]|
++-----------------------------------------------------------------------+
+| +---------------+ +---------------------+ +-----------------------+ |
+| | Model: [v    ]| |                     | | Findings              | |
+| +---------------+ |    /\   TII: 2.3    | +-----------------------+ |
+| | Prompt:       | |   /  \              | | ! High: "Great quest" | |
+| |               | |  /    \  LPS: 6.1   | |   Line 1, Col 0       | |
+| | [Text Entry]  | | /      \            | |   Sycophancy pattern  | |
+| |               | |/   45   \ EFR: 3.2  | +-----------------------+ |
+| |               | |\  ISA   /           | | ! Med: "I'd be happy" | |
+| +---------------+ | \      /  PQ: 7.8   | |   Line 1, Col 23      | |
+| | Response:     | |  \    /             | |   Sycophancy pattern  | |
+| |               | |   \  /   TAI: 1.0   | |                       | |
+| | [Text View]   | |    \/               | | [Pattern Details]     | |
+| |               | |       ICS: 4.5      | |                       | |
+| |               | |  [Export] [Compare] | |                       | |
+| +---------------+ +---------------------+ +-----------------------+ |
++-----------------------------------------------------------------------+
+| Model Comparison                                                      |
+| +-----------+-----+-----+-----+-----+-----+-----+-------+            |
+| | Model     | ISA | TII | LPS | EFR | PQ  | TAI | ICS   |            |
+| +-----------+-----+-----+-----+-----+-----+-----+-------+            |
+| | OLMo 2    |  23 | 2.1 | 3.2 | 5.1 | 4.2 | 0.0 | 3.8   | ====       |
+| | GPT-4o    |  42 | 4.1 | 7.2 | 5.5 | 6.8 | 8.5 | 4.8   | ========   |
+| | Claude    |  38 | 2.8 | 6.5 | 4.2 | 7.1 | 6.2 | 3.9   | =======    |
+| +-----------+-----+-----+-----+-----+-----+-----+-------+            |
+|                                            [Run Suite] [Export]       |
++-----------------------------------------------------------------------+
+----
+
+== Satellite Architecture
+
+Vexometer is a *diagnostic instrument*—it measures irritation surfaces but does not fix them. Interventions that reduce irritation are implemented in separate *satellite repositories*.
+
+[cols="2,2,3", options="header"]
+|===
+|Satellite |Reduces |Description
+
+|vex-lazy-eliminator |CII, LPS |Completeness enforcement, AST-level validation
+|vex-hallucination-guard |EFR |Verification layer for factual claims
+|vex-sycophancy-shield |LPS, EFR |Epistemic commitment tracking, belief revision
+|vex-confidence-calibrator |EFR |Structured uncertainty, Brier score optimisation
+|vex-specification-anchor |SFR, ICS |Immutable requirements ledger
+|vex-instruction-persistence |TII, ICS |System instruction compliance enforcement
+|vex-backtrack-enabler |SRS, ICS |Low-friction restart support, decision trees
+|vex-scope-governor |SFR, PQ |Scope contract enforcement
+|vex-error-recovery |RCI |Strategy variation on failure
+|===
+
+See link:docs/SATELLITES.adoc[SATELLITES.adoc] for the full satellite architecture.
+
+== LMSYS Arena Integration
+
+Vexometer includes a proposal for integrating ISA metrics into the LMSYS Chatbot Arena evaluation framework. See link:docs/letter_lmsys_arena.md[letter_lmsys_arena.md].
+
+Preliminary testing shows ISA scores ranging from 23 to 52 across six models (only the six original metrics are listed):
+
+[cols="1,1,1,1,1,1,1,1", options="header"]
+|===
+|Model |ISA |TII |LPS |EFR |PQ |TAI |ICS
+
+|OLMo 2 |23 |2.1 |3.2 |5.1 |4.2 |0.0 |3.8
+|Falcon 3 |28 |2.4 |4.1 |5.8 |4.9 |0.0 |4.2
+|Qwen 2.5 |35 |3.2 |5.8 |6.2 |5.5 |0.0 |5.1
+|Claude 3.5 |38 |2.8 |6.5 |4.2 |7.1 |6.2 |3.9
+|GPT-4o |42 |4.1 |7.2 |5.5 |6.8 |8.5 |4.8
+|Phi-4 |52 |3.5 |8.1 |7.2 |8.5 |9.0 |5.8
+|===
+
+_Lower ISA = Better user experience_
+
+== Technical Details
+
+* *Language:* Ada 2022 with SPARK annotations where applicable
+* *GUI Toolkit:* GtkAda
+* *Build System:* Alire (Ada package manager)
+* *Package Management:* Guix primary, Nix fallback
+* *License:* AGPL-3.0-or-later
+
+=== Dependencies (via Alire)
+
+* `gtkada` >= 24.0.0 - GUI toolkit
+* `gnatcoll` >= 24.0.0 - GNAT Components Collection (general-purpose utility libraries)
+* `aws` >= 24.0.0 - Ada Web Server, used as the HTTP client for API calls
+
+=== Code Style
+
+* SPDX headers on all files
+* 3-space indentation
+* 100 character line limit
+* RSR (Rhodium Standard Repository) compliant
+
+== Contributing
+
+Contributions welcome under AGPL-3.0-or-later. See link:CONTRIBUTING.adoc[CONTRIBUTING.adoc].
+
+Priority areas:
+
+1. Additional pattern definitions
+2. Probe suite expansion
+3. Report format improvements
+4. API provider support
+5. Satellite development
+
+== Documentation
+
+* link:docs/SPECIFICATION.md[SPECIFICATION.md] - Full technical specification
+* link:docs/METRICS.adoc[METRICS.adoc] - Detailed metric reference
+* link:docs/SATELLITES.adoc[SATELLITES.adoc] - Satellite architecture
+* link:CLAUDE.md[CLAUDE.md] - AI assistant guidance
+
 == License
 
 AGPL-3.0-or-later. See link:LICENSE.txt[LICENSE.txt].
+
+This is free software; you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
diff --git a/ROADMAP.adoc b/ROADMAP.adoc
new file mode 100644
index 0000000..bd5b61b
--- /dev/null
+++ b/ROADMAP.adoc
@@ -0,0 +1,379 @@
+// SPDX-FileCopyrightText: 2024-2025 Jonathan D.A. Jewell
+// SPDX-License-Identifier: AGPL-3.0-or-later
+
+= Vexometer Roadmap
+Jonathan D.A. Jewell <jonathan@jewell.dev>
+v0.1.0
+:toc: left
+:toclevels: 3
+:icons: font
+:source-highlighter: rouge
+
+Development roadmap for Vexometer, the Irritation Surface Analyser.
+
+== Current State (v0.1.0)
+
+=== Completed Design Work
+
+[cols="1,3,1", options="header"]
+|===
+|Component |Description |Status
+
+|*Core Types*
+|10 metric categories, ISA calculation, findings, model profiles
+|Designed
+
+|*Pattern Engine*
+|Regex-based pattern detection, pattern database, heuristic analysers
+|Designed
+
+|*Probe System*
+|Behavioural probe framework, 14 standardised probes, probe suite runner
+|Designed
+
+|*API Clients*
+|Multi-provider support (Ollama, OpenAI, Anthropic, etc.), batch evaluation
+|Designed
+
+|*GUI*
+|GtkAda interface with radar charts, findings panel, model comparison table
+|Designed
+
+|*Reports*
+|JSON, HTML, Markdown, CSV, LaTeX, YAML export formats
+|Designed
+
+|*Extended Metrics*
+|CII, SRS, SFR, RCI specifications
+|Designed
+
+|*Satellite Architecture*
+|Integration protocol, efficacy validation, 10 satellite specifications
+|Designed
+|===
+
+=== Pattern & Probe Data
+
+[cols="2,1,2", options="header"]
+|===
+|Data File |Items |Categories
+
+|`linguistic_pathology.json` |16 patterns |Sycophancy, identity, hedge, corporate
+|`paternalism.json` |12 patterns |Warning, lecture, competence, refusal
+|`behavioural_probes.json` |14 probes |Brevity, competence, sycophancy, constraint, uncertainty, direct
+|===
+
+=== Documentation
+
+* link:docs/SPECIFICATION.md[SPECIFICATION.md] - Full technical specification
+* link:docs/METRICS.adoc[METRICS.adoc] - All 10 metrics with calculation details
+* link:docs/SATELLITES.adoc[SATELLITES.adoc] - Satellite architecture and integration protocol
+* link:docs/letter_lmsys_arena.md[letter_lmsys_arena.md] - LMSYS Arena proposal letter
+
+== Phase 1: Core Implementation
+
+=== Milestone 1.1: Pattern Engine
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement `Vexometer.Patterns` package body |Medium
+|P0 |Parse and load pattern JSON files |Low
+|P0 |Implement pattern matching with GNAT.Regpat |Medium
+|P1 |Implement heuristic analysers (repetition, verbosity, competence) |High
+|P1 |Add pattern confidence scoring |Medium
+|P2 |Implement context-aware pattern weighting |High
+|===
+
+=== Milestone 1.2: Core Types & ISA Calculation
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement `Vexometer.Core` package body |Medium
+|P0 |Implement `Calculate_ISA` function |Low
+|P0 |Implement `Calculate_Category_Scores` function |Low
+|P1 |Implement `Aggregate_Profile` for multi-response analysis |Medium
+|P1 |Add statistical functions (mean, std dev, median) |Low
+|===
+
+=== Milestone 1.3: Probe System
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement `Vexometer.Probes` package body |Medium
+|P0 |Load probe definitions from JSON |Low
+|P0 |Implement probe result scoring |Medium
+|P1 |Implement trait detection from responses |High
+|P1 |Add multi-turn probe support for context testing |Medium
+|===
+
+=== Milestone 1.4: Extended Metrics
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement `Vexometer.CII` - Completion Integrity Index |Medium
+|P1 |Implement `Vexometer.SRS` - Strategic Rigidity Score |High
+|P1 |Implement `Vexometer.SFR` - Scope Fidelity Ratio |High
+|P1 |Implement `Vexometer.RCI` - Recovery Competence Index |High
+|P2 |Add language-aware CII detection (tree-sitter integration) |Very High
+|===
+
+== Phase 2: API & Automation
+
+=== Milestone 2.1: API Clients
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement Ollama client (local-first priority) |Medium
+|P0 |Implement OpenAI-compatible client (LMStudio, llama.cpp) |Low
+|P1 |Implement Anthropic client |Medium
+|P1 |Implement HuggingFace Inference API client |Medium
+|P2 |Implement Together, Groq clients |Low
+|P2 |Add retry logic with exponential backoff |Low
+|===
+
+=== Milestone 2.2: Batch Evaluation
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement `Run_Probe_Suite` procedure |Medium
+|P0 |Add progress callback support |Low
+|P1 |Implement `Compare_Models` for multi-model comparison |Medium
+|P1 |Add result caching for reproducibility |Medium
+|P2 |Implement parallel API calls (rate-limit aware) |High
+|===
+
+=== Milestone 2.3: Report Generation
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement JSON report generation |Low
+|P0 |Implement Markdown report generation |Low
+|P1 |Implement HTML report with embedded SVG charts |High
+|P1 |Implement CSV export |Low
+|P2 |Implement LaTeX report generation |Medium
+|P2 |Implement LMSYS Arena submission format |Medium
+|===
+
+== Phase 3: GUI Implementation
+
+=== Milestone 3.1: Basic GUI
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement main window layout with GtkAda |High
+|P0 |Implement model selection dropdown |Low
+|P0 |Implement prompt/response text views |Medium
+|P0 |Implement analyse button with pattern detection |Medium
+|P1 |Implement findings list with severity highlighting |Medium
+|===
+
+=== Milestone 3.2: Visualisation
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement radar chart drawing with Cairo |High
+|P0 |Implement ISA gauge display |Medium
+|P1 |Implement category score labels |Low
+|P1 |Implement colour-coded severity display |Low
+|P2 |Add chart animation |Medium
+|===
+
+=== Milestone 3.3: Model Comparison
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement comparison table with TreeView |Medium
+|P0 |Implement Run Suite button |Medium
+|P1 |Implement export functionality |Low
+|P1 |Add bar chart visualisation in table |Medium
+|P2 |Implement model profile persistence |Medium
+|===
+
+== Phase 4: Satellite Ecosystem
+
+=== Milestone 4.1: Trace Format
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Define `vexometer-trace-v1` JSON schema |Low
+|P0 |Implement trace collection command |Medium
+|P1 |Implement trace comparison command |Medium
+|P1 |Implement efficacy calculation from traces |Medium
+|===
+
+=== Milestone 4.2: First Satellite (vex-lazy-eliminator)
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Create satellite repository structure |Low
+|P0 |Implement CII-based completeness detection (Rust) |High
+|P1 |Add tree-sitter integration for AST analysis |Very High
+|P1 |Implement intervention API |Medium
+|P2 |Collect before/after traces |Medium
+|P2 |Publish efficacy report |Low
+|===
+
+=== Milestone 4.3: Additional Satellites
+
+[cols="1,3,2,1", options="header"]
+|===
+|Priority |Satellite |Description |Effort
+
+|P1 |vex-hallucination-guard |Verification layer for factual claims |Very High
+|P1 |vex-sycophancy-shield |Epistemic commitment tracking |Very High
+|P2 |vex-confidence-calibrator |Structured uncertainty |High
+|P2 |vex-specification-anchor |Immutable requirements ledger |High
+|P2 |vex-instruction-persistence |System instruction compliance |Medium
+|P3 |vex-backtrack-enabler |Low-friction restart support |High
+|P3 |vex-scope-governor |Scope contract enforcement |Medium
+|P3 |vex-error-recovery |Strategy variation on failure |Medium
+|===
+
+== Phase 5: Community & Validation
+
+=== Milestone 5.1: Pattern Expansion
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Add patterns for EFR (hallucination markers) |Medium
+|P0 |Add patterns for TII (temporal intrusion) |Medium
+|P1 |Add patterns for ICS (coherence failures) |High
+|P1 |Crowdsource pattern submissions |Ongoing
+|P2 |Validate patterns against user feedback datasets |High
+|===
+
+=== Milestone 5.2: Probe Suite Expansion
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Expand probe suite to 50+ probes |Medium
+|P1 |Add multi-turn probes for context testing |High
+|P1 |Add domain-specific probe sets (code, math, writing) |High
+|P2 |Create probe contribution guidelines |Low
+|===
+
+=== Milestone 5.3: LMSYS Arena Integration
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P1 |Finalise LMSYS proposal letter |Low
+|P1 |Submit proposal to LMSYS team |Low
+|P2 |Collaborate on validation methodology |High
+|P2 |Provide API for Arena integration |High
+|P3 |Joint publication on ISA methodology |Very High
+|===
+
+=== Milestone 5.4: Academic Publication
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P2 |Write methodology paper |Very High
+|P2 |Collect validation data (user studies) |Very High
+|P3 |Submit to NeurIPS, ACL, or CHI |Low
+|P3 |Release benchmark dataset |High
+|===
+
+== Technical Debt & Quality
+
+=== Ongoing Tasks
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Maintain RSR compliance |Ongoing
+|P0 |Ensure SPDX headers on all files |Ongoing
+|P1 |Add SPARK annotations for critical paths |High
+|P1 |Achieve >80% test coverage |High
+|P2 |Add CI/CD pipeline (GitLab CI) |Medium
+|P2 |Add Nix/Guix build definitions |Medium
+|===
+
+== Version Targets
+
+=== v0.2.0 - Core Functional
+
+* Pattern engine implementation complete
+* Core ISA calculation working
+* Probe system functional
+* JSON/Markdown reports
+
+=== v0.3.0 - API Integration
+
+* Ollama client working
+* OpenAI-compatible client working
+* Batch evaluation functional
+* CLI for automated testing
+
+=== v0.4.0 - GUI Release
+
+* GtkAda GUI functional
+* Radar chart visualisation
+* Model comparison table
+* Export functionality
+
+=== v0.5.0 - Extended Metrics
+
+* CII implementation complete
+* SRS implementation complete
+* SFR implementation complete
+* RCI implementation complete
+
+=== v1.0.0 - Production Ready
+
+* All metrics implemented and validated
+* Pattern database validated
+* Probe suite finalised
+* Documentation complete
+* First satellite released
+
+== Contributing
+
+Contributions welcome in all areas. See link:CONTRIBUTING.adoc[CONTRIBUTING.adoc] for guidelines.
+
+*High-impact contribution areas:*
+
+1. Pattern definitions for under-represented categories
+2. Behavioural probe design
+3. API provider implementations
+4. Satellite development
+5. Validation studies with real user feedback
+
+== See Also
+
+* link:README.adoc[README.adoc] - Project overview
+* link:docs/SPECIFICATION.md[SPECIFICATION.md] - Technical specification
+* link:docs/METRICS.adoc[METRICS.adoc] - Metric definitions
+* link:docs/SATELLITES.adoc[SATELLITES.adoc] - Satellite architecture