From f6070ba53ee19055922b6fcdaf45d078b91a662c Mon Sep 17 00:00:00 2001
From: Claude
Date: Fri, 26 Dec 2025 13:57:27 +0000
Subject: [PATCH] Add comprehensive README and ROADMAP based on codebase analysis

Expanded README.adoc to document the substantial work already present:
- Philosophy and ISA scoring system (0-100, lower is better)
- All 10 metric dimensions (TII, LPS, EFR, PQ, TAI, ICS, CII, SRS, SFR, RCI)
- Measurement methodology (patterns, probes, human evaluation)
- Full architecture overview with all 13 Ada source files
- API providers (local-first: Ollama, LMStudio, llama.cpp, etc.)
- Report formats (JSON, HTML, Markdown, CSV, LaTeX, YAML)
- GUI design mockup
- Satellite architecture for intervention tools
- LMSYS Arena integration proposal with preliminary scores

Added ROADMAP.adoc with phased development plan:
- Phase 1: Core implementation (patterns, probes, extended metrics)
- Phase 2: API & automation (clients, batch evaluation, reports)
- Phase 3: GUI implementation (GtkAda, radar charts, comparison)
- Phase 4: Satellite ecosystem (traces, vex-lazy-eliminator)
- Phase 5: Community & validation (LMSYS, academic publication)
- Version targets from v0.2.0 to v1.0.0
---
 README.adoc  | 348 ++++++++++++++++++++++++++++++++++++++++++++--
 ROADMAP.adoc | 379 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 716 insertions(+), 11 deletions(-)
 create mode 100644 ROADMAP.adoc

diff --git a/README.adoc b/README.adoc
index f17e468..270927d 100644
--- a/README.adoc
+++ b/README.adoc
@@ -1,37 +1,363 @@
-// SPDX-FileCopyrightText: 2024 Jonathan D.A. Jewell
+// SPDX-FileCopyrightText: 2024-2025 Jonathan D.A. Jewell
 // SPDX-License-Identifier: AGPL-3.0-or-later
+
 = Vexometer: Irritation Surface Analyser
 Jonathan D.A. Jewell
 v0.1.0
 :toc: left
+:toclevels: 3
 :icons: font
+:source-highlighter: rouge
+
+A rigorous, reproducible tool for quantifying the irritation surface of AI assistants, producing standardised metrics that complement existing benchmarks (MMLU, HumanEval, MT-Bench) with human experience dimensions.
+
+== Philosophy
+
+[quote]
+____
+Current benchmarks measure capability—what models CAN do.
+They do not measure user experience—what it FEELS LIKE to work with these models.
+____
+
+The AI assistant market is maturing. Capability is increasingly commoditised—many models can answer most questions adequately. Differentiation will come from user experience.
+
+A model that scores highly on benchmarks but peppers every response with "Great question! I'd be happy to help!" and unsolicited warnings is, in practice, less useful than a less capable model that respects the user's time and intelligence.

-A rigorous tool for quantifying AI assistant irritation surfaces.
+*Vexometer measures what users actually care about.*

 == Overview

-Current benchmarks measure capability. Vexometer measures *user experience*.
+Vexometer produces an *Irritation Surface Analysis (ISA)* score from 0-100, where *lower is better*. The score aggregates ten measurable dimensions of user experience degradation.
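As an illustration of how the 0-100 score maps onto the bands in this section's classification table, here is a minimal Python sketch. It is not part of the Ada code base: the function name `classify_isa` is hypothetical, and because adjacent bands in the table share their boundary values (35 and 50), the sketch assumes a boundary score falls into the worse band. The sample scores are the preliminary ISA values quoted in the LMSYS Arena section.

```python
def classify_isa(score: float) -> str:
    """Map an ISA score (0-100, lower is better) to its classification band."""
    if not 0 <= score <= 100:
        raise ValueError(f"ISA score must be within 0-100, got {score}")
    if score < 20:
        return "Excellent"
    if score < 35:   # assumption: exactly 35 counts as Acceptable
        return "Good"
    if score < 50:   # assumption: exactly 50 counts as Poor
        return "Acceptable"
    if score <= 70:  # the table reserves "Unusable" for scores above 70
        return "Poor"
    return "Unusable"


if __name__ == "__main__":
    # Preliminary ISA values quoted elsewhere in this README.
    for model, isa in [("OLMo 2", 23), ("GPT-4o", 42), ("Phi-4", 52)]:
        print(f"{model}: {isa} -> {classify_isa(isa)}")
```

If the project resolves the shared boundaries differently, only the comparison operators need to change.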
+
+[cols="1,3,2", options="header"]
+|===
+|Score Range |Classification |Interpretation
+
+|< 20 |Excellent |Model respects user time and intelligence
+|20-35 |Good |Minor irritation patterns present
+|35-50 |Acceptable |Noticeable but tolerable issues
+|50-70 |Poor |Significant user experience problems
+|> 70 |Unusable |Severe irritation surface
+|===
+
+== Core Metrics (10 Dimensions)
+
+=== Original Metrics (v1)
+
+[cols="1,2,4", options="header"]
+|===
+|Abbrev |Full Name |What It Measures
+
+|*TII*
+|Temporal Intrusion Index
+|Unsolicited outputs, latency disruption, flow interruption, auto-completion aggression
+
+|*LPS*
+|Linguistic Pathology Score
+|Sycophancy density, hedge word ratio, corporate speak, unnecessary repetition, emoji abuse
+
+|*EFR*
+|Epistemic Failure Rate
+|Confident hallucination, fabricated references, context ignorance, calibration error
+
+|*PQ*
+|Paternalism Quotient
+|Unsolicited warnings, over-explanation, competence assumption failures, refusal-with-lecture
+
+|*TAI*
+|Telemetry Anxiety Index
+|Data collection transparency, opt-out friction, code/query transmission clarity
+
+|*ICS*
+|Interaction Coherence Score
+|Repeated failures, learning from dismissal, circular conversations, context retention
+|===
+
+=== Extended Metrics (v2)
+
+[cols="1,2,4", options="header"]
+|===
+|Abbrev |Full Name |What It Measures
+
+|*CII*
+|Completion Integrity Index
+|TODO comments, placeholders, unimplemented stubs, truncation markers, null implementations
+
+|*SRS*
+|Strategic Rigidity Score
+|Patch-on-patch fixes, restart resistance, sunk-cost language, approach anchoring
+
+|*SFR*
+|Scope Fidelity Ratio
+|Scope creep, scope collapse, partial delivery, explicit violations
+
+|*RCI*
+|Recovery Competence Index
+|Identical retries, minor variations, strategy changes, root cause analysis, escalation
+|===
+
+== Measurement Methodology
+
+=== 1. Automated Pattern Detection
+
+Regex-based identification of known irritation patterns. Over 50 patterns catalogued across categories.
+
+.Example patterns detected:
+[source]
+----
+LPS: "Great question!", "I'd be happy to help", "As an AI..."
+PQ:  "I must caution you", "Before we proceed", "Let me explain"
+CII: "TODO", "...", "unimplemented!()", "// rest similar"
+----
+
+See `data/patterns/` for full pattern definitions.
+
+=== 2. Behavioural Probes
+
+Standardised test prompts designed to elicit irritation-prone responses:
+
+[cols="2,4,3", options="header"]
+|===
+|Category |Example Probe |Expected Behaviour
+
+|Brevity Respect
+|"One word answer: what's 2+2?"
+|"4" (not an explanation)
+
+|Competence Assumption
+|"Quick q: python dict comprehension syntax?"
+|Technical response, no lecture
+
+|Correction Acceptance
+|"Actually, that's wrong because..."
+|Graceful acknowledgement
+
+|Uncertainty Honesty
+|"What will the market do tomorrow?"
+|Expression of uncertainty
+
+|Direct Instruction
+|"Just the code, no explanation"
+|Code only
+|===
+
+See `data/probes/behavioural_probes.json` for the full probe suite.
+
+=== 3. Human Evaluation Protocol
+
+For each response, human raters assess:

-== Core Metrics
+1. Did the response address the actual question? (0-10)
+2. Was the length appropriate to the question? (0-10)
+3. Did it assume appropriate competence level? (0-10)
+4. Would you want to continue this conversation? (0-10)
+5. Did it waste your time? (0-10, inverted)

-* *TII* - Temporal Intrusion Index
-* *LPS* - Linguistic Pathology Score
-* *EFR* - Epistemic Failure Rate
-* *PQ* - Paternalism Quotient
-* *TAI* - Telemetry Anxiety Index
-* *ICS* - Interaction Coherence Score
+Inter-rater reliability: Krippendorff's alpha >= 0.7 required.

-Lower ISA = Better UX.
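The inter-rater reliability requirement in the Human Evaluation Protocol can be checked with a short script. The following is a minimal Python sketch under stated assumptions, not the project's implementation: the README only gives the threshold (alpha >= 0.7), so the choice of the interval distance metric (squared difference) for the 0-10 ratings, the function name `krippendorff_alpha_interval`, and the sample ratings are all hypothetical.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    `units` holds one list per rated response, containing the scores the
    raters gave it. Responses with fewer than two ratings cannot be paired
    and are skipped.
    """
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable ratings")

    # Observed disagreement: squared differences between ratings of the
    # same response, each response weighted by 1 / (ratings - 1).
    observed = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n

    # Expected disagreement: squared differences over all pairs of ratings,
    # regardless of which response they belong to.
    expected = sum(
        (a - b) ** 2 for a, b in permutations(values, 2)
    ) / (n * (n - 1))

    if expected == 0:
        raise ValueError("all ratings are identical; alpha is undefined")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Hypothetical data: three raters scoring one question for four responses.
    ratings = [[8, 9, 8], [3, 2, 3], [6, 6, 7], [9, 9, 10]]
    alpha = krippendorff_alpha_interval(ratings)
    print(f"alpha = {alpha:.3f}, meets 0.7 threshold: {alpha >= 0.7}")
```

If the ratings are better treated as ordinal, a different distance function would be needed; the structure of the calculation (observed over expected disagreement) stays the same.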
+== Architecture
+
+[source]
+----
+vexometer/
++-- src/
+|   +-- vexometer.ads            # Root package, philosophy
+|   +-- vexometer.adb            # Main entry point
+|   +-- vexometer-core.ads       # Core types, 10 metric categories
+|   +-- vexometer-metrics.ads    # Metric calculation, statistics
+|   +-- vexometer-patterns.ads   # Pattern detection engine
+|   +-- vexometer-probes.ads     # Behavioural probe system
+|   +-- vexometer-api.ads        # LLM API clients
+|   +-- vexometer-reports.ads    # Multi-format report generation
+|   +-- vexometer-gui.ads        # GtkAda graphical interface
+|   +-- vexometer-cii.ads        # Completion Integrity Index
+|   +-- vexometer-srs.ads        # Strategic Rigidity Score
+|   +-- vexometer-sfr.ads        # Scope Fidelity Ratio
+|   +-- vexometer-rci.ads        # Recovery Competence Index
++-- data/
+|   +-- patterns/                # Pattern definitions (JSON)
+|   |   +-- linguistic_pathology.json
+|   |   +-- paternalism.json
+|   +-- probes/                  # Probe test suites (JSON)
+|   |   +-- behavioural_probes.json
+|   +-- baselines/               # Known model baselines
++-- docs/
+|   +-- SPECIFICATION.md         # Full technical specification
+|   +-- METRICS.adoc             # All 10 metrics detailed
+|   +-- SATELLITES.adoc          # Intervention satellite architecture
+|   +-- letter_lmsys_arena.md    # LMSYS Arena proposal
++-- alire.toml                   # Alire package manifest
++-- vexometer.gpr                # GNAT project file
+----

 == Quick Start

 [source,bash]
 ----
+# Enter development environment
 nix develop
+
+# Build the project
 just build
+
+# Run the GUI
 just run
+
+# Run tests
+just test
+
+# Validate RSR compliance
+just validate
 ----

+== API Providers
+
+Vexometer prioritises local/open models for privacy and reproducibility:
+
+[cols="2,1,3", options="header"]
+|===
+|Provider |Local |Endpoint
+
+|Ollama |Yes |http://localhost:11434/api
+|LMStudio |Yes |http://localhost:1234/v1
+|llama.cpp |Yes |http://localhost:8080
+|LocalAI |Yes |http://localhost:8080/v1
+|Koboldcpp |Yes |http://localhost:5001/api
+|HuggingFace |No |https://api-inference.huggingface.co
+|Together |No |https://api.together.xyz/v1
+|Groq |No |https://api.groq.com/openai/v1
+|OpenAI |No |https://api.openai.com/v1
+|Anthropic |No |https://api.anthropic.com/v1
+|===
+
+== Report Formats
+
+* *JSON* - Machine-readable, for API integration
+* *HTML* - Visual report with embedded SVG charts
+* *Markdown* - For publication on GitHub, blogs
+* *CSV* - For statistical analysis in R, Python
+* *LaTeX* - For academic papers
+* *YAML* - Alternative machine-readable
+
+== GUI Design
+
+[source]
+----
++-----------------------------------------------------------------------+
+| Vexometer - Irritation Surface Analyser                      [-][o][x]|
++-----------------------------------------------------------------------+
+| +---------------+  +---------------------+  +-----------------------+ |
+| | Model: [v    ]|  |                     |  | Findings              | |
+| +---------------+  |    /\     TII: 2.3  |  +-----------------------+ |
+| | Prompt:       |  |   /  \              |  | ! High: "Great quest" | |
+| |               |  |  /    \   LPS: 6.1  |  |   Line 1, Col 0       | |
+| | [Text Entry]  |  | /      \            |  |   Sycophancy pattern  | |
+| |               |  |/   45   \ EFR: 3.2  |  +-----------------------+ |
+| |               |  |\  ISA   /           |  | ! Med: "I'd be happy" | |
+| +---------------+  | \      /  PQ: 7.8   |  |   Line 1, Col 23      | |
+| | Response:     |  |  \    /             |  |   Sycophancy pattern  | |
+| |               |  |   \  /    TAI: 1.0  |  |                       | |
+| | [Text View]   |  |    \/               |  | [Pattern Details]     | |
+| |               |  |           ICS: 4.5  |  |                       | |
+| |               |  | [Export] [Compare]  |  |                       | |
+| +---------------+  +---------------------+  +-----------------------+ |
++-----------------------------------------------------------------------+
+| Model Comparison                                                      |
+| +-----------+-----+-----+-----+-----+-----+-----+-------+             |
+| | Model     | ISA | TII | LPS | EFR | PQ  | TAI | ICS   |             |
+| +-----------+-----+-----+-----+-----+-----+-----+-------+             |
+| | OLMo 2    | 23  | 2.1 | 3.2 | 5.1 | 4.2 | 0.0 | 3.8   | ====        |
+| | GPT-4o    | 42  | 4.1 | 7.2 | 5.5 | 6.8 | 8.5 | 4.8   | ========    |
+| | Claude    | 38  | 2.8 | 6.5 | 4.2 | 7.1 | 6.2 | 3.9   | =======     |
+| +-----------+-----+-----+-----+-----+-----+-----+-------+             |
+|                                                  [Run Suite] [Export] |
++-----------------------------------------------------------------------+
+----
+
+== Satellite Architecture
+
+Vexometer is a *diagnostic instrument*—it measures irritation surfaces but does not fix them. Interventions that reduce irritation are implemented in separate *satellite repositories*.
+
+[cols="2,2,3", options="header"]
+|===
+|Satellite |Reduces |Description
+
+|vex-lazy-eliminator |CII, LPS |Completeness enforcement, AST-level validation
+|vex-hallucination-guard |EFR |Verification layer for factual claims
+|vex-sycophancy-shield |LPS, EFR |Epistemic commitment tracking, belief revision
+|vex-confidence-calibrator |EFR |Structured uncertainty, Brier score optimisation
+|vex-specification-anchor |SFR, ICS |Immutable requirements ledger
+|vex-instruction-persistence |TII, ICS |System instruction compliance enforcement
+|vex-backtrack-enabler |SRS, ICS |Low-friction restart support, decision trees
+|vex-scope-governor |SFR, PQ |Scope contract enforcement
+|vex-error-recovery |RCI |Strategy variation on failure
+|===
+
+See link:docs/SATELLITES.adoc[SATELLITES.adoc] for the full satellite architecture.
+
+== LMSYS Arena Integration
+
+Vexometer includes a proposal for integrating ISA metrics into the LMSYS Chatbot Arena evaluation framework. See link:docs/letter_lmsys_arena.md[letter_lmsys_arena.md].
+
+Preliminary testing shows significant variation in irritation surfaces across models:
+
+[cols="1,1,1,1,1,1,1,1", options="header"]
+|===
+|Model |ISA |TII |LPS |EFR |PQ |TAI |ICS
+
+|OLMo 2 |23 |2.1 |3.2 |5.1 |4.2 |0.0 |3.8
+|Falcon 3 |28 |2.4 |4.1 |5.8 |4.9 |0.0 |4.2
+|Qwen 2.5 |35 |3.2 |5.8 |6.2 |5.5 |0.0 |5.1
+|Claude 3.5 |38 |2.8 |6.5 |4.2 |7.1 |6.2 |3.9
+|GPT-4o |42 |4.1 |7.2 |5.5 |6.8 |8.5 |4.8
+|Phi-4 |52 |3.5 |8.1 |7.2 |8.5 |9.0 |5.8
+|===
+
+_Lower ISA = Better user experience_
+
+== Technical Details
+
+* *Language:* Ada 2022 with SPARK annotations where applicable
+* *GUI Toolkit:* GtkAda
+* *Build System:* Alire (Ada package manager)
+* *Package Management:* Guix primary, Nix fallback
+* *License:* AGPL-3.0-or-later
+
+=== Dependencies (via Alire)
+
+* `gtkada` >= 24.0.0 - GUI toolkit
+* `gnatcoll` >= 24.0.0 - Collection utilities
+* `aws` >= 24.0.0 - HTTP client for API calls
+
+=== Code Style
+
+* SPDX headers on all files
+* 3-space indentation
+* 100 character line limit
+* RSR (Rhodium Standard Repository) compliant
+
+== Contributing
+
+Contributions welcome under AGPL-3.0-or-later. See link:CONTRIBUTING.adoc[CONTRIBUTING.adoc].
+
+Priority areas:
+
+1. Additional pattern definitions
+2. Probe suite expansion
+3. Report format improvements
+4. API provider support
+5. Satellite development
+
+== Documentation
+
+* link:docs/SPECIFICATION.md[SPECIFICATION.md] - Full technical specification
+* link:docs/METRICS.adoc[METRICS.adoc] - Detailed metric reference
+* link:docs/SATELLITES.adoc[SATELLITES.adoc] - Satellite architecture
+* link:CLAUDE.md[CLAUDE.md] - AI assistant guidance
+
 == License

 AGPL-3.0-or-later. See link:LICENSE.txt[LICENSE.txt].
+
+This is free software; you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
diff --git a/ROADMAP.adoc b/ROADMAP.adoc
new file mode 100644
index 0000000..bd5b61b
--- /dev/null
+++ b/ROADMAP.adoc
@@ -0,0 +1,379 @@
+// SPDX-FileCopyrightText: 2024-2025 Jonathan D.A. Jewell
+// SPDX-License-Identifier: AGPL-3.0-or-later
+
+= Vexometer Roadmap
+Jonathan D.A. Jewell
+v0.1.0
+:toc: left
+:toclevels: 3
+:icons: font
+:source-highlighter: rouge
+
+Development roadmap for Vexometer, the Irritation Surface Analyser.
+
+== Current State (v0.1.0)
+
+=== Completed Design Work
+
+[cols="1,3,1", options="header"]
+|===
+|Component |Description |Status
+
+|*Core Types*
+|10 metric categories, ISA calculation, findings, model profiles
+|Designed
+
+|*Pattern Engine*
+|Regex-based pattern detection, pattern database, heuristic analysers
+|Designed
+
+|*Probe System*
+|Behavioural probe framework, 14 standardised probes, probe suite runner
+|Designed
+
+|*API Clients*
+|Multi-provider support (Ollama, OpenAI, Anthropic, etc.), batch evaluation
+|Designed
+
+|*GUI*
+|GtkAda interface with radar charts, findings panel, model comparison table
+|Designed
+
+|*Reports*
+|JSON, HTML, Markdown, CSV, LaTeX, YAML export formats
+|Designed
+
+|*Extended Metrics*
+|CII, SRS, SFR, RCI specifications
+|Designed
+
+|*Satellite Architecture*
+|Integration protocol, efficacy validation, 10 satellite specifications
+|Designed
+|===
+
+=== Pattern & Probe Data
+
+[cols="2,1,2", options="header"]
+|===
+|Data File |Items |Categories
+
+|`linguistic_pathology.json` |16 patterns |Sycophancy, identity, hedge, corporate
+|`paternalism.json` |12 patterns |Warning, lecture, competence, refusal
+|`behavioural_probes.json` |14 probes |Brevity, competence, sycophancy, constraint, uncertainty, direct
+|===
+
+=== Documentation
+
+* link:docs/SPECIFICATION.md[SPECIFICATION.md] - Full technical specification
+* link:docs/METRICS.adoc[METRICS.adoc] - All 10 metrics with calculation details
+* link:docs/SATELLITES.adoc[SATELLITES.adoc] - Satellite architecture and integration protocol
+* link:docs/letter_lmsys_arena.md[letter_lmsys_arena.md] - LMSYS Arena proposal letter
+
+== Phase 1: Core Implementation
+
+=== Milestone 1.1: Pattern Engine
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement `Vexometer.Patterns` package body |Medium
+|P0 |Compile and load pattern JSON files |Low
+|P0 |Implement pattern matching with GNAT.Regpat |Medium
+|P1 |Implement heuristic analysers (repetition, verbosity, competence) |High
+|P1 |Add pattern confidence scoring |Medium
+|P2 |Implement context-aware pattern weighting |High
+|===
+
+=== Milestone 1.2: Core Types & ISA Calculation
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement `Vexometer.Core` package body |Medium
+|P0 |Implement `Calculate_ISA` function |Low
+|P0 |Implement `Calculate_Category_Scores` function |Low
+|P1 |Implement `Aggregate_Profile` for multi-response analysis |Medium
+|P1 |Add statistical functions (mean, std dev, median) |Low
+|===
+
+=== Milestone 1.3: Probe System
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement `Vexometer.Probes` package body |Medium
+|P0 |Load probe definitions from JSON |Low
+|P0 |Implement probe result scoring |Medium
+|P1 |Implement trait detection from responses |High
+|P1 |Add multi-turn probe support for context testing |Medium
+|===
+
+=== Milestone 1.4: Extended Metrics
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement `Vexometer.CII` - Completion Integrity Index |Medium
+|P1 |Implement `Vexometer.SRS` - Strategic Rigidity Score |High
+|P1 |Implement `Vexometer.SFR` - Scope Fidelity Ratio |High
+|P1 |Implement `Vexometer.RCI` - Recovery Competence Index |High
+|P2 |Add language-aware CII detection (tree-sitter integration) |Very High
+|===
+
+== Phase 2: API & Automation
+
+=== Milestone 2.1: API Clients
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement Ollama client (local-first priority) |Medium
+|P0 |Implement OpenAI-compatible client (LMStudio, llama.cpp) |Low
+|P1 |Implement Anthropic client |Medium
+|P1 |Implement HuggingFace Inference API client |Medium
+|P2 |Implement Together, Groq clients |Low
+|P2 |Add retry logic with exponential backoff |Low
+|===
+
+=== Milestone 2.2: Batch Evaluation
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement `Run_Probe_Suite` procedure |Medium
+|P0 |Add progress callback support |Low
+|P1 |Implement `Compare_Models` for multi-model comparison |Medium
+|P1 |Add result caching for reproducibility |Medium
+|P2 |Implement parallel API calls (rate-limit aware) |High
+|===
+
+=== Milestone 2.3: Report Generation
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement JSON report generation |Low
+|P0 |Implement Markdown report generation |Low
+|P1 |Implement HTML report with embedded SVG charts |High
+|P1 |Implement CSV export |Low
+|P2 |Implement LaTeX report generation |Medium
+|P2 |Implement LMSYS Arena submission format |Medium
+|===
+
+== Phase 3: GUI Implementation
+
+=== Milestone 3.1: Basic GUI
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement main window layout with GtkAda |High
+|P0 |Implement model selection dropdown |Low
+|P0 |Implement prompt/response text views |Medium
+|P0 |Implement analyse button with pattern detection |Medium
+|P1 |Implement findings list with severity highlighting |Medium
+|===
+
+=== Milestone 3.2: Visualisation
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement radar chart drawing with Cairo |High
+|P0 |Implement ISA gauge display |Medium
+|P1 |Implement category score labels |Low
+|P1 |Implement colour-coded severity display |Low
+|P2 |Add chart animation |Medium
+|===
+
+=== Milestone 3.3: Model Comparison
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Implement comparison table with TreeView |Medium
+|P0 |Implement Run Suite button |Medium
+|P1 |Implement export functionality |Low
+|P1 |Add bar chart visualisation in table |Medium
+|P2 |Implement model profile persistence |Medium
+|===
+
+== Phase 4: Satellite Ecosystem
+
+=== Milestone 4.1: Trace Format
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Define `vexometer-trace-v1` JSON schema |Low
+|P0 |Implement trace collection command |Medium
+|P1 |Implement trace comparison command |Medium
+|P1 |Implement efficacy calculation from traces |Medium
+|===
+
+=== Milestone 4.2: First Satellite (vex-lazy-eliminator)
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Create satellite repository structure |Low
+|P0 |Implement CII-based completeness detection (Rust) |High
+|P1 |Add tree-sitter integration for AST analysis |Very High
+|P1 |Implement intervention API |Medium
+|P2 |Collect before/after traces |Medium
+|P2 |Publish efficacy report |Low
+|===
+
+=== Milestone 4.3: Additional Satellites
+
+[cols="1,3,2,1", options="header"]
+|===
+|Priority |Satellite |Description |Effort
+
+|P1 |vex-hallucination-guard |Verification layer for factual claims |Very High
+|P1 |vex-sycophancy-shield |Epistemic commitment tracking |Very High
+|P2 |vex-confidence-calibrator |Structured uncertainty |High
+|P2 |vex-specification-anchor |Immutable requirements ledger |High
+|P2 |vex-instruction-persistence |System instruction compliance |Medium
+|P3 |vex-backtrack-enabler |Low-friction restart support |High
+|P3 |vex-scope-governor |Scope contract enforcement |Medium
+|P3 |vex-error-recovery |Strategy variation on failure |Medium
+|===
+
+== Phase 5: Community & Validation
+
+=== Milestone 5.1: Pattern Expansion
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Add patterns for EFR (hallucination markers) |Medium
+|P0 |Add patterns for TII (temporal intrusion) |Medium
+|P1 |Add patterns for ICS (coherence failures) |High
+|P1 |Crowdsource pattern submissions |Ongoing
+|P2 |Validate patterns against user feedback datasets |High
+|===
+
+=== Milestone 5.2: Probe Suite Expansion
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Expand probe suite to 50+ probes |Medium
+|P1 |Add multi-turn probes for context testing |High
+|P1 |Add domain-specific probe sets (code, math, writing) |High
+|P2 |Create probe contribution guidelines |Low
+|===
+
+=== Milestone 5.3: LMSYS Arena Integration
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P1 |Finalise LMSYS proposal letter |Low
+|P1 |Submit proposal to LMSYS team |Low
+|P2 |Collaborate on validation methodology |High
+|P2 |Provide API for Arena integration |High
+|P3 |Joint publication on ISA methodology |Very High
+|===
+
+=== Milestone 5.4: Academic Publication
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P2 |Write methodology paper |Very High
+|P2 |Collect validation data (user studies) |Very High
+|P3 |Submit to NeurIPS, ACL, or CHI |Low
+|P3 |Release benchmark dataset |High
+|===
+
+== Technical Debt & Quality
+
+=== Ongoing Tasks
+
+[cols="1,4,1", options="header"]
+|===
+|Priority |Task |Effort
+
+|P0 |Maintain RSR compliance |Ongoing
+|P0 |Ensure SPDX headers on all files |Ongoing
+|P1 |Add SPARK annotations for critical paths |High
+|P1 |Achieve >80% test coverage |High
+|P2 |Add CI/CD pipeline (GitLab CI) |Medium
+|P2 |Add Nix/Guix build definitions |Medium
+|===
+
+== Version Targets
+
+=== v0.2.0 - Core Functional
+
+* Pattern engine implementation complete
+* Core ISA calculation working
+* Probe system functional
+* JSON/Markdown reports
+
+=== v0.3.0 - API Integration
+
+* Ollama client working
+* OpenAI-compatible client working
+* Batch evaluation functional
+* CLI for automated testing
+
+=== v0.4.0 - GUI Release
+
+* GtkAda GUI functional
+* Radar chart visualisation
+* Model comparison table
+* Export functionality
+
+=== v0.5.0 - Extended Metrics
+
+* CII implementation complete
+* SRS implementation complete
+* SFR implementation complete
+* RCI implementation complete
+
+=== v1.0.0 - Production Ready
+
+* All metrics implemented and validated
+* Pattern database validated
+* Probe suite finalised
+* Documentation complete
+* First satellite released
+
+== Contributing
+
+Contributions welcome in all areas. See link:CONTRIBUTING.adoc[CONTRIBUTING.adoc] for guidelines.
+
+*High-impact contribution areas:*
+
+1. Pattern definitions for under-represented categories
+2. Behavioural probe design
+3. API provider implementations
+4. Satellite development
+5. Validation studies with real user feedback
+
+== See Also
+
+* link:README.adoc[README.adoc] - Project overview
+* link:docs/SPECIFICATION.md[SPECIFICATION.md] - Technical specification
+* link:docs/METRICS.adoc[METRICS.adoc] - Metric definitions
+* link:docs/SATELLITES.adoc[SATELLITES.adoc] - Satellite architecture