@crtahlin
Collaborator

This SWIP proposes Swarm as a decentralized storage layer for provenance metadata and data. Provenance, the documented history of a dataset's origin and transformations, is increasingly important for regulatory compliance, ethical AI, and data accountability. A command-line toolkit will provide utilities to:

  • Upload/Download: Store and retrieve provenance files (in any format) with Swarm reference hashes.
  • Metadata Management: Track storage validity (TTL) and extend it via stamp top-ups.

The framework does not enforce specific provenance standards but ensures compatibility by decoupling metadata (structured JSON) from the actual provenance data (stored as arbitrary files). Developers and enterprises retain full control over their data format and privacy measures.

@tamas6 added the "implementation" label (describes standard for algorithm/data structure or design) Mar 20, 2025
@significance changed the title from "SWIP-0037 Swarm-Based Data Provenance Framework" to "add SWIP37 Swarm-Based Data Provenance Framework" Aug 10, 2025
@significance changed the title from "add SWIP37 Swarm-Based Data Provenance Framework" to "add SWIP-37 Swarm-Based Data Provenance Framework" Aug 28, 2025
- Add Gateway Service and MCP Server components to architecture
- Update architecture diagram showing layered structure (CLI/MCP -> Gateway -> Swarm)
- Add DSSC Blueprint reference for P&T building block context
- Clarify CLI as core processing component (SHA256, metadata wrapping)
- Simplify Implementation section (remove detailed module lists)
- Add integrity verification failure test case
- Remove specific performance numbers and collection upload details
@crtahlin
Collaborator Author

Updated SWIP based on implementation experience and market evolution

This update reflects the framework as implemented, incorporating feedback from development and adapting to ecosystem changes:

  • Architecture expanded: Added a Gateway Service as an intermediary layer that provides managed stamp lifecycle (TTL calculation, utilization tracking, health checks) and simplified API access. Also added an MCP Server for AI agent integration, responding to growing demand for LLM-accessible tooling.
  • DSSC Blueprint context: Added a reference to the Data Spaces Support Centre Provenance & Traceability (P&T) building block, positioning Swarm as a trusted third party for P&T data in European Data Spaces.
  • UX improvements documented: Gateway abstracts away Swarm complexity (PLUR amounts, depth calculations) with user-friendly parameters like duration in hours and size presets.
  • Simplified specification: Removed implementation-specific details to keep the SWIP focused on architecture and interfaces.
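
To illustrate the kind of translation the Gateway performs, here is a minimal sketch of turning a duration in hours and a size preset into a stamp amount and batch depth. It is not the Gateway's actual code: the preset names and sizes, the function names, the 5-second block time, the minimum depth of 17, and the price used in the usage note are all assumptions for illustration, and the depth reflects theoretical capacity only (effective capacity of a batch is lower in practice).

```python
import math

CHUNK_SIZE_BYTES = 4096     # Swarm chunk size
BLOCK_TIME_SECONDS = 5      # assumption: block time of the chain holding the stamps
MIN_DEPTH = 17              # assumption: smallest batch depth accepted

# Hypothetical size presets; names and values are illustrative only.
SIZE_PRESETS_BYTES = {
    "small": 100 * 1024**2,   # 100 MiB
    "medium": 1 * 1024**3,    # 1 GiB
    "large": 10 * 1024**3,    # 10 GiB
}


def amount_for_duration(hours: float, price_per_chunk_per_block: int) -> int:
    """Per-chunk amount (in PLUR) needed to keep a batch alive for `hours`."""
    blocks = math.ceil(hours * 3600 / BLOCK_TIME_SECONDS)
    return blocks * price_per_chunk_per_block


def depth_for_size(size_bytes: int) -> int:
    """Smallest depth whose theoretical capacity (2**depth chunks) fits the data."""
    chunks = max(1, math.ceil(size_bytes / CHUNK_SIZE_BYTES))
    # (chunks - 1).bit_length() equals ceil(log2(chunks)) using integers only.
    return max(MIN_DEPTH, (chunks - 1).bit_length())


def total_cost_plur(amount: int, depth: int) -> int:
    """Total batch cost: the per-chunk amount times the number of chunk slots."""
    return amount * 2**depth
```

For example, with a placeholder price of 24000 PLUR per chunk per block, `amount_for_duration(24, 24000)` yields the per-chunk amount for one day, and `depth_for_size(SIZE_PRESETS_BYTES["medium"])` yields the depth for the hypothetical 1 GiB preset. A real service would read the current price from the network rather than hard-coding it.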

The core provenance workflow (SHA256 hashing, metadata wrapping, content-addressed storage) remains unchanged.
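
As a rough sketch of that workflow, the snippet below hashes a provenance file with SHA256, wraps the hash in a JSON-serializable metadata record, and checks integrity after retrieval. The field names (`content_hash`, `hash_algorithm`, `provenance_standard`, `size_bytes`) and both function names are hypothetical and not taken from the SWIP. The Swarm reference returned on upload is a separate identifier and is not computed here.

```python
import hashlib
import json


def wrap_provenance(data: bytes, standard: str = "unspecified") -> dict:
    """Build a metadata record for an arbitrary provenance payload.

    All field names are illustrative assumptions, not the SWIP's schema.
    """
    return {
        "content_hash": hashlib.sha256(data).hexdigest(),
        "hash_algorithm": "sha256",
        "provenance_standard": standard,
        "size_bytes": len(data),
    }


def verify_provenance(data: bytes, metadata: dict) -> bool:
    """Return True only if the payload still matches the recorded hash."""
    return hashlib.sha256(data).hexdigest() == metadata["content_hash"]


if __name__ == "__main__":
    payload = b'{"dataset": "example", "steps": ["collected", "cleaned"]}'
    record = wrap_provenance(payload, standard="custom-json")
    print(json.dumps(record, indent=2))
    # An altered payload must fail verification.
    print(verify_provenance(payload, record))
    print(verify_provenance(payload + b" ", record))
```

Keeping the hash in the metadata record rather than inside the payload lets the payload stay in any format, which matches the decoupling described in the proposal. The failed check on the altered payload corresponds to the integrity verification failure case mentioned in the commit notes.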
