A self-learning, AI-powered knowledge base that provides semantic search and RAG (Retrieval-Augmented Generation) capabilities via Slack.
This system:
- Seeds content from Confluence Cloud (one-time sync with manual rebase)
- Auto-generates metadata (topics, intents, audience) using AI
- Provides AI-powered semantic search via Slack (primary interface)
- Continuously learns from user feedback (explicit + implicit signals)
- Creates new documents via Slack with AI drafting
- Enforces approval workflows based on document type
Status: All 18 phases completed and functional.
USER INTERFACE: SLACK
/ask command | @bot mentions | DM conversations
|
v
QUERY PLANNING
Query Decomposition | Source Selection | Multi-hop
|
v
RETRIEVAL
Hybrid Search (BM25+Vector) | Graph Traversal | Reranking
|
v
GENERATION
RAG Answer | Citations | LLM-as-Judge Evaluation
|
v
LEARNING
Explicit Feedback | Behavioral Signals | Gap Analysis
|
v
DATA LAYER
ChromaDB (vectors) | SQLite (metadata) | NetworkX (graph)
| Component | Technology |
|---|---|
| Language | Python 3.11+ |
| API Framework | FastAPI |
| Primary Interface | Slack Bot (Bolt) |
| Vector Database | ChromaDB (HTTP mode) |
| LLM Provider | Anthropic Claude (primary), Gemini (alternative) |
| Embeddings | sentence-transformers / Vertex AI |
| Keyword Search | rank-bm25 |
| Knowledge Graph | NetworkX |
| Metadata Storage | SQLite + SQLAlchemy |
| Task Queue | Celery + Redis |
| Re-ranking | cross-encoder (sentence-transformers) |
| Web UI | Streamlit |
ai-based-knowledge/
├── src/knowledge_base/ # Main application code
│ ├── api/ # REST API endpoints
│ ├── auth/ # Authentication & authorization
│ ├── chunking/ # Document parsing & chunking
│ ├── cli.py # CLI commands (kb command)
│ ├── config.py # Application settings
│ ├── confluence/ # Confluence sync client
│ ├── db/ # Database models (SQLAlchemy)
│ ├── documents/ # Document creation & approval
│ ├── evaluation/ # LLM-as-Judge quality scoring
│ ├── governance/ # Gap analysis, obsolete detection
│ ├── graph/ # Knowledge graph (NetworkX)
│ ├── lifecycle/ # Document lifecycle management
│ ├── main.py # FastAPI entry point
│ ├── metadata/ # AI metadata extraction
│ ├── rag/ # RAG pipeline & LLM providers
│ ├── search/ # Hybrid search (BM25 + vector)
│ ├── slack/ # Slack bot integration
│ ├── vectorstore/ # ChromaDB client & embeddings
│ └── web/ # Streamlit web UI
├── tests/ # Test suite
├── plan/ # Implementation planning docs
│ ├── MASTER_PLAN.md # High-level architecture & phases
│ ├── PROGRESS.md # Implementation progress tracker
│ └── phases/ # Detailed specs per phase
├── docs/ # Documentation
│ ├── adr/ # Architecture Decision Records
│ └── AGENT-REPORTS/ # Security & analysis reports
├── deploy/ # Deployment configurations
├── docker-compose.yml # Local development setup
├── Dockerfile # Container build
└── pyproject.toml # Python dependencies
- One-time initial sync from Confluence Cloud
- Manual rebase via CLI when refresh needed
- Preserves user feedback and quality scores across rebases
- BM25 keyword search for exact term matching
- Vector search for semantic similarity
- RRF (Reciprocal Rank Fusion) to combine results (see the fusion sketch after this feature list)
- Knowledge graph traversal for related content
- Retrieves relevant chunks from hybrid search
- Generates answers using LLM (Claude/Gemini)
- Includes source citations in responses
- Explicit feedback: Thumbs up/down buttons in Slack
- Behavioral signals: Reactions, gratitude, frustration detection
- Quality scoring: Normalized scores boost search ranking
- Gap analysis for unanswered questions
- Obsolete content detection (2+ years old)
- Nightly LLM-as-Judge evaluation
- Create documents via Slack (`/create-doc` or "Save as Doc")
- AI drafting assistance
- Approval workflows based on document type
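
The RRF fusion and feedback-driven quality boost listed above can be sketched in a few lines. This is a minimal illustration: the function names and the `weight` constant are assumptions, not the project's actual `hybrid.py` implementation.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Combine several ranked result lists (e.g. BM25 and vector search)
    into one ranking using Reciprocal Rank Fusion.

    ranked_lists: iterable of lists of chunk IDs, best match first.
    k: RRF damping constant (60 is the value from the original RRF paper).
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, chunk_id in enumerate(results, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def apply_quality_boost(fused, quality_scores, weight=0.1):
    """Nudge the fused ranking with normalized quality scores (0..1)
    learned from user feedback. `weight` is an illustrative constant."""
    boosted = [
        (chunk_id, score + weight * quality_scores.get(chunk_id, 0.0))
        for chunk_id, score in fused
    ]
    return sorted(boosted, key=lambda item: item[1], reverse=True)

# Example: BM25 and vector search disagree; RRF reconciles them, then
# feedback-derived quality scores adjust the final order.
bm25_hits = ["chunk-a", "chunk-b", "chunk-c"]
vector_hits = ["chunk-b", "chunk-d", "chunk-a"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
final = apply_quality_boost(fused, {"chunk-d": 0.9})
```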
# Required environment variables
SLACK_BOT_TOKEN=xoxb-xxx
SLACK_APP_TOKEN=xapp-xxx
SLACK_SIGNING_SECRET=xxx
CONFLUENCE_URL=https://your-org.atlassian.net
CONFLUENCE_API_TOKEN=xxx
CONFLUENCE_SPACE_KEYS=DOCS,ENG
ANTHROPIC_API_KEY=sk-ant-xxx

# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -e .
# Run locally with Docker Compose
docker-compose up -d

# Sync from Confluence
kb sync --space DOCS
# Run search
kb search "how to deploy"
# Generate metadata for all pages
kb metadata generate
# Build knowledge graph
kb graph build
# Start Slack bot
kb slack start

Confluence Cloud --(initial sync)--> Knowledge Base
                 --(manual rebase, when needed)--> Knowledge Base

User Interactions (Slack) --(real-time)--> Enrichments
        |                                       |
        v                                       v
  Feedback/Signals                       Quality Scores
| Data Type | Survives Rebase? | Notes |
|---|---|---|
| Content/chunks/vectors | No (regenerated) | Rebuilt fresh from Confluence |
| Feedback | Yes | Linked by page_id |
| Quality Scores | Yes | Linked by page_id |
| Behavioral Signals | Yes | Linked by page_id |
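
Why enrichments survive a rebase: feedback, quality scores, and behavioral signals are keyed by the stable Confluence page ID, so regenerating chunks and vectors never touches them. The dataclass and functions below are a minimal sketch under that assumption, not the repository's actual SQLAlchemy models.

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    page_id: str      # stable Confluence page ID -- the join key that survives a rebase
    rating: int       # +1 / -1 from Slack thumbs up/down
    comment: str = ""

def chunk_and_embed(page):
    # placeholder for the real parsing/chunking/embedding pipeline
    return {"chunks": [page["body"]], "vectors": []}

def rebase(pages, existing_feedback):
    """Rebuild chunks/vectors from fresh Confluence content while keeping
    any feedback that still references an existing page_id."""
    fresh_index = {}
    for page in pages:
        # chunks, embeddings, and the BM25 index are regenerated from scratch
        fresh_index[page["page_id"]] = chunk_and_embed(page)

    # Enrichments are not regenerated: rows keyed to a surviving page_id are
    # carried over unchanged; orphans (deleted pages) are dropped.
    surviving = [fb for fb in existing_feedback if fb.page_id in fresh_index]
    return fresh_index, surviving
```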
Key decisions documented in docs/adr/:
| ADR | Decision | Rationale |
|---|---|---|
| ADR-0001 | DuckDB on GCE | Cost-effective, simple |
| ADR-0002 | ChromaDB on Cloud Run | Portable, no vendor lock-in |
| ADR-0003 | Anthropic Claude | Best quality for RAG |
| ADR-0004 | Slack Bot HTTP Mode | Cloud Run compatible |
All 18 phases completed:
| Phase | Name | Status |
|---|---|---|
| 01 | Infrastructure | Done |
| 02 | Confluence Download | Done |
| 03 | Content Parsing | Done |
| 04 | Metadata Generation | Done |
| 04.5 | Knowledge Graph | Done |
| 05 | Vector Indexing | Done |
| 05.5 | Hybrid Search | Done |
| 06 | Search API | Done |
| 07 | RAG Answers | Done |
| 08 | Slack Bot | Done |
| 09 | Permissions | Done |
| 10 | Feedback Collection | Done |
| 10.5 | Behavioral Signals | Done |
| 11 | Quality Scoring | Done |
| 11.5 | Nightly Evaluation | Done |
| 12 | Governance | Done |
| 13 | Web UI | Done |
| 14 | Document Creation | Done |
See plan/PROGRESS.md for detailed changelog.
To understand this project:
- Start with `plan/MASTER_PLAN.md` for high-level architecture
- Check `plan/PROGRESS.md` for implementation status
- Browse `plan/phases/` for detailed specs of each component
- See `docs/adr/` for architectural decisions
Key source directories:
- `src/knowledge_base/rag/` - RAG pipeline and LLM providers
- `src/knowledge_base/search/` - Hybrid search implementation
- `src/knowledge_base/slack/` - Slack bot integration
- `src/knowledge_base/vectorstore/` - ChromaDB and embeddings
- `src/knowledge_base/graph/` - Knowledge graph
Configuration:
- `src/knowledge_base/config.py` - All settings with env var overrides
- `.env.example` - Environment variable template
- `docker-compose.yml` - Local development services
Tests:
- `tests/` - Pytest-based test suite
- Run with: `pytest tests/`
This codebase uses:
- Async/await for all I/O operations
- Pydantic for data validation and settings
- SQLAlchemy 2.0 async patterns for database
- Dependency injection via FastAPI
- Structured logging throughout
- Type hints everywhere (mypy strict mode)
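
A minimal sketch of how these conventions combine in a single endpoint; the route, model, and session factory names are illustrative assumptions rather than code from this repository.

```python
from fastapi import APIRouter, Depends
from pydantic import BaseModel
from sqlalchemy import String
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

engine = create_async_engine("sqlite+aiosqlite:///kb.db")
SessionLocal = async_sessionmaker(engine, expire_on_commit=False)
router = APIRouter()

class Base(DeclarativeBase):
    pass

class Page(Base):                       # SQLAlchemy 2.0 typed ORM model
    __tablename__ = "pages"
    page_id: Mapped[str] = mapped_column(String, primary_key=True)
    title: Mapped[str] = mapped_column(String)

class PageOut(BaseModel):               # Pydantic model validates the response
    page_id: str
    title: str

async def get_session():                # dependency injected by FastAPI
    async with SessionLocal() as session:
        yield session

@router.get("/pages/{page_id}", response_model=PageOut)
async def read_page(page_id: str, session: AsyncSession = Depends(get_session)):
    # async/await for all I/O: the lookup runs on an async SQLAlchemy session
    page = await session.get(Page, page_id)
    return PageOut(page_id=page.page_id, title=page.title)
```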
Adding a new LLM provider:
- Create provider in `src/knowledge_base/rag/providers/`
- Implement the `BaseLLMProvider` interface
- Register in `src/knowledge_base/rag/llm_factory.py`
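
A hedged sketch of the provider shape; the real `BaseLLMProvider` method names may differ, so check the actual base class in `src/knowledge_base/rag/` before implementing.

```python
# Hypothetical shape of a provider -- verify the real BaseLLMProvider
# signatures before copying this.
from abc import ABC, abstractmethod

class BaseLLMProvider(ABC):
    @abstractmethod
    async def generate(self, prompt: str, *, max_tokens: int = 1024) -> str:
        """Return the model's completion for a prompt."""

class MyNewProvider(BaseLLMProvider):
    def __init__(self, api_key: str):
        self.api_key = api_key

    async def generate(self, prompt: str, *, max_tokens: int = 1024) -> str:
        # call the vendor SDK here and return the completion text
        raise NotImplementedError

# Register the new provider in src/knowledge_base/rag/llm_factory.py so it can
# be selected by name from configuration.
```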
Adding a new search source:
- Implement retriever in `src/knowledge_base/search/`
- Add to hybrid search fusion in `hybrid.py`
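
As a rough sketch, a new retriever only needs to return a ranked list of chunk IDs that the fusion step can merge with the BM25 and vector rankings; the `Retriever` protocol and `WikiRetriever` class below are assumptions for illustration, not the repository's actual interface.

```python
from typing import Protocol

class Retriever(Protocol):
    # Assumed contract: given a query, return chunk IDs ordered best-first.
    async def retrieve(self, query: str, top_k: int = 20) -> list[str]: ...

class WikiRetriever:
    """Example retriever for a hypothetical extra source."""
    async def retrieve(self, query: str, top_k: int = 20) -> list[str]:
        # query the new source and map hits to chunk IDs
        return []

# In hybrid.py, its ranked list would be fed into the RRF fusion alongside the
# BM25 and vector rankings (see the fusion sketch earlier in this README).
```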
Modifying Slack commands:
- Edit `src/knowledge_base/slack/bot.py`
- Add command handlers following existing patterns
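
For orientation, Slack Bolt slash-command handlers follow the pattern below; `answer_question` is a hypothetical stand-in for the call into the RAG pipeline, while the decorator/`ack`/`respond` flow is standard Bolt.

```python
from slack_bolt.async_app import AsyncApp

# Token and signing secret are read from SLACK_BOT_TOKEN / SLACK_SIGNING_SECRET
app = AsyncApp()

@app.command("/ask")
async def handle_ask(ack, respond, command):
    # Slack requires an acknowledgement within 3 seconds
    await ack()
    question = command["text"]
    answer = await answer_question(question)   # hypothetical call into the RAG pipeline
    await respond(answer)

async def answer_question(question: str) -> str:
    raise NotImplementedError  # retrieval + generation happens here
```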
See docs/AGENT-REPORTS/SECURITY.md for full security review.
Key considerations:
- All secrets via environment variables
- Slack signing secret verification
- Permission checks on all queries
- No hardcoded credentials
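
Bolt verifies request signatures automatically once the signing secret is configured, but the underlying check defined by Slack looks roughly like this:

```python
import hashlib
import hmac
import time

def is_valid_slack_request(signing_secret: str, timestamp: str, body: bytes, signature: str) -> bool:
    # Reject replays older than five minutes
    if abs(time.time() - int(timestamp)) > 60 * 5:
        return False
    basestring = f"v0:{timestamp}:{body.decode()}".encode()
    expected = "v0=" + hmac.new(signing_secret.encode(), basestring, hashlib.sha256).hexdigest()
    # Constant-time comparison against the X-Slack-Signature header
    return hmac.compare_digest(expected, signature)
```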
Proprietary - Keboola