Initial commit: Crumbforest Architecture Refinement v1 (Clean)
DIARY_RAG_README.md
# Diary RAG System - Documentation

The Crumbforest Diary RAG system is now fully integrated. Children can create diary entries, which are automatically indexed and made searchable.

## 🎯 What was implemented?

### 1. Database Schema
**New tables** (`compose/init/04_diary_schema.sql`):
- `children` - children/users with access tokens
- `diary_entries` - diary entries
- `audit_log` - GDPR-compliant audit logs

**Extended table**:
- `post_vectors` - new columns: `post_type`, `child_id`

### 2. Pydantic Models
**New models** (`app/models/rag_models.py`):
- `DiaryIndexRequest` / `DiaryIndexResponse`
- `DiarySearchRequest` / `DiarySearchResponse`
- `DiaryAskRequest` / `DiaryAskResponse`
- `DiarySearchResult`

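The fields of these models are not listed here, but the request examples in the API section suggest their shape. As a rough sketch only, the three request payloads could look like this. It uses standard-library dataclasses instead of Pydantic so that it runs without dependencies; the default values and the `limit` check are assumptions, not the contents of `app/models/rag_models.py`.

```python
from dataclasses import dataclass, asdict


@dataclass
class DiaryIndexRequest:
    entry_id: int
    child_id: int
    content: str
    provider: str = "openai"  # assumed default


@dataclass
class DiarySearchRequest:
    child_id: int
    query: str
    provider: str = "openai"  # assumed default
    limit: int = 5            # assumed default

    def __post_init__(self):
        # Hypothetical validation rule; the real model may differ.
        if self.limit < 1:
            raise ValueError("limit must be at least 1")


@dataclass
class DiaryAskRequest:
    child_id: int
    question: str
    provider: str = "openai"  # assumed default
    context_limit: int = 3    # assumed default


# Convert a request object into the JSON-ready dict sent to the API.
payload = asdict(DiarySearchRequest(child_id=1, query="Igel"))
print(payload)
```

In the real code, Pydantic would additionally coerce and validate the field types when a request arrives.
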
### 3. API Endpoints
**New router** (`app/routers/diary_rag.py`):
- `POST /api/diary/index` - indexes a diary entry
- `POST /api/diary/search` - semantic search within the diary
- `POST /api/diary/ask` - RAG query (Q&A)
- `GET /api/diary/{child_id}/status` - indexing status

### 4. Integration Test
**Test suite** (`tests/test_integration.py`):
- Complete end-to-end test
- PHP -> FastAPI -> Qdrant flow
- All 6 steps validated

## 📋 API Endpoints

### 1. Index Diary Entry
```http
POST /api/diary/index
Content-Type: application/json

{
  "entry_id": 1,
  "child_id": 1,
  "content": "# Heute im Wald\n\nIch habe einen Igel gesehen!",
  "provider": "openai"
}
```

**Response:**
```json
{
  "status": "success",
  "entry_id": 1,
  "child_id": 1,
  "chunks": 3,
  "collection": "diary_child_1",
  "provider": "openai"
}
```

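For callers outside PHP, the same request can be issued from Python. The following is a minimal sketch using only the standard library; `BASE_URL` assumes the local port used in the verification commands, and `build_index_request` and `send` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local FastAPI port


def build_index_request(entry_id, child_id, content, provider="openai"):
    """Build (but do not send) a POST request for /api/diary/index."""
    body = json.dumps({
        "entry_id": entry_id,
        "child_id": child_id,
        "content": content,
        "provider": provider,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/diary/index",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_index_request(
    1, 1, "# Heute im Wald\n\nIch habe einen Igel gesehen!"
)
# result = send(request)  # requires the running Docker Compose stack
```

Building and sending are kept separate so the payload can be inspected or tested without a running server.
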
### 2. Search Diary
```http
POST /api/diary/search
Content-Type: application/json

{
  "child_id": 1,
  "query": "Igel",
  "provider": "openai",
  "limit": 5
}
```

**Response:**
```json
{
  "results": [
    {
      "entry_id": 1,
      "content": "Ich habe einen Igel gesehen...",
      "score": 0.95,
      "created_at": "2025-01-15T10:30:00"
    }
  ],
  "query": "Igel",
  "child_id": 1,
  "provider": "openai"
}
```

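A client will usually want to drop weak matches before showing results. The sketch below filters and sorts a search response of the shape shown above; the helper name `top_hits` and the `min_score` threshold of 0.5 are illustrative assumptions, not values used by the API.

```python
def top_hits(search_response, min_score=0.5):
    """Return results at or above min_score, best match first."""
    hits = [r for r in search_response["results"] if r["score"] >= min_score]
    return sorted(hits, key=lambda r: r["score"], reverse=True)


# Sample response copied from the example above.
response = {
    "results": [
        {"entry_id": 1, "content": "Ich habe einen Igel gesehen...",
         "score": 0.95, "created_at": "2025-01-15T10:30:00"},
    ],
    "query": "Igel",
    "child_id": 1,
    "provider": "openai",
}

for hit in top_hits(response):
    print(f"{hit['score']:.2f}  entry {hit['entry_id']}: {hit['content']}")
```
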
### 3. RAG Query (Ask)
```http
POST /api/diary/ask
Content-Type: application/json

{
  "child_id": 1,
  "question": "Was habe ich im Wald gesehen?",
  "provider": "openai",
  "context_limit": 3
}
```

**Response:**
```json
{
  "answer": "Du hast einen Igel im Wald gesehen. Du warst mit deinem Papa spazieren...",
  "question": "Was habe ich im Wald gesehen?",
  "child_id": 1,
  "sources": [
    {
      "entry_id": 1,
      "content": "Ich war heute mit Papa im Wald...",
      "score": 0.95,
      "created_at": "2025-01-15T10:30:00"
    }
  ],
  "provider": "openai",
  "model": "gpt-4o-mini"
}
```

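To illustrate what `context_limit` plausibly controls, here is a sketch of how retrieved chunks might be turned into a prompt for the completion model. The prompt wording and the function `build_prompt` are assumptions for illustration; the prompt actually used by the router is not documented here. The second sample hit is made-up data.

```python
def build_prompt(question, hits, context_limit=3):
    """Pick the best-scoring hits and format them as context for the LLM."""
    selected = sorted(hits, key=lambda h: h["score"], reverse=True)[:context_limit]
    context = "\n\n".join(
        f"[Entry {h['entry_id']}, {h['created_at']}]\n{h['content']}"
        for h in selected
    )
    prompt = (
        "Answer the question using only the diary excerpts below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
    return prompt, selected


hits = [
    {"entry_id": 1, "content": "Ich war heute mit Papa im Wald...",
     "score": 0.95, "created_at": "2025-01-15T10:30:00"},
    # Invented second hit, only to show the effect of context_limit.
    {"entry_id": 2, "content": "Heute hat es geregnet.",
     "score": 0.41, "created_at": "2025-01-16T09:00:00"},
]

prompt, sources = build_prompt("Was habe ich im Wald gesehen?", hits, context_limit=1)
print(prompt)
```

The selected hits double as the `sources` list, which is how a response can cite the entries behind its answer.
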
### 4. Get Status
```http
GET /api/diary/1/status
```

**Response:**
```json
{
  "child_id": 1,
  "total_entries": 5,
  "indexed_entries": 5,
  "total_vectors": 15,
  "last_indexed": "2025-01-15T10:30:00",
  "collection_name": "diary_child_1"
}
```

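The status fields can be combined into a quick health summary, for example to decide whether a re-index is needed. This is a small sketch over the response shape above; `indexing_summary` is a hypothetical helper, not an existing function.

```python
def indexing_summary(status):
    """Derive pending count, completeness, and average vectors per entry."""
    total = status["total_entries"]
    indexed = status["indexed_entries"]
    return {
        "pending": total - indexed,
        "complete": indexed >= total,
        # Guard against division by zero when nothing is indexed yet.
        "vectors_per_entry": status["total_vectors"] / indexed if indexed else 0.0,
    }


# Sample response copied from the example above.
status = {
    "child_id": 1,
    "total_entries": 5,
    "indexed_entries": 5,
    "total_vectors": 15,
    "last_indexed": "2025-01-15T10:30:00",
    "collection_name": "diary_child_1",
}

summary = indexing_summary(status)
print(summary)
```
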
## 🔧 Setup & Deployment

### 1. Environment Variables
Add to `.env`:
```bash
# AI provider API keys (at least one)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
OPENROUTER_API_KEY=sk-or-...

# Default providers
DEFAULT_EMBEDDING_PROVIDER=openai
DEFAULT_COMPLETION_PROVIDER=openai
```

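Since only one API key is required, the application has to decide which provider to use when the configured default has no key. The sketch below shows one possible resolution strategy; the mapping of the provider name `claude` to `ANTHROPIC_API_KEY` and the fallback to the first available provider are assumptions, not documented behavior.

```python
# Assumed mapping from provider name to the variable holding its API key.
KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def available_providers(env):
    """List providers whose API key is set to a non-empty value."""
    return [name for name, var in KEY_VARS.items() if env.get(var)]


def resolve_provider(env, setting="DEFAULT_EMBEDDING_PROVIDER"):
    """Use the configured default if it has a key, else the first available."""
    available = available_providers(env)
    if not available:
        raise RuntimeError("no provider API key configured")
    wanted = env.get(setting, "openai")
    return wanted if wanted in available else available[0]


# Example: the default is openai, but only the Anthropic key is present.
env = {
    "ANTHROPIC_API_KEY": "sk-ant-example",
    "DEFAULT_EMBEDDING_PROVIDER": "openai",
}
print(resolve_provider(env))
```

In an application, `os.environ` would be passed in place of the sample dictionary.
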
### 2. Database Migration
```bash
# Restart Docker Compose to create the new tables
cd compose
docker compose down
docker compose up --build

# Alternatively, apply the schema update manually.
# Adjust the schema path to your working directory
# (inside compose/ it is init/04_diary_schema.sql).
docker compose exec -T db sh -lc \
  'mariadb -u"$MARIADB_USER" -p"$MARIADB_PASSWORD" "$MARIADB_DATABASE"' \
  < compose/init/04_diary_schema.sql
```

### 3. Verify Installation
```bash
# Health Check
curl http://localhost:8000/health

# List all routes
curl http://localhost:8000/__routes | grep diary

# Check providers
curl http://localhost:8000/admin/rag/providers
```

## 🧪 Testing

### Run Integration Test
```bash
# Make sure Docker Compose is running (subshell keeps you in the repo root)
(cd compose && docker compose up -d)

# Set an API key (at least one)
export OPENAI_API_KEY=sk-...

# Run the test
python tests/test_integration.py
```

**Expected output:**
```
============================================================
Crumbforest Integration Test
PHP <-> FastAPI <-> Qdrant
============================================================
✓ API is healthy
✓ Connected to database
✓ Cleaned up test data

=== Step 1: Create Child ===
✓ Created child with ID: 1

=== Step 2: Create Diary Entry ===
✓ Created diary entry with ID: 1

=== Step 3: Index Diary Entry ===
✓ Indexed successfully:
- Status: success
- Chunks: 3
- Collection: diary_child_1
- Provider: openai

=== Step 4: Search Diary ===
✓ Search successful:
- Query: Igel
- Results: 1

=== Step 5: RAG Query (Ask Question) ===
✓ RAG query successful:
- Answer: Du hast einen Igel im Wald gesehen...

=== Step 6: Check Indexing Status ===
✓ Status: All entries indexed

============================================================
✓ ALL TESTS PASSED!
============================================================
Wuuuuhuuu! 💚
```

## 🔗 PHP Integration

### PHP FastAPI Client
Use the existing `class.fastapi.php`:

```php
<?php
require_once 'classes/class.fastapi.php';

// Inside the Compose network, FastAPI is reachable under its service name
$api = new FastAPIClient('http://fastapi:8000');

// Index diary entry
$response = $api->post('/api/diary/index', [
    'entry_id' => $entry_id,
    'child_id' => $child_id,
    'content' => $diary_content,
    'provider' => 'openai'
]);

if ($response['success']) {
    echo "Indexed successfully!";
}

// Search diary
$results = $api->post('/api/diary/search', [
    'child_id' => $child_id,
    'query' => 'Igel',
    'provider' => 'openai',
    'limit' => 5
]);

// RAG query
$answer = $api->post('/api/diary/ask', [
    'child_id' => $child_id,
    'question' => 'Was habe ich im Wald gesehen?',
    'provider' => 'openai'
]);
```

## 📊 Architecture

```
┌─────────────┐
│ PHP (8080)  │
│             │
│ - Children  │
│ - Diary     │
│ - Tokens    │
└──────┬──────┘
       │
       │ HTTP POST
       │
       v
┌─────────────────┐      ┌─────────────────┐
│ FastAPI (8000)  │────► │ Qdrant:6333     │
│                 │      │                 │
│ - RAG Service   │      │ Collections:    │
│ - Embedding     │      │ - diary_child_1 │
│ - Provider      │      │ - diary_child_2 │
└─────────┬───────┘      │ - diary_child_N │
          │              └─────────────────┘
          │
          v
┌─────────────────┐
│ MariaDB:3306    │
│                 │
│ - children      │
│ - diary_entries │
│ - post_vectors  │
│ - audit_log     │
└─────────────────┘
```

## 🔐 Security & GDPR

### Audit Logging
All actions are recorded in `audit_log`:
- `diary_indexed` - an entry was indexed
- `diary_searched` - the diary was searched
- `diary_rag_query` - a RAG query was executed

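As a sketch of what writing such a record could involve, the example below appends an audit row with JSON metadata. It uses an in-memory SQLite database as a stand-in for MariaDB so that it runs anywhere; the column names and the `log_action` helper are assumptions, since the actual schema lives in `compose/init/04_diary_schema.sql`.

```python
import json
import sqlite3

# The three actions named in the list above.
ALLOWED_ACTIONS = {"diary_indexed", "diary_searched", "diary_rag_query"}


def log_action(conn, child_id, action, metadata=None):
    """Append one audit row; rows are never updated or deleted here."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown audit action: {action}")
    conn.execute(
        "INSERT INTO audit_log (child_id, action, metadata) VALUES (?, ?, ?)",
        (child_id, action, json.dumps(metadata or {})),
    )
    conn.commit()


# Hypothetical table layout, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_log ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " child_id INTEGER NOT NULL,"
    " action TEXT NOT NULL,"
    " metadata TEXT NOT NULL,"
    " created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)

log_action(conn, 1, "diary_searched", {"query": "Igel", "results": 1})
rows = conn.execute("SELECT child_id, action, metadata FROM audit_log").fetchall()
print(rows)
```
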
### Data Isolation
- Each child has its own Qdrant collection: `diary_child_{id}`
- No cross-child access is possible
- Token-based access via QR codes

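The isolation depends on the collection name always being derived from a validated child ID rather than taken from user input. A minimal sketch of such a derivation follows; `collection_name` is a hypothetical helper, and only the `diary_child_{id}` pattern comes from this document.

```python
def collection_name(child_id):
    """Map a child ID to its Qdrant collection name."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(child_id, bool) or not isinstance(child_id, int) or child_id < 1:
        raise ValueError("child_id must be a positive integer")
    return f"diary_child_{child_id}"


print(collection_name(1))
```

Rejecting anything that is not a positive integer keeps crafted values such as `"1/../2"` from ever reaching the vector store.
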
### GDPR Compliance
- Immutable audit log (INSERT only)
- Metadata stored as JSON for flexibility
- CASCADE DELETE at the child level

## 🎨 Providers

### Supported providers
1. **OpenAI** - text-embedding-3-small + gpt-4o-mini
2. **Claude** - Voyage AI embeddings + Claude Sonnet
3. **OpenRouter** - flexible multi-provider

### Switching providers
Set the `provider` field in any request to `"claude"`, `"openai"`, or `"openrouter"`:
```json
{
  "provider": "claude"
}
```

## 🚀 Next Steps

### 1. PHP Integration
- Create `php/api/diary/create.php`
- Call FastAPI after a diary entry is created
- Add a background worker for batch indexing

### 2. UI Features
- Admin dashboard for diary stats
- Child-specific RAG query page
- Token generation for children

### 3. Advanced Features
- Multilingual diary support
- Emotion detection
- Automatic tagging
- Export as PDF

## 📞 Support

If you have questions or problems:
1. Check logs: `docker compose logs -f app`
2. Verify DB: `docker compose exec db mariadb -u crumb -p`
3. Test Qdrant: `curl http://localhost:6333/collections`

---

**Wuuuuhuuu! The Diary RAG system is ready! 💚**