# Diary RAG System - Documentation

The Crumbforest Diary RAG system is now fully integrated. Children can create diary entries, which are automatically indexed and made searchable.

## 🎯 What Was Implemented?

### 1. Database Schema
**New tables** (`compose/init/04_diary_schema.sql`):
- `children` - children/users with access tokens
- `diary_entries` - diary entries
- `audit_log` - GDPR-compliant audit logs

**Extended table**:
- `post_vectors` - new columns: `post_type`, `child_id`

### 2. Pydantic Models
**New models** (`app/models/rag_models.py`):
- `DiaryIndexRequest` / `DiaryIndexResponse`
- `DiarySearchRequest` / `DiarySearchResponse`
- `DiaryAskRequest` / `DiaryAskResponse`
- `DiarySearchResult`

### 3. API Endpoints
**New router** (`app/routers/diary_rag.py`):
- `POST /api/diary/index` - indexes a diary entry
- `POST /api/diary/search` - semantic search across the diary
- `POST /api/diary/ask` - RAG query (Q&A)
- `GET /api/diary/{child_id}/status` - indexing status

### 4. Integration Test
**Test suite** (`tests/test_integration.py`):
- Complete end-to-end test
- PHP -> FastAPI -> Qdrant flow
- All 6 steps validated

## 📋 API Endpoints

### 1. Index Diary Entry
```bash
POST /api/diary/index
Content-Type: application/json

{
  "entry_id": 1,
  "child_id": 1,
  "content": "# Heute im Wald\n\nIch habe einen Igel gesehen!",
  "provider": "openai"
}
```

**Response:**
```json
{
  "status": "success",
  "entry_id": 1,
  "child_id": 1,
  "chunks": 3,
  "collection": "diary_child_1",
  "provider": "openai"
}
```

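
As a rough illustration of calling this endpoint from Python, the sketch below builds the request with the standard library only. The helper name `build_index_request` and the base URL `http://localhost:8000` are assumptions made for this example; the payload fields mirror the request documented for `POST /api/diary/index`.

```python
import json
import urllib.request


def build_index_request(base_url, entry_id, child_id, content, provider="openai"):
    """Build (but do not send) a POST request for /api/diary/index."""
    payload = {
        "entry_id": entry_id,
        "child_id": child_id,
        "content": content,
        "provider": provider,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/diary/index",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local base URL; adjust to your deployment.
request = build_index_request(
    "http://localhost:8000",
    entry_id=1,
    child_id=1,
    content="# Heute im Wald\n\nIch habe einen Igel gesehen!",
)
# To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect in a test without a running API.
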
### 2. Search Diary
```bash
POST /api/diary/search
Content-Type: application/json

{
  "child_id": 1,
  "query": "Igel",
  "provider": "openai",
  "limit": 5
}
```

**Response:**
```json
{
  "results": [
    {
      "entry_id": 1,
      "content": "Ich habe einen Igel gesehen...",
      "score": 0.95,
      "created_at": "2025-01-15T10:30:00"
    }
  ],
  "query": "Igel",
  "child_id": 1,
  "provider": "openai"
}
```

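
A client will often want to drop weak matches before showing results. The following is a minimal sketch of such post-processing on a search response; the function name `filter_results`, the `0.5` threshold, and the second (low-score) hit are hypothetical and chosen only for illustration.

```python
def filter_results(response, min_score=0.5):
    """Return hits with score >= min_score, best match first."""
    hits = [hit for hit in response.get("results", []) if hit["score"] >= min_score]
    return sorted(hits, key=lambda hit: hit["score"], reverse=True)


# Sample response in the documented shape; the second hit is made up for the demo.
sample_response = {
    "results": [
        {"entry_id": 2, "content": "(hypothetical weak match)", "score": 0.31,
         "created_at": "2025-01-16T09:00:00"},
        {"entry_id": 1, "content": "Ich habe einen Igel gesehen...", "score": 0.95,
         "created_at": "2025-01-15T10:30:00"},
    ],
    "query": "Igel",
    "child_id": 1,
    "provider": "openai",
}

relevant = filter_results(sample_response, min_score=0.5)
print([hit["entry_id"] for hit in relevant])
```

A suitable threshold depends on the embedding provider, so it is best treated as a tunable value rather than a fixed constant.
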
### 3. RAG Query (Ask)
```bash
POST /api/diary/ask
Content-Type: application/json

{
  "child_id": 1,
  "question": "Was habe ich im Wald gesehen?",
  "provider": "openai",
  "context_limit": 3
}
```

**Response:**
```json
{
  "answer": "Du hast einen Igel im Wald gesehen. Du warst mit deinem Papa spazieren...",
  "question": "Was habe ich im Wald gesehen?",
  "child_id": 1,
  "sources": [
    {
      "entry_id": 1,
      "content": "Ich war heute mit Papa im Wald...",
      "score": 0.95,
      "created_at": "2025-01-15T10:30:00"
    }
  ],
  "provider": "openai",
  "model": "gpt-4o-mini"
}
```

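
Because the response carries the `sources` the answer was grounded in, a client can display them alongside the answer. The sketch below shows one possible way to render that; `format_answer` is a hypothetical helper, not part of the API.

```python
def format_answer(response):
    """Render the answer followed by one line per source entry."""
    lines = [response["answer"], "", "Sources:"]
    for source in response.get("sources", []):
        lines.append(
            f"- entry {source['entry_id']} "
            f"({source['created_at']}, score {source['score']:.2f})"
        )
    return "\n".join(lines)


# Sample response in the documented shape.
ask_response = {
    "answer": "Du hast einen Igel im Wald gesehen. Du warst mit deinem Papa spazieren...",
    "question": "Was habe ich im Wald gesehen?",
    "child_id": 1,
    "sources": [
        {"entry_id": 1, "content": "Ich war heute mit Papa im Wald...",
         "score": 0.95, "created_at": "2025-01-15T10:30:00"},
    ],
    "provider": "openai",
    "model": "gpt-4o-mini",
}

text = format_answer(ask_response)
print(text)
```
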
### 4. Get Status
```bash
GET /api/diary/1/status
```

**Response:**
```json
{
  "child_id": 1,
  "total_entries": 5,
  "indexed_entries": 5,
  "total_vectors": 15,
  "last_indexed": "2025-01-15T10:30:00",
  "collection_name": "diary_child_1"
}
```

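
The status fields can be combined into a simple progress summary, for example to decide whether a re-index is needed. This is a small sketch under the assumption that `total_entries` and `indexed_entries` count the same kind of item; `indexing_progress` is a hypothetical helper name.

```python
def indexing_progress(status):
    """Summarize how much of a child's diary has been indexed."""
    total = status["total_entries"]
    indexed = status["indexed_entries"]
    pending = max(total - indexed, 0)
    # An empty diary counts as fully indexed to avoid dividing by zero.
    ratio = 1.0 if total == 0 else indexed / total
    return {"pending": pending, "ratio": ratio, "complete": pending == 0}


# Sample status in the documented shape.
status = {
    "child_id": 1,
    "total_entries": 5,
    "indexed_entries": 5,
    "total_vectors": 15,
    "last_indexed": "2025-01-15T10:30:00",
    "collection_name": "diary_child_1",
}

progress = indexing_progress(status)
print(progress)
```
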
## 🔧 Setup & Deployment

### 1. Environment Variables
Add the following to `.env`:
```bash
# AI provider API keys (at least one)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
OPENROUTER_API_KEY=sk-or-...

# Default providers
DEFAULT_EMBEDDING_PROVIDER=openai
DEFAULT_COMPLETION_PROVIDER=openai
```

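
Since at least one API key must be set, a quick check of which providers are configured can catch setup mistakes early. The sketch below is illustrative only: the mapping from provider names to variables (in particular `claude` to `ANTHROPIC_API_KEY`) is an assumption inferred from the names in this document, and `configured_providers` is a hypothetical helper.

```python
import os

# Assumed mapping of provider name -> environment variable holding its key.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


# Demo with an explicit mapping instead of the real environment.
example = configured_providers({"OPENAI_API_KEY": "sk-test", "ANTHROPIC_API_KEY": ""})
print(example)
```

Passing the environment as an argument keeps the check testable without touching real secrets.
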
### 2. Database Migration
```bash
# Restart Docker Compose to create the new tables
cd compose
docker compose down
docker compose up --build

# Alternatively, apply the schema update manually (still inside compose/)
docker compose exec -T db sh -lc \
  'mariadb -u"$MARIADB_USER" -p"$MARIADB_PASSWORD" "$MARIADB_DATABASE"' \
  < init/04_diary_schema.sql
```

### 3. Verify Installation
```bash
# Health Check
curl http://localhost:8000/health

# List all routes
curl http://localhost:8000/__routes | grep diary

# Check providers
curl http://localhost:8000/admin/rag/providers
```

## 🧪 Testing

### Run Integration Test
```bash
# Make sure Docker Compose is running (the subshell keeps you in the repo root)
(cd compose && docker compose up -d)

# Set an API key (at least one)
export OPENAI_API_KEY=sk-...

# Run the test
python tests/test_integration.py
```

**Expected output:**
```
============================================================
Crumbforest Integration Test
PHP <-> FastAPI <-> Qdrant
============================================================
✓ API is healthy
✓ Connected to database
✓ Cleaned up test data

=== Step 1: Create Child ===
✓ Created child with ID: 1

=== Step 2: Create Diary Entry ===
✓ Created diary entry with ID: 1

=== Step 3: Index Diary Entry ===
✓ Indexed successfully:
  - Status: success
  - Chunks: 3
  - Collection: diary_child_1
  - Provider: openai

=== Step 4: Search Diary ===
✓ Search successful:
  - Query: Igel
  - Results: 1

=== Step 5: RAG Query (Ask Question) ===
✓ RAG query successful:
  - Answer: Du hast einen Igel im Wald gesehen...

=== Step 6: Check Indexing Status ===
✓ Status: All entries indexed

============================================================
✓ ALL TESTS PASSED!
============================================================
Wuuuuhuuu! 💚
```

## 🔗 PHP Integration

### PHP FastAPI Client
Use the existing `class.fastapi.php`:

```php
<?php
require_once 'classes/class.fastapi.php';

$api = new FastAPIClient('http://fastapi:8000');

// Index diary entry
$response = $api->post('/api/diary/index', [
    'entry_id' => $entry_id,
    'child_id' => $child_id,
    'content' => $diary_content,
    'provider' => 'openai'
]);

if ($response['success']) {
    echo "Indexed successfully!";
}

// Search diary
$results = $api->post('/api/diary/search', [
    'child_id' => $child_id,
    'query' => 'Igel',
    'provider' => 'openai',
    'limit' => 5
]);

// RAG query
$answer = $api->post('/api/diary/ask', [
    'child_id' => $child_id,
    'question' => 'Was habe ich im Wald gesehen?',
    'provider' => 'openai'
]);
```

## 📊 Architecture

```
┌─────────────┐
│ PHP (8080)  │
│             │
│ - Children  │
│ - Diary     │
│ - Tokens    │
└──────┬──────┘
       │
       │ HTTP POST
       │
       v
┌─────────────────┐      ┌─────────────────┐
│ FastAPI (8000)  │────► │ Qdrant:6333     │
│                 │      │                 │
│ - RAG Service   │      │ Collections:    │
│ - Embedding     │      │ - diary_child_1 │
│ - Provider      │      │ - diary_child_2 │
└─────────┬───────┘      │ - diary_child_N │
          │              └─────────────────┘
          │
          v
┌─────────────────┐
│ MariaDB:3306    │
│                 │
│ - children      │
│ - diary_entries │
│ - post_vectors  │
│ - audit_log     │
└─────────────────┘
```

## 🔐 Security & GDPR

### Audit Logging
All actions are recorded in `audit_log`:
- `diary_indexed` - an entry was indexed
- `diary_searched` - the diary was searched
- `diary_rag_query` - a RAG query was executed

### Data Isolation
- Each child has its own Qdrant collection: `diary_child_{id}`
- No cross-child access is possible
- Token-based access via QR codes

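
The per-child naming scheme can be captured in a single helper so every code path derives the collection the same way. This is only a sketch of the `diary_child_{id}` convention; the function name `collection_name` and the validation rule (positive integers only) are assumptions for illustration.

```python
def collection_name(child_id):
    """Map a child ID to its Qdrant collection name, e.g. 1 -> 'diary_child_1'."""
    # Reject bools explicitly: in Python, True is an instance of int.
    if isinstance(child_id, bool) or not isinstance(child_id, int) or child_id <= 0:
        raise ValueError(f"child_id must be a positive integer, got {child_id!r}")
    return f"diary_child_{child_id}"


name = collection_name(1)
print(name)
```

Validating the ID before building the name keeps arbitrary strings from being used to address another child's collection.
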
### GDPR Compliance
- Immutable audit log (INSERT only)
- Metadata stored as JSON for flexibility
- CASCADE DELETE at the child level

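
To make the cascade behaviour concrete, the sketch below reproduces it with an in-memory SQLite database. It is a stand-in only: the real system uses MariaDB, and the column definitions here are hypothetical rather than taken from `04_diary_schema.sql`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE children (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE diary_entries (
        id INTEGER PRIMARY KEY,
        child_id INTEGER NOT NULL REFERENCES children(id) ON DELETE CASCADE,
        content TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO children (id, name) VALUES (1, 'test-child')")
conn.execute("INSERT INTO diary_entries (child_id, content) VALUES (1, 'Heute im Wald')")

# Deleting the child removes its diary entries through the cascade.
conn.execute("DELETE FROM children WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM diary_entries").fetchone()[0]
print(remaining)
```

A database cascade only covers relational rows; the child's Qdrant collection lives outside MariaDB and would presumably need to be removed in a separate step.
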
## 🎨 Providers

### Supported Providers
1. **OpenAI** - text-embedding-3-small + gpt-4o-mini
2. **Claude** - Voyage AI embeddings + Claude Sonnet
3. **OpenRouter** - flexible multi-provider

### Switching Providers
```bash
# In every request:
{
  "provider": "claude"  # or "openai", "openrouter"
}
```

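
Because the provider is chosen per request, a client can set it in one place and reject typos before the request is sent. A minimal sketch, assuming the three provider names listed in this section are the only accepted values; `with_provider` is a hypothetical helper.

```python
SUPPORTED_PROVIDERS = ("openai", "claude", "openrouter")


def with_provider(payload, provider):
    """Return a copy of the payload with a validated provider field."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"unknown provider {provider!r}; expected one of {SUPPORTED_PROVIDERS}"
        )
    # Copy instead of mutating so the caller's payload stays unchanged.
    return {**payload, "provider": provider}


base_payload = {"child_id": 1, "query": "Igel", "limit": 5}
claude_payload = with_provider(base_payload, "claude")
print(claude_payload)
```
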
## 🚀 Next Steps

### 1. PHP Integration
- Create `php/api/diary/create.php`
- Integrate the FastAPI call after diary creation
- Add a background worker for batch indexing

### 2. UI Features
- Admin dashboard for diary stats
- Child-specific RAG query page
- Token generation for children

### 3. Advanced Features
- Multilingual diary support
- Emotion detection
- Automatic tagging
- Export as PDF

## 📞 Support

If you have questions or run into problems:
1. Check logs: `docker compose logs -f app`
2. Verify DB: `docker compose exec db mariadb -u crumb -p`
3. Test Qdrant: `curl http://localhost:6333/collections`

---

**Wuuuuhuuu! The Diary RAG system is ready! 💚**