# 🦉 Crumbforest Handbook
**The complete operations and maintenance manual for the Crumbforest CRM & RAG system**
---
## 📖 Table of Contents
1. [Philosophy & Vision](#philosophy--vision)
2. [The 15 Crew Roles](#the-15-crew-roles)
3. [System Architecture](#system-architecture)
4. [Installation & Setup](#installation--setup)
5. [Test Protocols](#test-protocols)
6. [Common Errors & Solutions](#common-errors--solutions)
7. [GDPR & Data Protection](#gdpr--data-protection)
8. [Maintenance & Monitoring](#maintenance--monitoring)
9. [Backup & Recovery](#backup--recovery)
10. [Performance Tuning](#performance-tuning)
11. [Developer Guide](#developer-guide)
---
## 🌲 Philosophy & Vision
### Love for the Forest & Breathing
Crumbforest is more than software; it is a philosophy:
**🌿 The forest as a metaphor:**
- Every child is a tree with its own roots
- Knowledge grows organically, like an ecosystem
- Mistakes are humus for new learning
- Together we form a forest
**💚 Breathing & mindfulness:**
- Code breathes: it lives, grows, and changes
- Pauses matter (like the gaps between trees)
- GDPR is not bureaucracy, but respect for each person's space
- Slow tech instead of fast tech
**🦉 The owl keeps watch:**
- We observe, but we do not judge
- Logs are stories, not accusations
- Mistakes are learning moments
- "Wuuuuhuuu!" is our anthem of joy
### Technical Principles
1. **Transparency over perfection** - Better visible errors than hidden bugs
2. **GDPR as a feature** - Data protection is built in, not bolted on
3. **Child-centred** - Every child has their own safe space
4. **Forest thinking** - Everything is connected, nothing stands alone
---
## The 15 Crew Roles
Crumbforest is run by a crew of 15 specialised AI agents. Each has its own personality, expertise, and access level.
**Language support:** All roles speak **German (DE)**, **English (EN)**, and **French (FR)** (selectable in the login/header).
### Public Area (Home)
| Role | Icon | Function | Access |
|------|------|----------|--------|
| **Professor Eule** | 🦉 | System Architect & Guide | Public |
### Internal Crew (Login Required)
| Role | Icon | Expertise | Speciality |
|------|------|-----------|------------|
| **FunkFox** | 🦊 | Hip-hop MC & motivation | Raps its answers ("Yo!") |
| **Schraubaer** | 🔧 | Master mechanic | Craftsmanship & construction |
| **TaichiTaube** | 🕊️ | Security sensei | Balance & security |
| **DeepBit** | 🐙 | Low-level octopus | Assembler & binary |
| **SnakePy** | 🐍 | Python expert | Patient teacher |
| **PepperPHP** | 🌶️ | PHP specialist | Web backend & frameworks |
| **Templatus** | 📄 | Template master | HTML & Jinja2 |
| **Schnippsi** | 🐿️ | UI/CSS fairy | Design & colours (cupcakes!) |
| **CloudCat** | ☁️ | DevOps | Docker & K8s |
| **GitBadger** | 🦡 | Version control | Git & history |
| **Bugsy** | 🐞 | QA analyst | Testing & debugging |
| **DumboSQL** | 🐘 | Database guide | SQL for beginners |
| **CapaciTobi** | ⚡ | Electronics | Hardware & physics |
| **Schnecki** | 🐌 | Slow tech | Mindfulness & sustainability |
---
## 🏗️ System Architecture
### Component Overview
```
┌─────────────────────────────────────────────┐
│             🌐 Client (Browser)             │
└──────────────────────┬──────────────────────┘
                       │
        ┌──────────────┴──────────────┐
        │                             │
┌───────▼────────┐            ┌───────▼──────┐
│    FastAPI     │            │    Static    │
│    (Python)    │◄───────────┤    Assets    │
│    (Jinja2)    │            │   (CSS/JS)   │
└───────┬────────┘            └──────────────┘
        │
        ├───────────┬────────────┬────────────┐
        │           │            │            │
┌───────▼──────┐ ┌──▼───────┐ ┌──▼─────────┐ ┌▼─────────┐
│   MariaDB    │ │  Qdrant  │ │ OpenRouter │ │  Config  │
│    (SQL)     │ │ (Vector) │ │    API     │ │  (JSON)  │
└──────────────┘ └──────────┘ └────────────┘ └──────────┘
```
### Data Flow
```
1. Markdown file → DocumentIndexer (Python watchdog)
2. DocumentIndexer → chunking (token-based)
3. Chunks → EmbeddingService → OpenRouter API
4. Embeddings → Qdrant (vector DB)
5. Metadata → local config & cache
```
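To make this flow concrete, here is a minimal sketch of a watchdog-based indexer. It is illustrative only: the `chunk_text`/`embed_texts` helpers and the `docs_crumbforest_` collection name stand in for the real `DocumentIndexer`/`EmbeddingService`, and the real service chunks by tokens rather than characters.
```python
# Minimal sketch of the indexing pipeline (illustrative, not the real indexer).
import uuid
from pathlib import Path

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

COLLECTION = "docs_crumbforest_"              # assumed collection name
client = QdrantClient(url="http://localhost:6333")

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive character-based chunking; the real service chunks by tokens."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def embed_texts(chunks: list[str]) -> list[list[float]]:
    """Placeholder: call the configured embedding provider (e.g. OpenRouter) here."""
    raise NotImplementedError

class MarkdownIndexer(FileSystemEventHandler):
    def on_modified(self, event):
        if event.is_directory or not event.src_path.endswith(".md"):
            return
        text = Path(event.src_path).read_text(encoding="utf-8")
        chunks = chunk_text(text)
        vectors = embed_texts(chunks)
        points = [
            PointStruct(
                # Deterministic UUID per file + chunk index (Qdrant requires UUID/int IDs)
                id=str(uuid.uuid5(uuid.NAMESPACE_URL, f"{event.src_path}_{i}")),
                vector=vec,
                payload={"source": event.src_path, "chunk": i, "text": chunk},
            )
            for i, (chunk, vec) in enumerate(zip(chunks, vectors))
        ]
        client.upsert(collection_name=COLLECTION, points=points)

if __name__ == "__main__":
    observer = Observer()
    observer.schedule(MarkdownIndexer(), path="docs/", recursive=True)
    observer.start()
    observer.join()
```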
### GDPR Architecture
```
Child 1 → diary_child_1 (Qdrant) ─┐
Child 2 → diary_child_2 (Qdrant) ─┼─→ separate collections
Child N → diary_child_N (Qdrant) ─┘   (isolation)

audit_log (MariaDB)
(who, what, when - immutable)
```
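A minimal sketch of how the per-child isolation could look with the Qdrant Python client. The `diary_child_{id}` naming follows the diagram above; vector size, distance metric, and the `collection_exists` check are assumptions, not the actual service code.
```python
# Sketch: one Qdrant collection per child (GDPR data separation).
# Vector size 1536 matches text-embedding-3-small; adjust for other models.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

def diary_collection(child_id: int) -> str:
    return f"diary_child_{child_id}"

def ensure_child_collection(child_id: int) -> None:
    name = diary_collection(child_id)
    if not client.collection_exists(name):
        client.create_collection(
            collection_name=name,
            vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
        )

def delete_child_collection(child_id: int) -> None:
    """Right to erasure: drop the whole collection for this child."""
    client.delete_collection(diary_collection(child_id))
```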
---
## 🚀 Installation & Setup
### Checking Prerequisites
```bash
# Run the system check
cat > /tmp/system_check.sh << 'EOF'
#!/bin/bash
echo "🔍 Crumbforest system check"
echo "=========================="

# Docker
if command -v docker &> /dev/null; then
    echo "✅ Docker: $(docker --version)"
else
    echo "❌ Docker not found!"
    exit 1
fi

# Docker Compose (plugin) - check via `docker compose version`
if docker compose version &> /dev/null; then
    echo "✅ Docker Compose: $(docker compose version)"
else
    echo "❌ Docker Compose not found!"
    exit 1
fi

# Python (optional)
if command -v python3 &> /dev/null; then
    echo "✅ Python: $(python3 --version)"
else
    echo "⚠️ Python not found (optional)"
fi

# Free disk space
free_space=$(df -h . | awk 'NR==2 {print $4}')
echo "💾 Free disk space: $free_space"

# Check the ports
if lsof -i :8000 &> /dev/null; then
    echo "⚠️ Port 8000 already in use!"
    lsof -i :8000
else
    echo "✅ Port 8000 free"
fi

if lsof -i :6333 &> /dev/null; then
    echo "⚠️ Port 6333 already in use!"
else
    echo "✅ Port 6333 free"
fi

echo ""
echo "System check complete!"
EOF
chmod +x /tmp/system_check.sh
/tmp/system_check.sh
```
### Minimal Installation
```bash
# 1. Configure API keys
cd compose
cp .env.example .env   # if present
nano .env
# Minimum: one provider
OPENROUTER_API_KEY=sk-or-v1-...

# 2. Start
docker compose up -d

# 3. Wait for the services
echo "Waiting for the database..."
until docker compose exec -T db sh -c 'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE -e "SELECT 1"' &> /dev/null; do
    sleep 2
done
echo "✅ Database ready"

echo "Waiting for the app..."
until curl -s http://localhost:8000/health > /dev/null; do
    sleep 2
done
echo "✅ App ready"

# 4. Check the status
docker compose ps
```
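For orientation, a sketch of what `compose/.env` might end up containing once the variables referenced throughout this handbook are filled in; all values are placeholders.
```bash
# compose/.env - illustrative example, values are placeholders
# Embedding/completion providers (see "Error 9" below)
OPENROUTER_API_KEY=sk-or-v1-...
DEFAULT_EMBEDDING_PROVIDER=openrouter
DEFAULT_EMBEDDING_MODEL=text-embedding-3-small
DEFAULT_COMPLETION_PROVIDER=openrouter
DEFAULT_COMPLETION_MODEL=anthropic/claude-3-5-sonnet

# Database (used by the compose db service)
MARIADB_DATABASE=crumbcrm
MARIADB_USER=crumb
MARIADB_PASSWORD=change-me

# Sessions (see "Error 8": generate with `openssl rand -hex 32`)
SECRET_KEY=...
```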
---
## 🧪 Test Protocols
### Level 1: Bash/System Tests
```bash
#!/bin/bash
# test_level1_system.sh
echo "🧪 Level 1: System Tests"
echo "======================="

# Test 1: Health check
echo -n "Test 1.1 - Health Endpoint... "
if curl -s http://localhost:8000/health | grep -q '"ok":true'; then
    echo "✅"
else
    echo "❌ FAILED"
    exit 1
fi

# Test 2: Database connection
echo -n "Test 1.2 - Database Connection... "
if docker compose -f compose/docker-compose.yml exec -T db sh -c \
    'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE -e "SELECT 1"' &> /dev/null; then
    echo "✅"
else
    echo "❌ FAILED"
    exit 1
fi

# Test 3: Qdrant
echo -n "Test 1.3 - Qdrant Connection... "
if curl -s http://localhost:6333/collections | grep -q '"ok"'; then
    echo "✅"
else
    echo "❌ FAILED"
    exit 1
fi

# Test 4: File system
echo -n "Test 1.4 - Document Directories... "
if [ -d "docs/rz-nullfeld" ] && [ -d "docs/crumbforest" ]; then
    echo "✅"
else
    echo "❌ FAILED"
    exit 1
fi

# Test 5: Docker logs (no errors)
echo -n "Test 1.5 - Docker Logs Check... "
error_count=$(docker compose -f compose/docker-compose.yml logs app | grep -i error | grep -v "ERROR_FOR_DIVISION_BY_ZERO" | wc -l)
if [ "$error_count" -lt 5 ]; then
    echo "✅ ($error_count errors)"
else
    echo "⚠️ ($error_count errors - check the logs)"
fi

echo ""
echo "Level 1 tests complete!"
```
### Level 2: cURL/API Tests
```bash
#!/bin/bash
# test_level2_api.sh
echo "🧪 Level 2: API Tests"
echo "===================="
BASE_URL="http://localhost:8000"

# Test 2.1: List the routes
echo -n "Test 2.1 - List Routes... "
route_count=$(curl -s $BASE_URL/__routes | jq 'length')
if [ "$route_count" -gt 10 ]; then
    echo "✅ ($route_count routes)"
else
    echo "❌ FAILED"
    exit 1
fi

# Test 2.2: Provider status
echo -n "Test 2.2 - Provider Status... "
if curl -s "$BASE_URL/admin/rag/providers/status" | grep -q 'available'; then
    echo "✅"
else
    echo "❌ FAILED"
    exit 1
fi

# Test 2.3: Document collections
echo -n "Test 2.3 - Qdrant Collections... "
collections=$(curl -s http://localhost:6333/collections | jq '.result.collections | length')
if [ "$collections" -ge 2 ]; then
    echo "✅ ($collections collections)"
else
    echo "⚠️ ($collections collections - should be at least 2)"
fi

# Test 2.4: Document search (without auth)
echo -n "Test 2.4 - Document Search API... "
# This test expects 401, since auth is required
status=$(curl -s -o /dev/null -w "%{http_code}" "$BASE_URL/api/documents/search?q=test")
if [ "$status" = "401" ]; then
    echo "✅ (auth required - correct)"
else
    echo "⚠️ Status: $status"
fi

# Test 2.5: API hello
echo -n "Test 2.5 - API Hello... "
if curl -s "$BASE_URL/api/hello?lang=de" | grep -q 'Hallo Welt'; then
    echo "✅"
else
    echo "❌ FAILED"
    exit 1
fi

echo ""
echo "Level 2 tests complete!"
```
### Level 3: Python Integration Tests
```python
#!/usr/bin/env python3
# test_level3_integration.py
"""
Level 3: Python integration tests
Tests the Python components directly
"""
import sys
import os

sys.path.insert(0, 'app')


def test_config():
    """Test 3.1: load the config"""
    print("Test 3.1 - Config Loading... ", end="")
    try:
        from config import get_settings
        settings = get_settings()
        assert settings.mariadb_database == "crumbcrm"
        print("✅")
        return True
    except Exception as e:
        print(f"❌ FAILED: {e}")
        return False


def test_provider_factory():
    """Test 3.2: provider factory"""
    print("Test 3.2 - Provider Factory... ", end="")
    try:
        from services.provider_factory import ProviderFactory
        from config import get_settings
        settings = get_settings()
        available = ProviderFactory.get_available_providers(settings)
        assert len(available) > 0, "No providers available"
        print(f"✅ ({len(available)} providers)")
        return True
    except Exception as e:
        print(f"❌ FAILED: {e}")
        return False


def test_embedding_service():
    """Test 3.3: embedding service"""
    print("Test 3.3 - Embedding Service... ", end="")
    try:
        from services.embedding_service import EmbeddingService
        from services.provider_factory import ProviderFactory
        from config import get_settings
        settings = get_settings()
        providers = ProviderFactory.get_available_providers(settings)
        if not providers:
            print("⚠️ SKIP (no providers)")
            return True
        provider = ProviderFactory.create_provider(providers[0], settings)
        service = EmbeddingService(provider)
        # Test chunking
        chunks = service.chunk_text("Test" * 500, chunk_size=100, overlap=20)
        assert len(chunks) > 1, "Chunking failed"
        print(f"✅ ({len(chunks)} chunks)")
        return True
    except Exception as e:
        print(f"❌ FAILED: {e}")
        return False


def test_database_schema():
    """Test 3.4: database schema"""
    print("Test 3.4 - Database Schema... ", end="")
    try:
        from deps import get_db
        from pymysql.cursors import DictCursor
        conn = get_db()
        with conn.cursor(DictCursor) as cur:
            # Check the important tables
            tables = ['users', 'posts', 'post_vectors', 'audit_log']
            cur.execute("SHOW TABLES")
            existing = [row['Tables_in_crumbcrm'] for row in cur.fetchall()]
            for table in tables:
                assert table in existing, f"Table {table} is missing"
        conn.close()
        print(f"✅ ({len(tables)} tables)")
        return True
    except Exception as e:
        print(f"❌ FAILED: {e}")
        return False


def main():
    print("🧪 Level 3: Python Integration Tests")
    print("====================================")
    tests = [
        test_config,
        test_provider_factory,
        test_embedding_service,
        test_database_schema,
    ]
    passed = sum(1 for test in tests if test())
    total = len(tests)
    print()
    print(f"Level 3 tests: {passed}/{total} passed")
    return 0 if passed == total else 1


if __name__ == "__main__":
    sys.exit(main())
```
### Level 4: PHP Integration Tests
```php
<?php
// test_level4_php.php
/**
 * Level 4: PHP integration tests
 * Tests the PHP components and the FastAPI integration
 */
echo "🧪 Level 4: PHP Integration Tests\n";
echo "==================================\n";

$base_url = "http://localhost:8000";
$tests_passed = 0;
$tests_total = 0;

function test_api_call($name, $url, $expected_status = 200) {
    global $tests_passed, $tests_total;
    $tests_total++;
    echo "Test 4.$tests_total - $name... ";

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    $response = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($status == $expected_status) {
        echo "✅\n";
        $tests_passed++;
        return true;
    } else {
        echo "❌ FAILED (status: $status)\n";
        return false;
    }
}

// Test 4.1: Health check
test_api_call("Health Check", "$base_url/health");
// Test 4.2: API hello
test_api_call("API Hello", "$base_url/api/hello?lang=de");
// Test 4.3: Routes list
test_api_call("Routes List", "$base_url/__routes");
// Test 4.4: Protected endpoint (should return 401)
test_api_call("Protected Endpoint", "$base_url/admin/rag/providers", 401);

// Test 4.5: JSON response format
echo "Test 4.5 - JSON Response Format... ";
$tests_total++;
$ch = curl_init("$base_url/api/hello");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
$json = json_decode($response, true);
if ($json && isset($json['message'])) {
    echo "✅\n";
    $tests_passed++;
} else {
    echo "❌ FAILED\n";
}

echo "\n";
echo "Level 4 tests: $tests_passed/$tests_total passed\n";
exit($tests_passed == $tests_total ? 0 : 1);
```
### Running the Test Suite
```bash
#!/bin/bash
# run_all_tests.sh
echo "🦉 Crumbforest Test Suite"
echo "========================="
echo ""

# Level 1: System
bash test_level1_system.sh
level1=$?

# Level 2: API
bash test_level2_api.sh
level2=$?

# Level 3: Python
python3 test_level3_integration.py
level3=$?

# Level 4: PHP (if available)
if command -v php &> /dev/null; then
    php test_level4_php.php
    level4=$?
else
    echo "⚠️ PHP not installed - skipping Level 4"
    level4=0
fi

# Summary
echo ""
echo "📊 Test Summary"
echo "======================="
echo "Level 1 (System): $([ $level1 -eq 0 ] && echo '✅ PASS' || echo '❌ FAIL')"
echo "Level 2 (API):    $([ $level2 -eq 0 ] && echo '✅ PASS' || echo '❌ FAIL')"
echo "Level 3 (Python): $([ $level3 -eq 0 ] && echo '✅ PASS' || echo '❌ FAIL')"
echo "Level 4 (PHP):    $([ $level4 -eq 0 ] && echo '✅ PASS' || echo '❌ FAIL')"

# Exit code
if [ $level1 -eq 0 ] && [ $level2 -eq 0 ] && [ $level3 -eq 0 ] && [ $level4 -eq 0 ]; then
    echo ""
    echo "🎉 All tests passed! Wuuuuhuuu!"
    exit 0
else
    echo ""
    echo "⚠️ Some tests failed. Check the logs."
    exit 1
fi
```
---
## 🐛 Common Errors & Solutions
### Error 1: Port Already in Use
**Symptom:**
```
Error: bind: address already in use
```
**Diagnosis:**
```bash
# Check which process is using port 8000
lsof -i :8000
# or
netstat -an | grep 8000
```
**Solution 1 - Kill the process:**
```bash
# Find the PID
lsof -i :8000 | grep LISTEN | awk '{print $2}'
# Kill the process
kill -9 <PID>
```
**Solution 2 - Change the port:**
```yaml
# In docker-compose.yml
ports:
  - "8001:8000"   # use 8001 instead of 8000
```
### Error 2: Database Connection Failed
**Symptom:**
```
sqlalchemy.exc.OperationalError: (2003, "Can't connect to MySQL server")
```
**Diagnosis:**
```bash
# Check the DB container status
docker compose ps db
# Check the DB logs
docker compose logs db | tail -50
# Test the DB connection
docker compose exec -T db sh -c \
    'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD -e "SELECT 1"'
```
**Solution 1 - Wait longer:**
```bash
# The DB needs time to start (15-30 seconds)
sleep 30
# Health check until the DB is ready
until docker compose exec -T db sh -c 'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE -e "SELECT 1"' &> /dev/null; do
    echo "Waiting for the DB..."
    sleep 2
done
```
**Solution 2 - Reinitialise the DB:**
```bash
# WARNING: deletes ALL data!
docker compose down -v
docker compose up -d
```
### Error 3: API Key Invalid
**Symptom:**
```
Error code: 401 - Incorrect API key provided
```
**Diagnosis:**
```bash
# Check the environment variables inside the container
docker compose exec app env | grep API_KEY
# Check the .env file
cat compose/.env | grep API_KEY
```
**Solution:**
```bash
# 1. Put the correct key into .env
nano compose/.env
# 2. Restart the container (so .env is reloaded)
docker compose restart app
# 3. Verify via the provider status
curl http://localhost:8000/admin/rag/providers/status
```
### Error 4: Indexing Fails
**Symptom:**
```
Error indexing docs/...: OpenRouter embedding API call failed
```
**Diagnosis:**
```bash
# Check the logs
docker compose logs app | grep -i error
# Check the provider status
curl http://localhost:8000/admin/rag/providers/status | jq
# Check the Qdrant collections
curl http://localhost:6333/collections | jq
```
**Solution 1 - Check the API credits:**
```bash
# OpenRouter dashboard: https://openrouter.ai/credits
# Make sure there is credit left on the account
```
**Solution 2 - Re-index manually:**
```bash
curl -X POST "http://localhost:8000/api/documents/index" \
    -H "Content-Type: application/json" \
    -d '{"provider": "openrouter", "force": true}'
```
**Solution 3 - Use a different provider:**
```bash
# In compose/.env
DEFAULT_EMBEDDING_PROVIDER=openai   # or claude
# Restart the container
docker compose restart app
```
### Error 5: Qdrant Point ID Error
**Symptom:**
```
value 123456_0 is not a valid point ID
```
**Diagnosis:**
```bash
# Check the Qdrant version
curl http://localhost:6333/collections | jq
# Check rag_service.py
grep "point_id =" app/services/rag_service.py
```
**Solution:**
```bash
# UUID-based point IDs are implemented
# If the error still occurs: rebuild the container
docker compose up --build -d
```
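Background: Qdrant only accepts unsigned integers or UUIDs as point IDs, which is why composite string IDs like `123456_0` are rejected. A minimal sketch of how deterministic UUIDs can be derived from a document ID and chunk index (the exact scheme in `rag_service.py` may differ):
```python
# Sketch: deterministic UUIDs as Qdrant point IDs (illustrative).
import uuid

def make_point_id(doc_id: int, chunk_index: int) -> str:
    # uuid5 is deterministic: the same (doc_id, chunk_index) always yields the
    # same UUID, so re-indexing overwrites points instead of duplicating them.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{doc_id}_{chunk_index}"))

print(make_point_id(123456, 0))  # a valid UUID string accepted by Qdrant
```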
### Error 6: File Names with Special Characters
**Symptom:**
```
UnicodeEncodeError: 'ascii' codec can't encode character
```
**Diagnosis:**
```bash
# Check the file names
find docs/ -name "*[^a-zA-Z0-9._-]*" -type f
# Check UTF-8 encoding
file docs/crumbforest/*.md | grep -v UTF-8
```
**Solution:**
```bash
# The system supports UTF-8, but if problems occur:
# clean up the file names (optional)
for file in docs/crumbforest/*; do
    newname=$(echo "$file" | iconv -f UTF-8 -t ASCII//TRANSLIT)
    if [ "$file" != "$newname" ]; then
        mv "$file" "$newname"
    fi
done
```
### Error 7: Docker Out of Memory
**Symptom:**
```
docker: Error response from daemon: OCI runtime create failed
```
**Diagnosis:**
```bash
# Check the Docker memory limit
docker info | grep -i memory
# Container resources
docker stats
```
**Solution:**
```yaml
# Raise the memory limits in docker-compose.yml
services:
  app:
    deploy:
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 1G
```
### Error 8: Session Not Persistent
**Symptom:**
```
Login works, but you are logged out again after a reload
```
**Diagnosis:**
```bash
# Check the SECRET_KEY
docker compose exec app env | grep SECRET_KEY
# Check the session middleware
grep SessionMiddleware app/main.py
```
**Solution:**
```bash
# Set a fixed SECRET_KEY in .env
echo "SECRET_KEY=$(openssl rand -hex 32)" >> compose/.env
# Restart the container
docker compose restart app
```
### Error 9: "Unsupported embedding model" Despite Correct Config
**Symptom:**
```
{"detail":"Unsupported embedding model: openai/text-embedding-3-small.
Supported models: ['text-embedding-3-small', 'text-embedding-3-large', 'text-embedding-ada-002']"}
```
**Causes:**
1. **Wrong provider in .env** - `DEFAULT_EMBEDDING_PROVIDER=openai` instead of `openrouter`
2. **Code changes not active** - there is no volume mount; an image rebuild is required!
3. **Wrong collection names** - with/without trailing underscore
**Diagnosis:**
```bash
# 1. Check the provider settings in .env
grep "DEFAULT_EMBEDDING_PROVIDER" compose/.env
# 2. Check whether the code inside the container is up to date
docker compose exec app grep -n "DEBUG" /app/routers/document_rag.py
# 3. Check the file timestamp inside the container
docker compose exec app ls -la /app/routers/document_rag.py
# 4. Compare with the file on the host
ls -la app/routers/document_rag.py
```
**Solution:**
```bash
# 1. Fix .env (CRITICAL!)
nano compose/.env
# Set:
DEFAULT_EMBEDDING_PROVIDER=openrouter
DEFAULT_EMBEDDING_MODEL=text-embedding-3-small
DEFAULT_COMPLETION_PROVIDER=openrouter
DEFAULT_COMPLETION_MODEL=anthropic/claude-3-5-sonnet

# 2. Rebuild the image (there are NO volume mounts!)
cd compose
docker compose down
docker compose up --build -d

# 3. Verify
docker compose logs app | grep "provider"
# Expected output: "✓ Using provider: openrouter"

# 4. Test the search
curl "http://localhost:8000/api/documents/search?q=Docker&limit=3"
```
**Key takeaway:**
```
⚠️ There is NO volume mount for the app code!
The code is baked into the Docker image.
EVERY code change requires:
    docker compose up --build -d
ONLY .env changes:
    docker compose restart app
```
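If you do want live reloading during development, one hedged option is a compose override file with a bind mount over the baked-in code. This is not part of the shipped setup; the override filename, service name, and paths are assumptions based on the layout used in this handbook.
```yaml
# compose/docker-compose.override.yml - optional, development only (assumption)
# Mounts the local app/ directory over the image code and enables auto-reload.
services:
  app:
    volumes:
      - ../app:/app
    command: uvicorn main:app --reload --host 0.0.0.0 --port 8000
```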
---
## 🔒 GDPR & Data Protection
### Compliance Checklist
#### ✅ Implemented
- [x] **Data separation**: every child has their own Qdrant collection
- [x] **Audit logging**: immutable logs in the `audit_log` table
- [x] **File-hash tracking**: no duplicates, change detection (see the sketch below)
- [x] **Role-based access**: admin/user separation
- [x] **Session-based auth**: no token leaks
- [x] **Metadata tracking**: who, what, when in JSON
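A minimal sketch of how the file-hash change detection could work; the cache file location and structure are assumptions, and the real indexer may track hashes differently.
```python
# Sketch: skip re-indexing of unchanged files via SHA-256 content hashes.
import hashlib
import json
from pathlib import Path

HASH_CACHE = Path("data/file_hashes.json")   # assumed cache location

def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(doc_dir: str = "docs/") -> list[Path]:
    cache = json.loads(HASH_CACHE.read_text()) if HASH_CACHE.exists() else {}
    changed = []
    for md in Path(doc_dir).rglob("*.md"):
        digest = file_hash(md)
        if cache.get(str(md)) != digest:
            changed.append(md)       # new or modified since the last run
            cache[str(md)] = digest
    HASH_CACHE.parent.mkdir(parents=True, exist_ok=True)
    HASH_CACHE.write_text(json.dumps(cache, indent=2))
    return changed
```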
#### 📋 Data Flow Documentation
```
┌─────────────────────────────────────────────────────┐
│               GDPR data flow diagram                │
└─────────────────────────────────────────────────────┘
1. Child submits a diary entry
2. PHP backend → FastAPI (REST)
3. FastAPI validates the token
4. Content → chunking (max. 1000 characters)
5. Chunks → embedding API (OpenRouter/OpenAI)
6. Embeddings → Qdrant (diary_child_{id})
   └──→ audit_log (MariaDB)
        - action: "diary_indexed"
        - entity_id: entry_id
        - metadata: {timestamp, provider, chunks}
```
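A compact sketch of the FastAPI side of this flow; the endpoint path, request model, and stubbed helpers are illustrative assumptions, not the actual router code.
```python
# Sketch: diary indexing endpoint writing to diary_child_{id} and audit_log.
# embed_texts / upsert_points / write_audit_log are placeholder stubs standing
# in for the real services (EmbeddingService, Qdrant upsert, audit-log insert).
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter()

class DiaryEntry(BaseModel):
    child_id: int
    entry_id: int
    content: str

def embed_texts(chunks: list[str]) -> list[list[float]]:
    raise NotImplementedError  # call the configured embedding provider here

def upsert_points(collection: str, chunks: list[str], vectors: list[list[float]]) -> None:
    raise NotImplementedError  # qdrant upsert into the child's own collection

def write_audit_log(**row) -> None:
    raise NotImplementedError  # INSERT INTO audit_log (...) via the DB layer

@router.post("/api/diary/index")
async def index_diary_entry(entry: DiaryEntry):
    # Steps 3-4: (token validation omitted here) chunking, max. 1000 characters
    chunks = [entry.content[i:i + 1000] for i in range(0, len(entry.content), 1000)]
    # Step 5: embeddings via the configured provider
    vectors = embed_texts(chunks)
    # Step 6: store vectors in the per-child collection (GDPR isolation)
    upsert_points(f"diary_child_{entry.child_id}", chunks, vectors)
    # Step 6b: immutable audit trail in MariaDB
    write_audit_log(action="diary_indexed", entity_type="diary",
                    entity_id=entry.entry_id,
                    metadata={"child_id": entry.child_id, "chunks": len(chunks)})
    return {"ok": True, "chunks": len(chunks)}
```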
#### 🗄️ Audit Log Schema
```sql
CREATE TABLE audit_log (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    action VARCHAR(100) NOT NULL,        -- e.g. "diary_indexed"
    entity_type VARCHAR(50) NOT NULL,    -- e.g. "diary", "document"
    entity_id INT NOT NULL,              -- e.g. entry_id
    user_id INT NULL,                    -- NULL for system actions
    metadata JSON NULL,                  -- flexible tracking
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_action (action),
    INDEX idx_entity (entity_type, entity_id),
    INDEX idx_created (created_at)
);
```
#### 🔍 Audit Log Queries
```bash
# All actions for one child
docker compose exec -T db sh -c 'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE -e "
SELECT action, entity_id, created_at, metadata
FROM audit_log
WHERE entity_type=\"diary\"
AND JSON_EXTRACT(metadata, \"$.child_id\") = 1
ORDER BY created_at DESC
LIMIT 10;
"'

# All indexing operations today
docker compose exec -T db sh -c 'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE -e "
SELECT COUNT(*) as count, action
FROM audit_log
WHERE DATE(created_at) = CURDATE()
GROUP BY action;
"'

# Metadata analysis
docker compose exec -T db sh -c 'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE -e "
SELECT
  JSON_EXTRACT(metadata, \"$.provider\") as provider,
  COUNT(*) as count
FROM audit_log
WHERE action LIKE \"%indexed\"
GROUP BY provider;
"'
```
### Data Deletion (Right to Erasure)
```bash
#!/bin/bash
# delete_child_data.sh - GDPR deletion
CHILD_ID=$1
if [ -z "$CHILD_ID" ]; then
    echo "Usage: $0 <child_id>"
    exit 1
fi

echo "🗑️ GDPR data deletion for child $CHILD_ID"
echo "=========================================="

# 1. Delete the Qdrant collection
echo "1. Deleting Qdrant collection..."
curl -X DELETE "http://localhost:6333/collections/diary_child_${CHILD_ID}"

# 2. Delete the MariaDB records
echo "2. Deleting database records..."
docker compose exec -T db sh -c "mariadb -u\$MARIADB_USER -p\$MARIADB_PASSWORD \$MARIADB_DATABASE <<EOF
-- Delete vectors
DELETE FROM post_vectors WHERE child_id = $CHILD_ID;
-- Delete diary entries
DELETE FROM diary_entries WHERE child_id = $CHILD_ID;
-- Delete the child record
DELETE FROM children WHERE id = $CHILD_ID;
-- Do NOT delete the audit log (GDPR requirement: accountability)
INSERT INTO audit_log (action, entity_type, entity_id, metadata)
VALUES ('child_deleted', 'child', $CHILD_ID, JSON_OBJECT('timestamp', NOW()));
EOF"

echo ""
echo "✅ Data deletion complete"
echo "📝 Audit log entry created (GDPR evidence)"
```
### Data Export (Right to Data Portability)
```bash
#!/bin/bash
# export_child_data.sh - GDPR export
CHILD_ID=$1
OUTPUT_DIR="exports/child_${CHILD_ID}"
mkdir -p "$OUTPUT_DIR"

echo "📦 GDPR data export for child $CHILD_ID"
echo "========================================"

# 1. Diary entries
echo "1. Exporting diary entries..."
docker compose exec -T db sh -c "mariadb -u\$MARIADB_USER -p\$MARIADB_PASSWORD \$MARIADB_DATABASE -e \"
SELECT * FROM diary_entries WHERE child_id = $CHILD_ID
\" --xml" > "$OUTPUT_DIR/diary_entries.xml"

# 2. Metadata
echo "2. Exporting metadata..."
docker compose exec -T db sh -c "mariadb -u\$MARIADB_USER -p\$MARIADB_PASSWORD \$MARIADB_DATABASE -e \"
SELECT * FROM children WHERE id = $CHILD_ID
\" --xml" > "$OUTPUT_DIR/child_info.xml"

# 3. Audit log
echo "3. Exporting audit log..."
docker compose exec -T db sh -c "mariadb -u\$MARIADB_USER -p\$MARIADB_PASSWORD \$MARIADB_DATABASE -e \"
SELECT * FROM audit_log
WHERE entity_type='diary'
AND JSON_EXTRACT(metadata, '$.child_id') = $CHILD_ID
\" --xml" > "$OUTPUT_DIR/audit_log.xml"

# 4. Qdrant vectors (optional)
echo "4. Exporting vector metadata..."
curl -s "http://localhost:6333/collections/diary_child_${CHILD_ID}/points?limit=1000" \
    > "$OUTPUT_DIR/vectors.json"

# 5. Create the ZIP archive
echo "5. Creating ZIP archive..."
cd exports
zip -r "child_${CHILD_ID}_export_$(date +%Y%m%d).zip" "child_${CHILD_ID}/"
cd ..

echo ""
echo "✅ Export complete"
echo "📁 File: exports/child_${CHILD_ID}_export_$(date +%Y%m%d).zip"
```
---
## 🔧 Maintenance & Monitoring
### Daily Checks
```bash
#!/bin/bash
# daily_health_check.sh
echo "🏥 Crumbforest daily health check"
echo "====================================="
echo "Date: $(date)"
echo ""

# 1. Container status
echo "📦 Container status:"
docker compose ps

# 2. Disk space
echo ""
echo "💾 Disk space:"
df -h | grep -E "Filesystem|/var/lib/docker"

# 3. Database size
echo ""
echo "🗄️ Database size:"
docker compose exec -T db sh -c 'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD -e "
SELECT
  table_schema AS \"Database\",
  ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS \"Size (MB)\"
FROM information_schema.tables
WHERE table_schema = \"crumbcrm\"
GROUP BY table_schema;
"'

# 4. Qdrant stats
echo ""
echo "📊 Qdrant collections:"
curl -s http://localhost:6333/collections | jq '.result.collections[] | {name, points: .points_count}'

# 5. Recent errors
echo ""
echo "⚠️ Recent errors (24h):"
docker compose logs --since 24h app 2>&1 | grep -i error | tail -10

# 6. API response time
echo ""
echo "⏱️ API response time:"
time curl -s http://localhost:8000/health > /dev/null

echo ""
echo "✅ Health check complete"
```
### Weekly Maintenance
```bash
#!/bin/bash
# weekly_maintenance.sh
echo "🛠️ Crumbforest weekly maintenance"
echo "===================================="

# 1. Docker cleanup
echo "1. Docker cleanup..."
docker system prune -f

# 2. Database optimisation
echo "2. Optimising database..."
docker compose exec -T db sh -c 'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE -e "
OPTIMIZE TABLE posts;
OPTIMIZE TABLE post_vectors;
OPTIMIZE TABLE audit_log;
"'

# 3. Qdrant optimisation
echo "3. Optimising Qdrant collections..."
for collection in $(curl -s http://localhost:6333/collections | jq -r '.result.collections[].name'); do
    echo " - $collection"
    curl -s -X POST "http://localhost:6333/collections/$collection/optimizer" > /dev/null
done

# 4. Rotate the logs
echo "4. Rotating logs..."
docker compose logs --tail=10000 > "logs/crumbforest_$(date +%Y%m%d).log"

# 5. Create a backup
echo "5. Creating backup..."
./backup.sh

echo ""
echo "✅ Maintenance complete"
```
---
## 💾 Backup & Recovery
### Full Backup
```bash
#!/bin/bash
# backup.sh - full system backup
BACKUP_DIR="backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"

echo "💾 Crumbforest backup"
echo "===================="
echo "Target: $BACKUP_DIR"
echo ""

# 1. Database dump
echo "1. Database backup..."
docker compose exec -T db sh -c 'mysqldump -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE' \
    | gzip > "$BACKUP_DIR/database.sql.gz"

# 2. Qdrant snapshot
echo "2. Qdrant snapshot..."
curl -X POST "http://localhost:6333/collections/docs_crumbforest_/snapshots" \
    -H "Content-Type: application/json" > "$BACKUP_DIR/qdrant_snapshot.json"

# 3. Config files
echo "3. Config backup..."
cp compose/.env "$BACKUP_DIR/.env.backup"
cp compose/docker-compose.yml "$BACKUP_DIR/docker-compose.yml"

# 4. Documents
echo "4. Documents backup..."
tar -czf "$BACKUP_DIR/docs.tar.gz" docs/

# 5. Metadata
echo "5. Backup metadata..."
cat > "$BACKUP_DIR/backup_info.json" << EOF
{
  "timestamp": "$(date -Iseconds)",
  "hostname": "$(hostname)",
  "docker_version": "$(docker --version)",
  "compose_version": "$(docker compose version)",
  "collections": $(curl -s http://localhost:6333/collections | jq '.result.collections')
}
EOF

# 6. Checksums
echo "6. Creating checksums..."
cd "$BACKUP_DIR"
sha256sum * > checksums.sha256
cd - > /dev/null

echo ""
echo "✅ Backup complete"
echo "📁 Directory: $BACKUP_DIR"
echo "📊 Size: $(du -sh $BACKUP_DIR | cut -f1)"
```
### Restore
```bash
#!/bin/bash
# restore.sh - restore the system
BACKUP_DIR=$1
if [ -z "$BACKUP_DIR" ]; then
    echo "Usage: $0 <backup_directory>"
    echo "Available backups:"
    ls -la backups/
    exit 1
fi

echo "♻️ Crumbforest restore"
echo "====================="
echo "Source: $BACKUP_DIR"
echo ""
read -p "WARNING: this overwrites ALL data! Continue? (yes/no) " confirm
if [ "$confirm" != "yes" ]; then
    echo "Aborted."
    exit 0
fi

# 1. Stop the containers
echo "1. Stopping containers..."
docker compose down

# 2. Restore the database
echo "2. Database restore..."
docker compose up -d db
sleep 10
gunzip < "$BACKUP_DIR/database.sql.gz" | \
    docker compose exec -T db sh -c 'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE'

# 3. Clear the Qdrant data
echo "3. Qdrant reset..."
rm -rf data/qdrant/*

# 4. Restore the documents
echo "4. Documents restore..."
tar -xzf "$BACKUP_DIR/docs.tar.gz"

# 5. Restore the config
echo "5. Config restore..."
cp "$BACKUP_DIR/.env.backup" compose/.env

# 6. Start the system
echo "6. Starting the system..."
docker compose up -d

# 7./8. Wait, then trigger re-indexing
echo "7. Waiting for the system..."
sleep 30
echo "8. Triggering re-indexing..."
curl -X POST "http://localhost:8000/api/documents/index" \
    -H "Content-Type: application/json" \
    -d '{"provider": "openrouter", "force": true}'

echo ""
echo "✅ Restore complete"
echo "⚠️ Verify the system with: ./run_all_tests.sh"
```
---
## ⚡ Performance Tuning
### Qdrant Optimisation
```bash
# Enable Qdrant memory mapping
curl -X PATCH http://localhost:6333/collections/docs_crumbforest_ \
    -H "Content-Type: application/json" \
    -d '{
        "optimizers_config": {
            "memmap_threshold": 20000
        }
    }'

# Tune the indexing
curl -X PATCH http://localhost:6333/collections/docs_crumbforest_ \
    -H "Content-Type: application/json" \
    -d '{
        "hnsw_config": {
            "m": 16,
            "ef_construct": 100
        }
    }'
```
### Database Optimisation
```sql
-- Indexes for frequent queries
ALTER TABLE post_vectors ADD INDEX idx_collection_type (collection_name, post_type);
ALTER TABLE audit_log ADD INDEX idx_created_action (created_at, action);
-- Enable the query cache (MariaDB)
SET GLOBAL query_cache_size = 67108864; -- 64 MB
SET GLOBAL query_cache_type = 1;
```
### Docker Resources
```yaml
# docker-compose.yml
services:
  app:
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 2G
        reservations:
          cpus: '1.0'
          memory: 1G
  db:
    deploy:
      resources:
        limits:
          memory: 1G
        reservations:
          memory: 512M
```
---
## 👨‍💻 Developer Guide
### Development Setup
```bash
# 1. Python virtual environment
python3 -m venv venv
source venv/bin/activate

# 2. Install the dependencies
pip install -r app/requirements.txt

# 3. Pre-commit hooks (optional)
pip install pre-commit
pre-commit install

# 4. Development server (without Docker)
cd app
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```
### Code Style
```bash
# Black formatter
black app/ --line-length 100

# Flake8 linter
flake8 app/ --max-line-length 100

# Type checking
mypy app/ --ignore-missing-imports
```
### Adding New Features
```python
# 1. Create a new router
# app/routers/my_feature.py
from fastapi import APIRouter, Depends
from deps import admin_required

router = APIRouter()

@router.get("/status")
async def get_status(user = Depends(admin_required)):
    return {"status": "ok"}

# 2. Mount it in main.py
from routers.my_feature import router as my_feature_router
app.include_router(my_feature_router, prefix="/api/my-feature", tags=["My Feature"])

# 3. Write tests
# tests/test_my_feature.py
def test_my_feature_status():
    response = client.get("/api/my-feature/status")
    assert response.status_code == 200

# 4. Document it
# Docstring with examples
```
### Debug Mode
```bash
# Enable debug logging
export LOG_LEVEL=DEBUG

# FastAPI debug mode (in app/main.py):
#   app = FastAPI(debug=True)

# SQL query logging (in config.py):
#   echo_queries = True
```
---
## 🦉 Wuuuuhuuu! - Closing Words
This handbook is a living document; like the forest itself, it grows and changes.
**Ground rules:**
1. **Breathe** - When stressed: take three deep breaths, then debug
2. **Logs are your friends** - They do not judge, they just tell a story
3. **GDPR is respect** - Not bureaucracy, but regard for every person
4. **The forest forgives mistakes** - Every bug is humus for better code
**When problems come up:**
- Read the logs: `docker compose logs -f app`
- Run the tests: `./run_all_tests.sh`
- Breathe
- Ask the owl (Claude/community)
**Further development:**
- Add new tests whenever bugs are found
- Update the documentation when features change
- Check GDPR compliance with every change
---
**Happy coding in the Crumbforest! 💚**
*"In the forest we are all learners. The trees grow slowly but steadily. Just like good code."*
---
Created: 2025
Version: 1.0
License: [Your License]
Contact: [Your Contact]