🏢 RZ Operations Handbook - Crumbcore v1
Audience: RZ team, operations, admins
System: Crumbcore (FastAPI + RAG)
Footprint: 605 MB, 3 containers
📊 System Overview
Crumbcore Stack:
├── FastAPI App (256 MB RAM)
│   ├── RAG Engine (Qdrant Client)
│   ├── 3 AI Characters (Eule, Fox, Bugsy)
│   └── Document Search & Chat
├── MariaDB 11.7 (512 MB RAM)
│   └── User Management, Sessions
└── Qdrant 1.12.5 (512 MB RAM)
    └── Vector Storage (733 Docs indexed)
Total: ~1.3 GB RAM, 605 MB Disk
🚀 Initial Deployment
1. Preparation
# 1. Clone the repository (or unpack the tarball)
git clone <repo-url> crumbcore
cd crumbcore
# 2. Create the env file
cp .env.example .env.rz
nano .env.rz
# 3. Generate secrets
openssl rand -hex 32 # SECRET_KEY
openssl rand -hex 16 # DB_PASSWORD
openssl rand -hex 24 # DB_ROOT_PASSWORD
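Before deploying, it can help to confirm that the env file really contains the three secrets generated above. The following is a minimal sketch, not part of the Crumbcore tooling: `check_env_file` is a hypothetical helper, and it assumes the file uses plain `KEY=value` lines and that unset values are marked with `change_me`, the placeholder the security checklist greps for.

```bash
#!/usr/bin/env bash
# Sketch: check that an env file defines the three secrets and that none
# of them is empty or still a placeholder.
# Assumptions: plain KEY=value lines; "change_me" marks placeholder values.
check_env_file() {
  local file="$1" key value status=0
  for key in SECRET_KEY DB_PASSWORD DB_ROOT_PASSWORD; do
    # Use the last assignment of the key and keep everything after the first "=".
    value="$(grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-)" || true
    if [ -z "$value" ]; then
      echo "missing or empty: $key" >&2
      status=1
    elif printf '%s' "$value" | grep -qi 'change_me'; then
      echo "placeholder value: $key" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_env_file .env.rz && ./rz-deploy.sh` runs the deployment only when all three keys look populated.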
2. Deployment
# Automated deployment
./rz-deploy.sh
# Or manually:
docker compose -f rz-deployment.yml --env-file .env.rz up -d
3. Verify
# Health Check
curl http://localhost:8000/health
# Collections Check
curl http://localhost:6333/collections
# Login Test
curl -X POST http://localhost:8000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"admin@crumb.local","password":"admin123"}'
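Right after `up -d` the health endpoint may not answer yet, so a single `curl` can fail even though the deployment is fine. A small retry helper covers that gap. This is a sketch under assumptions: `retry_until` is a hypothetical name, and the number of attempts and the delay are illustrative values.

```bash
#!/usr/bin/env bash
# retry_until TRIES DELAY CMD...
# Run CMD until it succeeds or TRIES attempts are used up.
# Sleeps DELAY seconds between attempts; returns 0 on success, 1 otherwise.
retry_until() {
  local tries="$1" delay="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    # No need to sleep after the final failed attempt.
    if [ "$attempt" -lt "$tries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

Usage: `retry_until 30 2 curl -fsS -o /dev/null http://localhost:8000/health` polls the health check for up to about a minute; `curl -f` makes HTTP errors count as failures.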
🔄 Standard Operations
View logs
# Live logs (all services)
docker compose -f rz-deployment.yml logs -f
# Application only
docker compose -f rz-deployment.yml logs -f app
# Last 100 lines
docker compose -f rz-deployment.yml logs --tail=100 app
# With timestamps
docker compose -f rz-deployment.yml logs -f -t app
# Errors only
docker compose -f rz-deployment.yml logs app | grep ERROR
Restart
# Single service
docker compose -f rz-deployment.yml restart app
# All services
docker compose -f rz-deployment.yml restart
# With rebuild (after a code update)
docker compose -f rz-deployment.yml up -d --build app
Stop/Start
# Stop
docker compose -f rz-deployment.yml stop
# Start
docker compose -f rz-deployment.yml start
# Down (removes containers, volumes are kept)
docker compose -f rz-deployment.yml down
# Down (including volumes - ⚠️ DATA LOSS!)
docker compose -f rz-deployment.yml down -v
💾 Backup & Restore
Create a backup
#!/bin/bash
set -euo pipefail
# Absolute path: older Docker CLI versions reject relative bind-mount paths in -v
BACKUP_DIR="$(pwd)/backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"
# Database backup (the MARIADB_* variables are expanded inside the db container)
docker compose -f rz-deployment.yml exec -T db \
  sh -c 'mariadb-dump -u"$MARIADB_USER" -p"$MARIADB_PASSWORD" "$MARIADB_DATABASE"' \
  > "$BACKUP_DIR/database.sql"
# Qdrant backup (volume)
docker run --rm \
  -v rz-crumbcore-qdrant-data:/data \
  -v "$BACKUP_DIR":/backup \
  alpine tar czf /backup/qdrant-data.tar.gz -C /data .
# App logs backup
docker run --rm \
  -v rz-crumbcore-app-logs:/data \
  -v "$BACKUP_DIR":/backup \
  alpine tar czf /backup/app-logs.tar.gz -C /data .
echo "✅ Backup created: $BACKUP_DIR"
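A backup is only useful if it can be read back. The sketch below checks the three files the backup script writes: the SQL dump must be non-empty and both archives must list cleanly. `verify_backup` is a hypothetical helper, not an existing script, and it only detects missing, empty, or corrupt files, not logically incomplete dumps.

```bash
#!/usr/bin/env bash
# verify_backup DIR
# Sanity-check a backup directory produced by the backup script.
verify_backup() {
  local dir="$1" archive
  # The SQL dump must exist and be non-empty.
  if [ ! -s "$dir/database.sql" ]; then
    echo "database.sql missing or empty" >&2
    return 1
  fi
  for archive in qdrant-data.tar.gz app-logs.tar.gz; do
    # Listing the archive fails on truncated or corrupt files.
    if ! tar tzf "$dir/$archive" >/dev/null 2>&1; then
      echo "$archive missing or unreadable" >&2
      return 1
    fi
  done
  echo "backup OK: $dir"
}
```

Usage: call `verify_backup "$BACKUP_DIR"` as the last step of the backup script so a broken backup fails loudly instead of being discovered during a restore.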
Restore
#!/bin/bash
set -euo pipefail
BACKUP_DIR="$(pwd)/backups/20250103_120000" # Adjust!
# Database restore
docker compose -f rz-deployment.yml exec -T db \
  sh -c 'mariadb -u"$MARIADB_USER" -p"$MARIADB_PASSWORD" "$MARIADB_DATABASE"' \
  < "$BACKUP_DIR/database.sql"
# Qdrant restore (ideally with the Qdrant container stopped, so that files
# are not replaced underneath a running instance)
docker run --rm \
  -v rz-crumbcore-qdrant-data:/data \
  -v "$BACKUP_DIR":/backup \
  alpine sh -c "cd /data && tar xzf /backup/qdrant-data.tar.gz"
# Restart services
docker compose -f rz-deployment.yml restart
echo "✅ Restore complete"
🔧 Maintenance Tasks
1. Update Deployment
# 1. Create a backup (see above)
./backup-crumbcore.sh
# 2. Pull the new version
docker pull crumbcore:v1.1 # Or a newer version
# 3. Update the image tag in rz-deployment.yml
nano rz-deployment.yml
# image: crumbcore:v1.1
# 4. Recreate only the app container (brief downtime while it restarts)
docker compose -f rz-deployment.yml up -d --no-deps app
# 5. Health check
curl http://localhost:8000/health
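Step 3 can also be done without an editor. The following sketch rewrites the tag with `sed` and keeps a `.bak` copy of the file for rollback. It assumes the compose file contains a line of the form `image: crumbcore:<tag>`, as the comment in step 3 suggests; `set_image_tag` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# set_image_tag FILE TAG
# Replace the tag on every "image: crumbcore:<tag>" line in FILE.
# The previous version of the file is kept as FILE.bak.
set_image_tag() {
  local file="$1" tag="$2"
  # Only lines that reference the crumbcore image are touched;
  # other images (database, vector store) stay unchanged.
  sed -i.bak -E "s|^([[:space:]]*image:[[:space:]]*crumbcore:).*$|\1${tag}|" "$file"
}
```

Usage: `set_image_tag rz-deployment.yml v1.1`, followed by `diff rz-deployment.yml.bak rz-deployment.yml` to review the change before recreating the container.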
2. Re-index documents
# Re-index all documents
docker compose -f rz-deployment.yml exec app \
  python3 -c "
from scripts.index_docs import index_documents
index_documents('docs/rz-nullfeld', 'docs_rz_nullfeld_', force=True)
index_documents('docs/crumbforest', 'docs_crumbforest_', force=True)
print('✅ Re-indexing complete')
"
# Or via the API (with admin login)
curl -X POST http://localhost:8000/api/documents/index \
  -H "Content-Type: application/json" \
  -H "Cookie: session=..." \
  -d '{"provider": "openrouter", "force": true}'
3. Database Maintenance
# Note: ${DB_ROOT_PASSWORD} is expanded by the host shell, so it must be set there first
# Optimize tables
docker compose -f rz-deployment.yml exec db \
  mariadb -u root -p"${DB_ROOT_PASSWORD}" crumbforest \
  -e "OPTIMIZE TABLE users, sessions, diary_entries;"
# Check table status
docker compose -f rz-deployment.yml exec db \
  mariadb -u root -p"${DB_ROOT_PASSWORD}" crumbforest \
  -e "SHOW TABLE STATUS;"
# Request a slow shutdown (InnoDB): full purge and change-buffer merge at the next
# server shutdown. This is not a vacuum and has no effect until the server stops.
docker compose -f rz-deployment.yml exec db \
  mariadb -u root -p"${DB_ROOT_PASSWORD}" \
  -e "SET GLOBAL innodb_fast_shutdown=0;"
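The commands above read `${DB_ROOT_PASSWORD}` from the host shell, not from the container. One way to make it available is to source the env file with auto-export enabled. This is a sketch that assumes `.env.rz` contains only shell-compatible `KEY=value` lines (no spaces around `=`, no unquoted special characters); `load_env` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# load_env FILE
# Source FILE and export every variable it assigns into the current shell.
load_env() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  set -a   # auto-export all variables assigned from here on
  # shellcheck disable=SC1090
  . "$file"
  set +a   # stop auto-exporting
}
```

Usage: `load_env ./.env.rz` once per shell session, before running the maintenance commands.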
4. Log Rotation
# Delete logs older than 30 days (direct access to the volume path usually needs root)
find /var/lib/docker/volumes/rz-crumbcore-app-logs/_data \
  -name "*.jsonl" -mtime +30 -delete
# Or via Docker
docker run --rm \
  -v rz-crumbcore-app-logs:/logs \
  alpine find /logs -name "*.jsonl" -mtime +30 -delete
5. Cleanup
# Remove old images
docker image prune -a --filter "until=720h"
# Unused volumes (⚠️ careful!)
docker volume prune
# Unused networks
docker network prune
# System-wide cleanup (⚠️ --volumes can remove volumes that no container is using)
docker system prune -a --volumes
📊 Monitoring
Health Endpoints
# Application Health
curl http://localhost:8000/health
# → {"status": "healthy", "version": "1.0.0"}
# Qdrant Health (Qdrant serves /healthz, /livez and /readyz)
curl http://localhost:6333/healthz
# → healthz check passed
# Database Health
docker compose -f rz-deployment.yml exec db \
  sh -c 'mariadb -u"$MARIADB_USER" -p"$MARIADB_PASSWORD" -e "SELECT 1"'
Metrics
# Qdrant Collections Status
curl -s http://localhost:6333/collections | jq .
# Container Stats
docker stats rz-crumbcore-app rz-crumbcore-db rz-crumbcore-qdrant
# Disk Usage (reading the volume mountpoints usually needs root)
docker system df
docker volume ls -q | xargs docker volume inspect | \
  jq -r '.[] | "\(.Name)\t\(.Mountpoint)"' | \
  while IFS=$'\t' read -r name path; do
    # Tab-separated fields avoid the leading space that "name: path" leaves in the path
    size=$(du -sh "$path" 2>/dev/null | cut -f1)
    echo "$name: $size"
  done
Logs & Alerting
# Critical errors (last hour)
docker compose -f rz-deployment.yml logs --since 1h app | \
  grep -i "error\|critical\|exception" | \
  tail -n 20
# Failed login attempts
docker compose -f rz-deployment.yml logs --since 1h app | \
  grep -c "Login failed"
# Rate limit hits
docker compose -f rz-deployment.yml logs --since 1h app | \
  grep -c "429"
# OpenRouter API errors
docker compose -f rz-deployment.yml logs --since 1h app | \
  grep "OpenRouter" | grep -i error
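The counts above become alerts once they are compared against a limit. The sketch below wraps both steps in two small functions that read the log stream from stdin. The function names and any threshold you pass in are assumptions for illustration, not values taken from the Crumbcore configuration.

```bash
#!/usr/bin/env bash
# count_matches PATTERN
# Count stdin lines matching PATTERN (extended regex, case-insensitive).
count_matches() {
  # grep -c prints 0 but exits with status 1 when nothing matches;
  # "|| true" keeps the function usable under "set -e".
  grep -ciE "$1" || true
}

# over_threshold COUNT LIMIT
# Succeeds when COUNT is strictly greater than LIMIT.
over_threshold() {
  [ "$1" -gt "$2" ]
}
```

Usage: `failed=$(docker compose -f rz-deployment.yml logs --since 1h app | count_matches 'Login failed')`, then `over_threshold "$failed" 20 && echo "ALERT: $failed failed logins in the last hour"`. The limit of 20 is a placeholder to tune.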
🔥 Troubleshooting
Problem: Container does not start
# Check logs
docker compose -f rz-deployment.yml logs app
# Common causes:
# 1. Port in use
lsof -i :8000
# 2. Wrong env variables (prints the fully resolved configuration)
docker compose -f rz-deployment.yml --env-file .env.rz config
# 3. Volume permissions
docker run --rm -v rz-crumbcore-app-logs:/data alpine ls -la /data
Problem: Database Connection Failed
# 1. DB reachable?
docker compose -f rz-deployment.yml exec db \
  mariadb -u crumb -p -e "SELECT 1"
# 2. Password correct?
grep DB_PASSWORD .env.rz
# 3. Network ok?
docker network inspect rz-internal
# 4. Check DB logs
docker compose -f rz-deployment.yml logs db | tail -n 50
Problem: Qdrant errors
# 1. Qdrant reachable?
curl http://localhost:6333/healthz
# 2. Collections present?
curl http://localhost:6333/collections
# 3. Storage issues?
docker volume inspect rz-crumbcore-qdrant-data
# 4. Re-initialize (⚠️ data loss!)
docker compose -f rz-deployment.yml down
docker volume rm rz-crumbcore-qdrant-data
docker compose -f rz-deployment.yml up -d
# Then re-index!
Problem: High CPU/RAM usage
# Show stats
docker stats --no-stream
# Top processes in the container
docker compose -f rz-deployment.yml exec app top
# Qdrant memory usage
curl http://localhost:6333/metrics
# Set resource limits (in rz-deployment.yml):
# deploy:
#   resources:
#     limits:
#       cpus: '2.0'
#       memory: 1G
Problem: Slow Response Times
# 1. Measure response time
time curl -s http://localhost:8000/health
# 2. Qdrant query performance (response duration metrics)
curl -s http://localhost:6333/metrics | grep -i duration
# 3. Database slow queries
docker compose -f rz-deployment.yml exec db \
  mariadb -u root -p -e "SHOW FULL PROCESSLIST;"
# 4. Check app logs for delays
docker compose -f rz-deployment.yml logs app | grep "took.*ms"
🔒 Security Checklist
Check after deployment:
# 1. Firewall active?
sudo ufw status
# Only 8000 (internal) and 6333 (internal) should be open
# 2. Secrets rotated?
grep -i "change_me" .env.rz # Should find nothing!
# 3. Default passwords changed?
# Admin: admin@crumb.local / admin123 → CHANGE IT!
# 4. CORS correct?
curl -I http://localhost:8000/api/chat \
  -H "Origin: https://evil-site.com"
# The response should NOT contain an Access-Control-Allow-Origin header for this origin
# (curl never reports a CORS error itself; only browsers enforce CORS)
# 5. Rate limiting active?
for i in {1..10}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST http://localhost:8000/api/chat \
    -H "Content-Type: application/json" \
    -d '{"character_id":"eule","question":"test","lang":"de"}'
done
# After 5 requests: expect 429
# 6. TLS (if public)?
curl -I https://docs.rz-nullfeld.de
# Should return 200 + HSTS header
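The rate-limit loop in step 5 prints one status code per request, which still has to be read by eye. A small filter can report at which request the first 429 appeared. This is a sketch; `first_429` is a hypothetical helper that only parses the status codes and makes no requests itself.

```bash
#!/usr/bin/env bash
# first_429
# Read one HTTP status code per line from stdin and print the 1-based
# number of the first request answered with 429.
# Returns 1 if no 429 was seen at all (rate limiting likely inactive).
first_429() {
  local n=0 code
  while IFS= read -r code; do
    n=$((n + 1))
    if [ "$code" = "429" ]; then
      echo "$n"
      return 0
    fi
  done
  return 1
}
```

Usage: pipe the output of the loop into it (`for ...; done | first_429`). If the limit is 5 requests per window, as the checklist comment states, the expected result is 6.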
📞 Incident Response
Severity Levels
P1 - Critical (immediately):
- System down
- Data Loss
- Security Breach
P2 - High (< 2h):
- Performance Issues
- Partial Outage
- API Errors > 10%
P3 - Medium (< 8h):
- Minor Bugs
- Log Warnings
- Single Feature broken
P4 - Low (Next Sprint):
- Feature Requests
- Documentation
- Cosmetic Issues
P1 Response
# 1. Assess
curl http://localhost:8000/health
docker compose -f rz-deployment.yml ps
# 2. Quick Fix (Restart)
docker compose -f rz-deployment.yml restart
# 3. If still down, restore from backup
./restore-crumbcore.sh
# 4. Notify stakeholders
# 5. Post-mortem doc
📝 Change Management
Deployment Checklist
- Backup created
- Rollback plan ready
- Stakeholders informed
- Maintenance window scheduled
- Health checks prepared
- Logs monitored (30 min after deploy)
Rollback Procedure
# 1. Stop new version
docker compose -f rz-deployment.yml down
# 2. Restore backup (see above)
./restore-crumbcore.sh
# 3. Start old version (revert the image tag in rz-deployment.yml first if it was changed)
docker compose -f rz-deployment.yml up -d
# 4. Verify
curl http://localhost:8000/health
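The four rollback steps can be chained so that a failure stops the sequence. The sketch below also adds a dry-run mode for rehearsing the procedure. `run`, `rollback`, `DRY_RUN` and `RZ_COMPOSE_FILE` are hypothetical names introduced here; `./restore-crumbcore.sh` is the script referenced above, and reverting the image tag is not covered.

```bash
#!/usr/bin/env bash
# Compose file used by the rollback; overridable via the environment (assumed name).
RZ_COMPOSE_FILE="${RZ_COMPOSE_FILE:-rz-deployment.yml}"

# run CMD...
# Execute CMD, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# rollback
# Stop the stack, restore the backup, start again, verify.
# Each step runs only if the previous one succeeded.
rollback() {
  run docker compose -f "$RZ_COMPOSE_FILE" down &&
    run ./restore-crumbcore.sh &&
    run docker compose -f "$RZ_COMPOSE_FILE" up -d &&
    run curl -fsS http://localhost:8000/health
}
```

Usage: `DRY_RUN=1 rollback` prints the four commands without executing them; plain `rollback` runs them in order.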
📚 Further Resources
- Security Audit: docs/security/audit_2025-12-03_chat_v1_security.md
- Deployment Log: docs/security/DEPLOYMENT_SUCCESS_2025-12-03.md
- Quickstart: QUICKSTART.md
- Architecture: CLAUDE.md
Last updated: 2025-12-04
Version: 1.0
Maintainer: RZ-Team
🌲 Stay safe in the Crumbforest!