Initial commit: Crumbforest Architecture Refinement v1 (Clean)
docs/rz-deployment/RZ_OPERATIONS.md
# 🏢 RZ Operations Handbook - Crumbcore v1

**Audience:** Data center (RZ) team, operations, admins
**System:** Crumbcore (FastAPI + RAG)
**Footprint:** 605 MB, 3 containers

---

## 📊 System Overview

```
Crumbcore Stack:
├── FastAPI App (256 MB RAM)
│   ├── RAG Engine (Qdrant Client)
│   ├── 3 AI Characters (Eule, Fox, Bugsy)
│   └── Document Search & Chat
├── MariaDB 11.7 (512 MB RAM)
│   └── User Management, Sessions
└── Qdrant 1.12.5 (512 MB RAM)
    └── Vector Storage (733 Docs indexed)

Total: ~1.3 GB RAM, 605 MB Disk
```

## 🚀 Initial Deployment

### 1. Preparation

```bash
# 1. Clone the repository (or unpack the tarball)
git clone <repo-url> crumbcore
cd crumbcore

# 2. Create the env file
cp .env.example .env.rz
nano .env.rz

# 3. Generate secrets
openssl rand -hex 32  # SECRET_KEY
openssl rand -hex 16  # DB_PASSWORD
openssl rand -hex 24  # DB_ROOT_PASSWORD
```
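
Copying generated values into `.env.rz` by hand is easy to get wrong. The following is a minimal sketch of a helper that writes one `KEY=value` line into an env file. It assumes `.env.rz` uses plain `KEY=value` lines; `set_env_var` is a hypothetical name and not part of the repository.

```bash
# set_env_var FILE KEY VALUE   (hypothetical helper, not part of the repository)
# Drop any existing KEY=... line from FILE and append KEY=VALUE at the end.
set_env_var() {
    local file=$1 key=$2 value=$3 tmp
    tmp=$(mktemp) || return 1
    # grep -v exits non-zero when it prints nothing; that is not an error here.
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # mktemp creates the file with mode 600, which suits a secrets file.
    mv "$tmp" "$file"
}

# Usage (commented out so the block has no side effects when sourced):
# set_env_var .env.rz SECRET_KEY "$(openssl rand -hex 32)"
```

Note that the updated key moves to the end of the file; the order of the other lines is preserved.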

### 2. Deployment

```bash
# Automated deployment
./rz-deploy.sh

# Or manually:
docker compose -f rz-deployment.yml --env-file .env.rz up -d
```

### 3. Verify

```bash
# Health check
curl http://localhost:8000/health

# Collections check
curl http://localhost:6333/collections

# Login test
curl -X POST http://localhost:8000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"admin@crumb.local","password":"admin123"}'
```
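
Right after `up -d` the containers may still be starting, so a single health check can fail even though the deployment is fine. A small retry wrapper is one way to handle that. This is a sketch only: `retry` is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary example values.

```bash
# retry ATTEMPTS DELAY COMMAND...   (hypothetical helper)
# Run COMMAND until it succeeds, at most ATTEMPTS times, sleeping DELAY seconds in between.
retry() {
    local attempts=$1 delay=$2 i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        # Do not sleep after the final failed attempt.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage (commented out; needs the running stack):
# retry 10 3 curl -fsS http://localhost:8000/health
```

`curl -f` makes HTTP error responses count as failures, so the loop keeps waiting until the endpoint answers successfully.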

---

## 🔄 Standard Operations

### View logs

```bash
# Live logs (all services)
docker compose -f rz-deployment.yml logs -f

# Application only
docker compose -f rz-deployment.yml logs -f app

# Last 100 lines
docker compose -f rz-deployment.yml logs --tail=100 app

# With timestamps
docker compose -f rz-deployment.yml logs -f -t app

# Errors only
docker compose -f rz-deployment.yml logs app | grep ERROR
```

### Restart

```bash
# Single service
docker compose -f rz-deployment.yml restart app

# All services
docker compose -f rz-deployment.yml restart

# With rebuild (after a code update)
docker compose -f rz-deployment.yml up -d --build app
```

### Stop/Start

```bash
# Stop
docker compose -f rz-deployment.yml stop

# Start
docker compose -f rz-deployment.yml start

# Down (removes containers, volumes are kept)
docker compose -f rz-deployment.yml down

# Down (including volumes - ⚠️ DATA LOSS!)
docker compose -f rz-deployment.yml down -v
```

---

## 💾 Backup & Restore

### Create a backup

```bash
#!/bin/bash
set -euo pipefail

# Absolute path: older Docker CLI versions reject relative host paths in -v bind mounts
BACKUP_DIR="$(pwd)/backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"

# Database backup (credentials are read from the db container's environment)
docker compose -f rz-deployment.yml exec -T db \
  sh -c 'mariadb-dump -u"$MARIADB_USER" -p"$MARIADB_PASSWORD" "$MARIADB_DATABASE"' \
  > "$BACKUP_DIR/database.sql"

# Qdrant backup (volume)
docker run --rm \
  -v rz-crumbcore-qdrant-data:/data \
  -v "$BACKUP_DIR":/backup \
  alpine tar czf /backup/qdrant-data.tar.gz -C /data .

# App logs backup
docker run --rm \
  -v rz-crumbcore-app-logs:/data \
  -v "$BACKUP_DIR":/backup \
  alpine tar czf /backup/app-logs.tar.gz -C /data .

echo "✅ Backup created: $BACKUP_DIR"
```
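
Each run of the backup script creates a new timestamped directory, so `./backups` grows without bound unless old runs are removed. A possible retention helper is sketched below. It assumes the `YYYYMMDD_HHMMSS` directory naming used above, so that sorting by name equals sorting by age; `prune_backups` is a hypothetical name and the number of backups to keep is an example value.

```bash
# prune_backups DIR KEEP   (hypothetical helper)
# Delete all but the KEEP newest timestamped subdirectories of DIR.
prune_backups() {
    local dir=$1 keep=$2 total excess
    total=$(find "$dir" -mindepth 1 -maxdepth 1 -type d -name '[0-9]*_[0-9]*' | wc -l)
    excess=$((total - keep))
    # Nothing to do when there are KEEP or fewer backups.
    [ "$excess" -gt 0 ] || return 0
    # Names sort oldest-first, so the first $excess entries are the ones to remove.
    find "$dir" -mindepth 1 -maxdepth 1 -type d -name '[0-9]*_[0-9]*' | sort |
        head -n "$excess" |
        while IFS= read -r old; do
            rm -rf -- "$old"
        done
}

# Usage (commented out): keep the 14 most recent backups
# prune_backups ./backups 14
```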

### Restore

```bash
#!/bin/bash
set -euo pipefail

# Adjust to the backup you want to restore (absolute path for the -v bind mount)
BACKUP_DIR="$(pwd)/backups/20250103_120000"

# Database restore
docker compose -f rz-deployment.yml exec -T db \
  sh -c 'mariadb -u"$MARIADB_USER" -p"$MARIADB_PASSWORD" "$MARIADB_DATABASE"' \
  < "$BACKUP_DIR/database.sql"

# Qdrant restore
docker run --rm \
  -v rz-crumbcore-qdrant-data:/data \
  -v "$BACKUP_DIR":/backup \
  alpine sh -c "cd /data && tar xzf /backup/qdrant-data.tar.gz"

# Restart services
docker compose -f rz-deployment.yml restart

echo "✅ Restore complete"
```
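
Restoring from an incomplete or corrupt backup can leave the system worse off than before, so it may be worth checking the backup directory first. The sketch below assumes the file names produced by the backup script above (`database.sql`, `qdrant-data.tar.gz`); `check_backup` is a hypothetical helper.

```bash
# check_backup DIR   (hypothetical helper)
# Succeed only if DIR holds a non-empty database.sql and a readable qdrant-data.tar.gz.
check_backup() {
    local dir=$1
    [ -s "$dir/database.sql" ] || { echo "missing or empty: database.sql" >&2; return 1; }
    [ -s "$dir/qdrant-data.tar.gz" ] || { echo "missing or empty: qdrant-data.tar.gz" >&2; return 1; }
    # tar -tzf lists the archive without extracting it and fails on a corrupt file.
    tar -tzf "$dir/qdrant-data.tar.gz" > /dev/null 2>&1 ||
        { echo "unreadable archive: qdrant-data.tar.gz" >&2; return 1; }
}

# Usage (commented out):
# check_backup "$BACKUP_DIR" || exit 1
```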

---

## 🔧 Maintenance Tasks

### 1. Update Deployment

```bash
# 1. Create a backup (see above)
./backup-crumbcore.sh

# 2. Pull the new version
docker pull crumbcore:v1.1  # or a newer version

# 3. Update the image tag in rz-deployment.yml
nano rz-deployment.yml
# image: crumbcore:v1.1

# 4. Rolling update
docker compose -f rz-deployment.yml up -d --no-deps app

# 5. Health check
curl http://localhost:8000/health
```
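
Before the tag is edited, it can help to record the version that is currently configured, so a later rollback knows which tag to return to. This is a minimal sketch, assuming the compose file contains a line of the form `image: crumbcore:<tag>` as the comment in the update steps suggests; `image_tag` is a hypothetical helper.

```bash
# image_tag FILE IMAGE   (hypothetical helper)
# Print the tag from the first "image: IMAGE:<tag>" line in FILE, or nothing if absent.
image_tag() {
    sed -n "s/^[[:space:]]*image:[[:space:]]*$2:\([^[:space:]\"']*\).*/\1/p" "$1" | head -n 1
}

# Usage (commented out):
# echo "previous tag: $(image_tag rz-deployment.yml crumbcore)"
```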

### 2. Re-index documents

```bash
# Re-index all documents
docker compose -f rz-deployment.yml exec app \
  python3 -c "
from scripts.index_docs import index_documents
index_documents('docs/rz-nullfeld', 'docs_rz_nullfeld_', force=True)
index_documents('docs/crumbforest', 'docs_crumbforest_', force=True)
print('✅ Re-indexing complete')
"

# Or via the API (with an admin login)
curl -X POST http://localhost:8000/api/documents/index \
  -H "Content-Type: application/json" \
  -H "Cookie: session=..." \
  -d '{"provider": "openrouter", "force": true}'
```

### 3. Database Maintenance

```bash
# Assumes DB_ROOT_PASSWORD is set in the current shell (e.g. exported from .env.rz)

# Optimize tables
docker compose -f rz-deployment.yml exec db \
  mariadb -u root -p"${DB_ROOT_PASSWORD}" crumbforest \
  -e "OPTIMIZE TABLE users, sessions, diary_entries;"

# Check table status
docker compose -f rz-deployment.yml exec db \
  mariadb -u root -p"${DB_ROOT_PASSWORD}" crumbforest \
  -e "SHOW TABLE STATUS;"

# Slow shutdown (InnoDB): the next shutdown does a full purge and change buffer merge
docker compose -f rz-deployment.yml exec db \
  mariadb -u root -p"${DB_ROOT_PASSWORD}" \
  -e "SET GLOBAL innodb_fast_shutdown=0;"
```

### 4. Log Rotation

```bash
# Delete logs older than 30 days
find /var/lib/docker/volumes/rz-crumbcore-app-logs/_data \
  -name "*.jsonl" -mtime +30 -delete

# Or via Docker
docker run --rm \
  -v rz-crumbcore-app-logs:/logs \
  alpine find /logs -name "*.jsonl" -mtime +30 -delete
```
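
Since `-delete` is irreversible, a dry run that only lists the matching files is a cheap safeguard. The sketch below uses the same `find` criteria as the commands above but prints instead of deleting; `old_logs` is a hypothetical helper.

```bash
# old_logs DIR DAYS   (hypothetical helper)
# Print *.jsonl files under DIR that were last modified more than DAYS days ago.
# Nothing is deleted.
old_logs() {
    local dir=$1 days=$2
    find "$dir" -name '*.jsonl' -mtime "+$days" -print
}

# Usage (commented out; reading the Docker volume path usually requires root):
# old_logs /var/lib/docker/volumes/rz-crumbcore-app-logs/_data 30
```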

### 5. Cleanup

```bash
# Remove old images
docker image prune -a --filter "until=720h"

# Unused volumes (⚠️ careful!)
docker volume prune

# Unused networks
docker network prune

# System-wide cleanup (⚠️ also removes unused volumes)
docker system prune -a --volumes
```

---

## 📊 Monitoring

### Health Endpoints

```bash
# Application health
curl http://localhost:8000/health
# → {"status": "healthy", "version": "1.0.0"}

# Qdrant health (Qdrant serves its health check at /healthz)
curl http://localhost:6333/healthz
# → healthz check passed

# Database health
docker compose -f rz-deployment.yml exec db \
  sh -c 'mariadb -u"$MARIADB_USER" -p"$MARIADB_PASSWORD" -e "SELECT 1"'
```
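
For scripted monitoring, the application's health response has to be turned into an exit code. A minimal sketch, assuming the JSON shape shown for the application health endpoint (`"status": "healthy"`); `is_healthy` is a hypothetical helper that reads the response body on stdin.

```bash
# is_healthy   (hypothetical helper)
# Read a /health response body on stdin; succeed only if it reports "status": "healthy".
is_healthy() {
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Usage (commented out; needs the running stack):
# curl -fsS http://localhost:8000/health | is_healthy || echo "app unhealthy" >&2
```

A pattern match keeps the helper free of dependencies; where `jq` is available, parsing the JSON properly is the more robust choice.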

### Metrics

```bash
# Qdrant collections status
curl -s http://localhost:6333/collections | jq .

# Container stats
docker stats rz-crumbcore-app rz-crumbcore-db rz-crumbcore-qdrant

# Disk usage
docker system df

# Size per volume (reading the mountpoints usually requires root)
# Name and mountpoint are tab-separated so the path reaches du without a leading space
docker volume ls -q | xargs docker volume inspect | \
  jq -r '.[] | "\(.Name)\t\(.Mountpoint)"' | \
  while IFS=$'\t' read -r name path; do
    size=$(du -sh "$path" 2>/dev/null | cut -f1)
    echo "$name: $size"
  done
```

### Logs & Alerting

```bash
# Critical errors (last hour)
docker compose -f rz-deployment.yml logs --since 1h app | \
  grep -i "error\|critical\|exception" | \
  tail -n 20

# Failed login attempts
docker compose -f rz-deployment.yml logs --since 1h app | \
  grep -c "Login failed"

# Rate limit hits
docker compose -f rz-deployment.yml logs --since 1h app | \
  grep -c "429"

# OpenRouter API errors
docker compose -f rz-deployment.yml logs --since 1h app | \
  grep "OpenRouter" | grep -i error
```
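
The counts above only become alerts once they are compared against a limit. One way to do that is sketched here; `count_matches` and `check_threshold` are hypothetical helpers, and the limit in the usage lines is an example value, not a recommendation from this handbook.

```bash
# count_matches PATTERN   (hypothetical helper)
# Count stdin lines matching the extended regex PATTERN, ignoring case.
count_matches() {
    # grep -c prints 0 but exits non-zero when nothing matches; keep the 0.
    grep -Eic -- "$1" || true
}

# check_threshold LABEL COUNT LIMIT   (hypothetical helper)
# Print a one-line status and return 1 when COUNT exceeds LIMIT.
check_threshold() {
    local label=$1 count=$2 limit=$3
    if [ "$count" -gt "$limit" ]; then
        echo "ALERT $label: $count > $limit"
        return 1
    fi
    echo "OK $label: $count <= $limit"
}

# Usage (commented out; needs the running stack):
# n=$(docker compose -f rz-deployment.yml logs --since 1h app | count_matches 'Login failed')
# check_threshold failed_logins "$n" 20
```

The non-zero return code makes the check usable from cron or any monitoring agent that acts on exit status.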

---

## 🔥 Troubleshooting

### Problem: Container does not start

```bash
# Check the logs
docker compose -f rz-deployment.yml logs app

# Common causes:
# 1. Port already in use
lsof -i :8000

# 2. Wrong environment variables (prints the resolved configuration)
docker compose -f rz-deployment.yml --env-file .env.rz config

# 3. Volume permissions
docker run --rm -v rz-crumbcore-app-logs:/data alpine ls -la /data
```

### Problem: Database Connection Failed

```bash
# 1. Is the DB reachable?
docker compose -f rz-deployment.yml exec db \
  mariadb -u crumb -p -e "SELECT 1"

# 2. Is the password correct?
grep DB_PASSWORD .env.rz

# 3. Is the network OK?
docker network inspect rz-internal

# 4. Check the DB logs
docker compose -f rz-deployment.yml logs db | tail -n 50
```

### Problem: Qdrant errors

```bash
# 1. Is Qdrant reachable? (Qdrant serves its health check at /healthz)
curl http://localhost:6333/healthz

# 2. Do the collections exist?
curl http://localhost:6333/collections

# 3. Storage issues?
docker volume inspect rz-crumbcore-qdrant-data

# 4. Re-initialize (⚠️ data loss!)
docker compose -f rz-deployment.yml down
docker volume rm rz-crumbcore-qdrant-data
docker compose -f rz-deployment.yml up -d
# Then re-index!
```

### Problem: High CPU/RAM usage

```bash
# Show stats
docker stats --no-stream

# Top processes in the container
docker compose -f rz-deployment.yml exec app top

# Qdrant memory usage
curl http://localhost:6333/metrics

# Set resource limits (in the compose file, rz-deployment.yml):
#   deploy:
#     resources:
#       limits:
#         cpus: '2.0'
#         memory: 1G
```

### Problem: Slow Response Times

```bash
# 1. Measure the response time
time curl -s http://localhost:8000/health

# 2. Qdrant query performance
curl -s http://localhost:6333/metrics | grep query_time

# 3. Database slow queries
docker compose -f rz-deployment.yml exec db \
  mariadb -u root -p -e "SHOW FULL PROCESSLIST;"

# 4. Check the app logs for delays
docker compose -f rz-deployment.yml logs app | grep "took.*ms"
```
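
The `took.*ms` filter lists every timed entry, fast or slow. To see only the slow ones, the duration has to be compared numerically. The sketch below assumes log lines contain a duration in the form `took <number>ms` (or `took <number> ms`), which is inferred from the grep pattern above and not confirmed by the application; `slow_entries` is a hypothetical helper and the 500 ms limit is an example value.

```bash
# slow_entries LIMIT_MS   (hypothetical helper)
# Print stdin lines whose "took <N>ms" duration is greater than LIMIT_MS.
slow_entries() {
    awk -v limit="$1" '
        match($0, /took [0-9]+(\.[0-9]+)? ?ms/) {
            # Cut the number out of the matched text, e.g. "took 1532ms" -> 1532.
            text = substr($0, RSTART + 5, RLENGTH - 5)
            sub(/ ?ms$/, "", text)
            if (text + 0 > limit) print
        }
    '
}

# Usage (commented out; needs the running stack):
# docker compose -f rz-deployment.yml logs app | slow_entries 500
```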

---

## 🔒 Security Checklist

### Check after deployment:

```bash
# 1. Firewall active?
sudo ufw status
# Only 8000 (internal) and 6333 (internal) should be open

# 2. Secrets rotated?
grep -i "change_me" .env.rz  # should find nothing!

# 3. Default passwords changed?
# Admin: admin@crumb.local / admin123 → CHANGE IT!

# 4. CORS configured correctly?
curl -I http://localhost:8000/api/chat \
  -H "Origin: https://evil-site.com"
# The response should not contain an Access-Control-Allow-Origin header for this origin
# (curl itself does not enforce CORS, so check the headers)

# 5. Rate limiting active?
for i in {1..10}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST http://localhost:8000/api/chat \
    -H "Content-Type: application/json" \
    -d '{"character_id":"eule","question":"test","lang":"de"}'
done
# Expect 429 after 5 requests

# 6. TLS (if publicly reachable)?
curl -I https://docs.rz-nullfeld.de
# Should return 200 plus an HSTS header
```
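
The `change_me` grep in step 2 misses values that were simply left empty. A slightly broader check is sketched below, assuming `.env.rz` consists of `KEY=value` lines and `#` comments; `find_weak_env` is a hypothetical helper.

```bash
# find_weak_env FILE   (hypothetical helper)
# Print the names of keys in FILE whose value is empty or still contains "change_me"
# (case-insensitive).
find_weak_env() {
    awk -F= '
        /^[[:space:]]*#/ || !/=/ { next }        # skip comments and lines without "="
        {
            key = $1
            value = substr($0, length($1) + 2)   # everything after the first "="
            if (value == "" || tolower(value) ~ /change_me/) print key
        }
    ' "$1"
}

# Usage (commented out): no output means no placeholder or empty values were found
# find_weak_env .env.rz
```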

---

## 📞 Incident Response

### Severity Levels

**P1 - Critical (immediately):**
- System down
- Data loss
- Security breach

**P2 - High (< 2h):**
- Performance issues
- Partial outage
- API errors > 10%

**P3 - Medium (< 8h):**
- Minor bugs
- Log warnings
- Single feature broken

**P4 - Low (next sprint):**
- Feature requests
- Documentation
- Cosmetic issues

### P1 Response

```bash
# 1. Assess
curl http://localhost:8000/health
docker compose -f rz-deployment.yml ps

# 2. Quick fix (restart)
docker compose -f rz-deployment.yml restart

# 3. If still down, restore from backup
./restore-crumbcore.sh

# 4. Notify stakeholders
# 5. Post-mortem doc
```

---

## 📝 Change Management

### Deployment Checklist

- [ ] Backup created
- [ ] Rollback plan ready
- [ ] Stakeholders informed
- [ ] Maintenance window scheduled
- [ ] Health checks prepared
- [ ] Logs monitored (30 min after deploy)

### Rollback Procedure

```bash
# 1. Stop the new version
docker compose -f rz-deployment.yml down

# 2. Restore the backup (see above)
./restore-crumbcore.sh

# 3. Start the old version (revert the image tag in rz-deployment.yml first)
docker compose -f rz-deployment.yml up -d

# 4. Verify
curl http://localhost:8000/health
```

---

## 📚 Further Resources

- **Security Audit:** `docs/security/audit_2025-12-03_chat_v1_security.md`
- **Deployment Log:** `docs/security/DEPLOYMENT_SUCCESS_2025-12-03.md`
- **Quickstart:** `QUICKSTART.md`
- **Architecture:** `CLAUDE.md`

---

**Last updated:** 2025-12-04
**Version:** 1.0
**Maintainer:** RZ-Team

🌲 **Stay safe in the Crumbforest!**