🏢 RZ Operations Handbook - Crumbcore v1

Audience: RZ (data center) team, operations, admins
System: Crumbcore (FastAPI + RAG)
Footprint: 605 MB disk, 3 containers


📊 System Overview

Crumbcore Stack:
├── FastAPI App (256 MB RAM)
│   ├── RAG Engine (Qdrant Client)
│   ├── 3 AI Characters (Eule, Fox, Bugsy)
│   └── Document Search & Chat
├── MariaDB 11.7 (512 MB RAM)
│   └── User Management, Sessions
└── Qdrant 1.12.5 (512 MB RAM)
    └── Vector Storage (733 Docs indexed)

Total: ~1.3 GB RAM (256 + 512 + 512 MB), 605 MB disk

🚀 Initial Deployment

1. Preparation

# 1. Clone the repository (or unpack the tarball)
git clone <repo-url> crumbcore
cd crumbcore

# 2. Create the env file
cp .env.example .env.rz
nano .env.rz

# 3. Generate secrets and enter them in .env.rz
openssl rand -hex 32  # SECRET_KEY
openssl rand -hex 16  # DB_PASSWORD
openssl rand -hex 24  # DB_ROOT_PASSWORD
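
The generated values still have to be copied into .env.rz by hand. The following is a minimal sketch of scripting that step. The helpers gen_secret and set_env_var are hypothetical (not part of Crumbcore), and the sketch assumes .env.rz consists of plain KEY=value lines using the key names from the comments above.

```shell
# Sketch: generate hex secrets and write them into an env file.
# gen_secret and set_env_var are hypothetical helpers, not part of Crumbcore.

# Print N random bytes as 2*N hex characters.
gen_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex "$1"
  else
    # Fallback when openssl is not installed.
    head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Drop any existing KEY=... line from FILE, then append KEY=VALUE.
# Assumes VALUE contains no newlines (true for hex secrets).
set_env_var() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp)
  grep -v "^${key}=" "$file" > "$tmp" 2>/dev/null || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Example (not run here):
# set_env_var .env.rz SECRET_KEY       "$(gen_secret 32)"
# set_env_var .env.rz DB_PASSWORD      "$(gen_secret 16)"
# set_env_var .env.rz DB_ROOT_PASSWORD "$(gen_secret 24)"
```

Because the file is rewritten through mktemp, it ends up readable only by its owner, which is a reasonable default for a file holding secrets.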

2. Deployment

# Automated deployment
./rz-deploy.sh

# Or manually:
docker compose -f rz-deployment.yml --env-file .env.rz up -d

3. Verify

# Health Check
curl http://localhost:8000/health

# Collections Check
curl http://localhost:6333/collections

# Login Test
curl -X POST http://localhost:8000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"admin@crumb.local","password":"admin123"}'
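
Right after `up -d` the containers may need a few seconds before these checks pass. A minimal sketch of a retry loop follows. wait_until is a hypothetical helper, and the attempt count and delay in the example are assumptions to tune for the environment.

```shell
# Sketch: retry a check until it succeeds or the attempts run out.
# wait_until is a hypothetical helper.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts=$1 delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Sleep between attempts, but not before the first one.
    if [ "$i" -gt 0 ]; then
      sleep "$delay"
    fi
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (not run here): wait up to about a minute for each service.
# wait_until 30 2 curl -fsS http://localhost:8000/health
# wait_until 30 2 curl -fsS http://localhost:6333/collections
```

With `curl -f`, an HTTP error status counts as a failed attempt, so the loop only ends once the endpoint actually answers successfully.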

🔄 Standard Operations

View logs

# Live logs (all services)
docker compose -f rz-deployment.yml logs -f

# Application only
docker compose -f rz-deployment.yml logs -f app

# Last 100 lines
docker compose -f rz-deployment.yml logs --tail=100 app

# With timestamps
docker compose -f rz-deployment.yml logs -f -t app

# Errors only
docker compose -f rz-deployment.yml logs app | grep ERROR

Restart

# Single service
docker compose -f rz-deployment.yml restart app

# All services
docker compose -f rz-deployment.yml restart

# With rebuild (after a code update)
docker compose -f rz-deployment.yml --env-file .env.rz up -d --build app

Stop/Start

# Stop
docker compose -f rz-deployment.yml stop

# Start
docker compose -f rz-deployment.yml start

# Down (removes containers, volumes are kept)
docker compose -f rz-deployment.yml down

# Down including volumes (⚠️ DATA LOSS!)
docker compose -f rz-deployment.yml down -v

💾 Backup & Restore

Create a backup

#!/bin/bash
set -euo pipefail

# Absolute path: docker run -v needs an absolute host path on older Docker releases
BACKUP_DIR="$(pwd)/backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"

# Database backup (credentials are expanded inside the db container)
docker compose -f rz-deployment.yml exec -T db \
  sh -c 'mariadb-dump -u"$MARIADB_USER" -p"$MARIADB_PASSWORD" "$MARIADB_DATABASE"' \
  > "$BACKUP_DIR/database.sql"

# Qdrant backup (volume)
docker run --rm \
  -v rz-crumbcore-qdrant-data:/data \
  -v "$BACKUP_DIR":/backup \
  alpine tar czf /backup/qdrant-data.tar.gz -C /data .

# App logs backup
docker run --rm \
  -v rz-crumbcore-app-logs:/data \
  -v "$BACKUP_DIR":/backup \
  alpine tar czf /backup/app-logs.tar.gz -C /data .

echo "✅ Backup created: $BACKUP_DIR"
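
A backup is only useful if it can be read back. The sketch below checks the three files written by the backup script. verify_backup is a hypothetical helper, and the checks (non-empty dump, readable archives) are a minimal assumption about what "valid" means here, not a full restore test.

```shell
# Sketch: sanity-check a backup directory before relying on it.
# verify_backup is a hypothetical helper; the file names match the backup script.
verify_backup() {
  dir=$1
  # The SQL dump must exist and be non-empty.
  if [ ! -s "$dir/database.sql" ]; then
    echo "missing or empty: $dir/database.sql" >&2
    return 1
  fi
  # Each archive must be listable, which catches truncated or corrupt files.
  for archive in qdrant-data.tar.gz app-logs.tar.gz; do
    if ! tar tzf "$dir/$archive" >/dev/null 2>&1; then
      echo "unreadable archive: $dir/$archive" >&2
      return 1
    fi
  done
  echo "backup ok: $dir"
}

# Example (not run here):
# verify_backup "$BACKUP_DIR" || exit 1
```

Listing an archive does not prove that Qdrant can load the data, so an occasional restore into a scratch environment is still worth doing.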

Restore

#!/bin/bash
set -euo pipefail

BACKUP_DIR="$(pwd)/backups/20250103_120000"  # Adjust!

# Database restore (the db container must be running)
docker compose -f rz-deployment.yml exec -T db \
  sh -c 'mariadb -u"$MARIADB_USER" -p"$MARIADB_PASSWORD" "$MARIADB_DATABASE"' \
  < "$BACKUP_DIR/database.sql"

# Qdrant restore (unpacks over the existing volume contents)
docker run --rm \
  -v rz-crumbcore-qdrant-data:/data \
  -v "$BACKUP_DIR":/backup \
  alpine sh -c "cd /data && tar xzf /backup/qdrant-data.tar.gz"

# Restart services so they pick up the restored data
docker compose -f rz-deployment.yml restart

echo "✅ Restore complete"

🔧 Maintenance Tasks

1. Update Deployment

# 1. Create a backup (see above)
./backup-crumbcore.sh

# 2. Pull the new version
docker pull crumbcore:v1.1  # or a newer version

# 3. Update the image tag in rz-deployment.yml
nano rz-deployment.yml
# image: crumbcore:v1.1

# 4. Recreate only the app container (brief downtime while it restarts)
docker compose -f rz-deployment.yml --env-file .env.rz up -d --no-deps app

# 5. Health check
curl http://localhost:8000/health

2. Re-Index Documents

# Re-index all documents
docker compose -f rz-deployment.yml exec app \
  python3 -c "
from scripts.index_docs import index_documents
index_documents('docs/rz-nullfeld', 'docs_rz_nullfeld_', force=True)
index_documents('docs/crumbforest', 'docs_crumbforest_', force=True)
print('✅ Re-indexing complete')
"

# Or via the API (with an admin session)
curl -X POST http://localhost:8000/api/documents/index \
  -H "Content-Type: application/json" \
  -H "Cookie: session=..." \
  -d '{"provider": "openrouter", "force": true}'

3. Database Maintenance

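# ${DB_ROOT_PASSWORD} is expanded by the host shell, so export it first
# (for example by sourcing .env.rz).

# Optimize tables
docker compose -f rz-deployment.yml exec db \
  mariadb -u root -p"${DB_ROOT_PASSWORD}" crumbforest \
  -e "OPTIMIZE TABLE users, sessions, diary_entries;"

# Check table status
docker compose -f rz-deployment.yml exec db \
  mariadb -u root -p"${DB_ROOT_PASSWORD}" crumbforest \
  -e "SHOW TABLE STATUS;"

# InnoDB: request a slow shutdown (full purge and change buffer merge) on the next stop.
# This is not a vacuum; OPTIMIZE TABLE above is what rebuilds the tables.
docker compose -f rz-deployment.yml exec db \
  mariadb -u root -p"${DB_ROOT_PASSWORD}" \
  -e "SET GLOBAL innodb_fast_shutdown=0;"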
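
The commands above read DB_ROOT_PASSWORD from the host shell, which is empty unless the variable has been exported. A minimal sketch of loading it from the env file follows. load_env is a hypothetical helper, and it assumes .env.rz is trusted and uses shell-compatible KEY=value lines, because the file is executed as shell code.

```shell
# Sketch: export every KEY=value line of an env file into the current shell.
# load_env is a hypothetical helper. Pass a path that contains a slash
# (for example ./.env.rz), because "." may otherwise search PATH.
load_env() {
  set -a   # export every variable assigned from here on
  . "$1"
  set +a   # back to normal assignment behavior
}

# Example (not run here):
# load_env ./.env.rz
# docker compose -f rz-deployment.yml exec db \
#   mariadb -u root -p"${DB_ROOT_PASSWORD}" -e "SELECT 1"
```

Values containing spaces or shell metacharacters would need quoting inside the env file for this approach to work. The hex secrets produced by `openssl rand -hex` do not.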

4. Log Rotation

# Delete logs older than 30 days (direct volume access needs root)
find /var/lib/docker/volumes/rz-crumbcore-app-logs/_data \
  -name "*.jsonl" -mtime +30 -delete

# Or via Docker
docker run --rm \
  -v rz-crumbcore-app-logs:/logs \
  alpine find /logs -name "*.jsonl" -mtime +30 -delete
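
Since `-delete` cannot be undone, it can help to list the affected files first. The sketch below is such a dry run. list_old_logs is a hypothetical helper that uses the same find criteria as above with `-print` in place of `-delete`.

```shell
# Sketch: list the log files a rotation run would delete, without deleting them.
# list_old_logs is a hypothetical helper.
# Usage: list_old_logs DIR DAYS
list_old_logs() {
  find "$1" -name '*.jsonl' -mtime +"$2" -print
}

# Example (not run here), against the volume via Docker:
# docker run --rm \
#   -v rz-crumbcore-app-logs:/logs \
#   alpine find /logs -name "*.jsonl" -mtime +30 -print
```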

5. Cleanup

# Remove unused images created more than 30 days (720h) ago
docker image prune -a --filter "until=720h"

# Unused volumes (⚠️ careful!)
docker volume prune

# Unused networks
docker network prune

# System-wide cleanup (⚠️ --volumes can remove data volumes that are not attached to a container)
docker system prune -a --volumes

📊 Monitoring

Health Endpoints

# Application Health
curl http://localhost:8000/health
# → {"status": "healthy", "version": "1.0.0"}

# Qdrant Health
curl http://localhost:6333/healthz
# → HTTP 200 when healthy

# Database Health
docker compose -f rz-deployment.yml exec db \
  sh -c 'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD -e "SELECT 1"'

Metrics

# Qdrant Collections Status
curl -s http://localhost:6333/collections | jq .

# Container Stats
docker stats rz-crumbcore-app rz-crumbcore-db rz-crumbcore-qdrant

# Disk usage (reading the volume mountpoints needs root)
docker system df
docker volume ls -q | xargs docker volume inspect | \
  jq -r '.[] | "\(.Name)\t\(.Mountpoint)"' | \
  while IFS=$'\t' read -r name path; do
    size=$(du -sh "$path" 2>/dev/null | cut -f1)
    echo "$name: $size"
  done

Logs & Alerting

# Critical errors (last hour)
docker compose -f rz-deployment.yml logs --since 1h app | \
  grep -i "error\|critical\|exception" | \
  tail -n 20

# Failed Login Attempts
docker compose -f rz-deployment.yml logs --since 1h app | \
  grep "Login failed" | wc -l

# Rate Limit Hits
docker compose -f rz-deployment.yml logs --since 1h app | \
  grep "429" | wc -l

# OpenRouter API Errors
docker compose -f rz-deployment.yml logs --since 1h app | \
  grep "OpenRouter" | grep -i error
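
The checks above print counts but do not decide anything. The sketch below turns a count into an exit code that a cron job or monitoring agent could act on. count_matches and check_threshold are hypothetical helpers, and the limit in the example is an assumption rather than a value from the Crumbcore setup.

```shell
# Sketch: turn a log grep into an exit code for cron or a monitoring agent.
# count_matches and check_threshold are hypothetical helpers.

# Count stdin lines matching a case-insensitive extended regex.
count_matches() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the exit status clean.
  grep -ciE "$1" || true
}

# Succeed if COUNT <= LIMIT, otherwise print an alert to stderr and fail.
check_threshold() {
  label=$1 count=$2 limit=$3
  if [ "$count" -gt "$limit" ]; then
    echo "ALERT: $label = $count (limit $limit)" >&2
    return 1
  fi
  echo "ok: $label = $count"
}

# Example (not run here): alert on more than 20 failed logins per hour.
# n=$(docker compose -f rz-deployment.yml logs --since 1h app | count_matches 'Login failed')
# check_threshold failed_logins "$n" 20
```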

🔥 Troubleshooting

Problem: Container does not start

# Check the logs
docker compose -f rz-deployment.yml logs app

# Common causes:
# 1. Port already in use
lsof -i :8000

# 2. Wrong env variables (pass the env file so the rendered config matches the deployment)
docker compose -f rz-deployment.yml --env-file .env.rz config

# 3. Volume permissions
docker run --rm -v rz-crumbcore-app-logs:/data alpine ls -la /data

Problem: Database Connection Failed

# 1. Is the DB reachable? (prompts for the password)
docker compose -f rz-deployment.yml exec db \
  mariadb -u crumb -p -e "SELECT 1"

# 2. Is the password correct?
grep DB_PASSWORD .env.rz

# 3. Is the network OK?
docker network inspect rz-internal

# 4. Check the DB logs
docker compose -f rz-deployment.yml logs db | tail -n 50

Problem: Qdrant errors

# 1. Is Qdrant reachable?
curl http://localhost:6333/healthz

# 2. Do the collections exist?
curl http://localhost:6333/collections

# 3. Storage issues?
docker volume inspect rz-crumbcore-qdrant-data

# 4. Re-initialize (⚠️ data loss!)
docker compose -f rz-deployment.yml down
docker volume rm rz-crumbcore-qdrant-data
docker compose -f rz-deployment.yml --env-file .env.rz up -d
# Then re-index!

Problem: High CPU/RAM usage

# Show stats
docker stats --no-stream

# Top processes in the container
docker compose -f rz-deployment.yml exec app top

# Qdrant memory usage
curl http://localhost:6333/metrics

# Set resource limits (in rz-deployment.yml):
# deploy:
#   resources:
#     limits:
#       cpus: '2.0'
#       memory: 1G

Problem: Slow Response Times

# 1. Measure response time
time curl -s http://localhost:8000/health

# 2. Qdrant request latency (Prometheus metrics)
curl -s http://localhost:6333/metrics | grep -i duration

# 3. Long-running database queries (check the Time column)
docker compose -f rz-deployment.yml exec db \
  mariadb -u root -p -e "SHOW FULL PROCESSLIST;"

# 4. Check the app logs for delays
docker compose -f rz-deployment.yml logs app | grep "took.*ms"

🔒 Security Checklist

Check after deployment:

# 1. Firewall active?
sudo ufw status
# Only 8000 and 6333 should be open, and only internally

# 2. Secrets rotated?
grep -i "change_me" .env.rz  # should find nothing!

# 3. Default passwords changed?
# Admin: admin@crumb.local / admin123 → CHANGE IT!

# 4. CORS correct?
curl -I http://localhost:8000/api/chat \
  -H "Origin: https://evil-site.com"
# The response must not contain an Access-Control-Allow-Origin header for this origin
# (curl itself never raises a CORS error; only browsers enforce CORS)

# 5. Rate limiting active?
for i in {1..10}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST http://localhost:8000/api/chat \
    -H "Content-Type: application/json" \
    -d '{"character_id":"eule","question":"test","lang":"de"}'
done
# Expect 429 after 5 requests

# 6. TLS (if publicly reachable)?
curl -I https://docs.rz-nullfeld.de
# Should return 200 plus a Strict-Transport-Security (HSTS) header
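
The CORS check in step 4 relies on reading response headers by eye. The sketch below automates that comparison. origin_allowed is a hypothetical helper, and it assumes the usual `Name: value` header layout with a space after the colon.

```shell
# Sketch: decide from raw response headers whether a given origin was allowed.
# origin_allowed is a hypothetical helper; it reads the headers on stdin.
# Usage: curl -sI URL -H "Origin: ORIGIN" | origin_allowed ORIGIN
origin_allowed() {
  # HTTP header lines end in CRLF, so strip the CR before comparing.
  # Header names are case-insensitive, values are compared exactly.
  tr -d '\r' | awk -v origin="$1" '
    tolower($1) == "access-control-allow-origin:" && ($2 == "*" || $2 == origin) { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Example (not run here): warn if a foreign origin is accepted.
# if curl -sI http://localhost:8000/api/chat -H "Origin: https://evil-site.com" \
#      | origin_allowed https://evil-site.com; then
#   echo "CORS is too permissive" >&2
# fi
```

A wildcard `*` counts as allowed here, because it would let any origin read the response.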

📞 Incident Response

Severity Levels

P1 - Critical (immediately):

  • System down
  • Data Loss
  • Security Breach

P2 - High (< 2h):

  • Performance Issues
  • Partial Outage
  • API Errors > 10%

P3 - Medium (< 8h):

  • Minor Bugs
  • Log Warnings
  • Single Feature broken

P4 - Low (Next Sprint):

  • Feature Requests
  • Documentation
  • Cosmetic Issues

P1 Response

# 1. Assess
curl http://localhost:8000/health
docker compose -f rz-deployment.yml ps

# 2. Quick Fix (Restart)
docker compose -f rz-deployment.yml restart

# 3. If still down, restore from backup
./restore-crumbcore.sh

# 4. Notify stakeholders
# 5. Post-mortem doc

📝 Change Management

Deployment Checklist

  • Backup created
  • Rollback plan ready
  • Stakeholders informed
  • Maintenance window scheduled
  • Health checks prepared
  • Logs monitored (30 min after deploy)

Rollback Procedure

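# 1. Stop the new version
docker compose -f rz-deployment.yml down

# 2. Set the image tag in rz-deployment.yml back to the previous version and start it
docker compose -f rz-deployment.yml --env-file .env.rz up -d

# 3. Restore the backup (see above; the database restore needs the db container running)
./restore-crumbcore.sh

# 4. Verify
curl http://localhost:8000/health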

📚 Further Resources

  • Security Audit: docs/security/audit_2025-12-03_chat_v1_security.md
  • Deployment Log: docs/security/DEPLOYMENT_SUCCESS_2025-12-03.md
  • Quickstart: QUICKSTART.md
  • Architecture: CLAUDE.md

Last updated: 2025-12-04
Version: 1.0
Maintainer: RZ-Team

🌲 Stay safe in the Crumbforest!