kruemel/Crumb-Core-v.1


Krümel Branko 6c38ed680b Initial commit: Crumbforest Architecture Refinement v1 (Clean)

2025-12-07 01:26:46 +01:00


🗄️ Qdrant Access & Security

🔐 Security Status

✅ After the Fix (SECURE)

ports:
  - "127.0.0.1:6333:6333"  # localhost only

Access:

✅ Locally (on the server): http://localhost:6333
✅ Via the Docker network: http://qdrant:6333
❌ From outside: NOT reachable (secure!)

⚠️ Before (INSECURE)

ports:
  - "6333:6333"  # Bound to all interfaces, publicly reachable!
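
The only difference between the two `ports:` entries is the host IP in front of the mapping. As a rough illustration, the sketch below classifies a Compose short-syntax port mapping as loopback-only or publicly bound. The name `is_loopback_only` is hypothetical, and the parser covers only the simple `[host_ip:]host_port:container_port` form with an IPv4 address, not IPv6 brackets or port ranges.

```python
def is_loopback_only(mapping: str) -> bool:
    """Return True if a Compose short-syntax port mapping binds to loopback only.

    Handles only "[host_ip:]host_port:container_port" with an IPv4 host IP.
    """
    parts = mapping.split(":")
    if len(parts) == 3:
        host_ip = parts[0]
        # 127.0.0.0/8 is the IPv4 loopback range
        return host_ip.startswith("127.")
    # Without an explicit host IP, Docker publishes on all interfaces by default
    return False


for mapping in ("127.0.0.1:6333:6333", "6333:6333"):
    label = "localhost only" if is_loopback_only(mapping) else "publicly exposed"
    print(f"{mapping} -> {label}")
```

A check like this could run in CI against the compose file to catch an accidental public binding before deployment.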

🌐 Access Methods

1. Locally on the Server

# Open the dashboard (when on the server; `open` is the macOS command, use xdg-open on Linux)
open http://localhost:6333/dashboard

# List collections
curl http://localhost:6333/collections | jq

# Collection details
curl http://localhost:6333/collections/docs_crumbforest_ | jq

2. Via the Docker Network (FastAPI App)

# app/deps.py - already implemented
def get_qdrant_client():
    from qdrant_client import QdrantClient
    from config import get_settings

    settings = get_settings()
    # Uses the Docker network service name: "qdrant"
    return QdrantClient(
        host=settings.qdrant_host,  # "qdrant"
        port=settings.qdrant_port   # 6333
    )

Why does this work?

Containers on the same Docker network can reach each other by service name
qdrant resolves to the container's internal IP
No external exposure needed!
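
To confirm that a host name resolves and its port accepts connections, a plain TCP probe is enough. This is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper, not part of the app. The demo probes a throwaway local listener so it runs anywhere. Inside the app container the equivalent call would be `can_connect("qdrant", 6333)`.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name (inside a container this goes
        # through Docker's DNS) and then opens a TCP connection
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False


# Demo against a temporary local listener
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    print("listener reachable:", can_connect("127.0.0.1", port))
```

Run from the host, the same probe against `qdrant` is expected to fail, because the service name only exists inside the Docker network.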

3. Remote Access via SSH Tunnel

# From your local machine to the server
ssh -L 6333:localhost:6333 user@your-server.com

# Now open it locally
open http://localhost:6333/dashboard

# Or via the API
curl http://localhost:6333/collections | jq

Explanation:

-L 6333:localhost:6333 = forwards local port 6333 to port 6333 on the server
Secure: the traffic is encrypted by SSH
The dashboard appears to run "locally" but shows the server's data
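
For scripted use, the tunnel command can be assembled programmatically. The sketch below only builds the argument list; `build_tunnel_command` and its defaults are illustrative assumptions. It adds `-N`, a standard OpenSSH flag that forwards ports without running a remote command.

```python
import shlex


def build_tunnel_command(ssh_target: str, local_port: int = 6333,
                         remote_host: str = "localhost",
                         remote_port: int = 6333) -> list[str]:
    """Build an `ssh -L` argument list for a local port forward."""
    # -L local_port:remote_host:remote_port
    # remote_host is resolved on the server side, so "localhost" is the server itself
    forward = f"{local_port}:{remote_host}:{remote_port}"
    # -N: do not run a remote command, only forward ports
    return ["ssh", "-N", "-L", forward, ssh_target]


cmd = build_tunnel_command("user@your-server.com")
print(shlex.join(cmd))
# To actually open the tunnel: subprocess.run(cmd, check=True)
```

Passing a different `local_port` (for example 16333) helps when a local Qdrant already occupies 6333. The dashboard is then at that local port instead.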

4. Production Setup with Nginx

# /etc/nginx/sites-available/qdrant
server {
    listen 443 ssl;
    server_name qdrant.crumbforest.de;

    ssl_certificate /etc/letsencrypt/live/qdrant.crumbforest.de/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/qdrant.crumbforest.de/privkey.pem;

    # Basic Auth
    auth_basic "Qdrant Admin";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://localhost:6333;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

# Create the Basic Auth credentials
sudo htpasswd -c /etc/nginx/.htpasswd admin

# Test the config and reload Nginx
sudo nginx -t && sudo systemctl reload nginx
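
With Basic Auth in front of Qdrant, every API client has to send credentials. The sketch below shows how the `Authorization` header is formed. The username and password are placeholders, and `basic_auth_header` is a hypothetical helper.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Build the HTTP Basic Authorization header (RFC 7617)."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials, not the real ones stored in .htpasswd
headers = basic_auth_header("admin", "secret")
print(headers)
```

With `requests`, passing `auth=("admin", password)` to the call produces the same header, so the manual encoding is mainly useful for clients without built-in Basic Auth support.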

🔍 Searching Markdown Files

Method 1: Via the API (Recommended)

# Search all documents
curl -X GET "http://localhost:8000/api/documents/search?q=docker&limit=10" \
  -H "Cookie: session=YOUR_SESSION_COOKIE"

# Crumbforest docs only
curl -X GET "http://localhost:8000/api/documents/search?q=python&category=crumbforest&limit=5" \
  -H "Cookie: session=YOUR_SESSION_COOKIE"

# Getting the session cookie (after login)
# In the browser: DevTools → Application → Cookies → session

With Python:

import requests

# Login
session = requests.Session()
response = session.post(
    "http://localhost:8000/de/login",
    data={
        "email": "admin@crumb.local",
        "password": "admin123",
        "csrf": "..."  # from the login form
    }
)

# Search
results = session.get(
    "http://localhost:8000/api/documents/search",
    params={"q": "docker", "limit": 10}
).json()

for result in results["results"]:
    print(f"{result['score']:.3f} - {result['title']}")
    print(f"  → {result['content'][:100]}...")

Method 2: Directly in Qdrant

# Collection stats
curl http://localhost:6333/collections/docs_crumbforest_ | jq '.result | {
  status,
  points_count,
  indexed_vectors_count
}'

# Searching points (requires an embedding of the query!)
# More involved - better done via the API
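
A direct Qdrant search needs an embedding because points are matched by vector similarity, not by text. The toy sketch below ranks a few made-up 3-dimensional vectors by cosine similarity to a query vector. The vectors, file names, and the choice of cosine similarity are illustrative assumptions. Real embeddings come from the configured provider and have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query: list[float], points: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (title, score) pairs sorted by descending similarity."""
    scored = [(title, cosine_similarity(query, vec)) for title, vec in points.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up toy vectors standing in for real embeddings
points = {
    "docker-setup.md": [0.9, 0.1, 0.0],
    "python-intro.md": [0.1, 0.9, 0.0],
    "owl-stories.md": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # pretend this is the embedding of "docker"

for title, score in rank(query, points):
    print(f"{score:.3f} - {title}")
```

The API route does both steps for you: it embeds the query text and then asks Qdrant for the nearest points.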

Method 3: Database Query (Metadata)

# All indexed documents
docker compose exec -T db sh -c 'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE -e "
  SELECT 
    post_id,
    collection_name,
    JSON_EXTRACT(metadata, \"$.file_path\") as file_path,
    JSON_EXTRACT(metadata, \"$.category\") as category,
    chunk_count,
    indexed_at
  FROM post_vectors 
  WHERE post_type=\"document\"
  ORDER BY indexed_at DESC
  LIMIT 20;
"'

# Search by file name
docker compose exec -T db sh -c 'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE -e "
  SELECT 
    JSON_EXTRACT(metadata, \"$.file_path\") as file_path,
    chunk_count
  FROM post_vectors 
  WHERE post_type=\"document\"
    AND JSON_EXTRACT(metadata, \"$.file_path\") LIKE \"%docker%\"
  ORDER BY indexed_at DESC;
"'

Method 4: Filesystem

# Find all .md files
find docs/ -name "*.md" -type f

# Search by content
grep -r "docker" docs/ --include="*.md"

# With context lines
grep -r -C 3 "docker compose" docs/ --include="*.md"

# Case-insensitive
grep -ri "python" docs/ --include="*.md"
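
Where `grep` is not available (for example in a slim container image), the same search can be sketched with the standard library. `search_markdown` is a hypothetical helper; it scans `*.md` files below a root directory and yields matching lines, case-insensitively by default.

```python
from pathlib import Path
from typing import Iterator


def search_markdown(root: str, needle: str,
                    ignore_case: bool = True) -> Iterator[tuple[Path, int, str]]:
    """Yield (path, line_number, line) for each line in *.md files containing needle."""
    target = needle.lower() if ignore_case else needle
    for path in sorted(Path(root).rglob("*.md")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going on files that are not valid UTF-8
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            haystack = line.lower() if ignore_case else line
            if target in haystack:
                yield path, number, line


if __name__ == "__main__":
    for path, number, line in search_markdown("docs", "docker"):
        print(f"{path}:{number}:{line}")
```

The output format mirrors `grep -rn`, so results can be compared directly with the shell commands above.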

📝 Registering a New Version

Scenario: you have updated a .md file

Option 1: Automatic (Recommended)

# 1. Edit the file
nano docs/crumbforest/my_file.md

# 2. Restart the app (triggers auto-indexing)
cd compose
docker compose restart app

# 3. Check the logs
docker compose logs app | grep -A 10 "Document Indexing"

# Expected output:
# ✓ Using provider: openrouter
# 📚 Indexing documents...
# 
# 📁 crumbforest:
#    Files found:    283
#    Indexed:        1      ← only the changed file!
#    Unchanged:      282
#    Errors:         0

How does it work?

Each file's hash is compared with the hash stored at the last indexing run
Only changed files are re-indexed
Saves time & API costs!
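
The change detection described above can be sketched as follows. This is an illustration under stated assumptions, not the project's `DocumentIndexer`: it assumes SHA-256 over the file contents and a plain dict of stored hashes. The real implementation may use a different hash and keeps its state in the database.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """SHA-256 hex digest of the file contents, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_reindex(path: Path, stored_hashes: dict[str, str], force: bool = False) -> bool:
    """True if the file is new, its hash changed, or force is set."""
    if force:
        return True
    return stored_hashes.get(str(path)) != file_hash(path)


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "my_file.md"
    doc.write_text("# v1\n", encoding="utf-8")
    stored = {str(doc): file_hash(doc)}  # state after the first indexing run
    print("unchanged:", needs_reindex(doc, stored))
    doc.write_text("# v2\n", encoding="utf-8")
    print("edited:", needs_reindex(doc, stored))
```

The `force` parameter mirrors the `"force": true` flag of the index endpoint: it skips the comparison and re-indexes regardless.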

Option 2: Manually via the API

# Force a re-index of all documents
curl -X POST "http://localhost:8000/api/documents/index" \
  -H "Content-Type: application/json" \
  -H "Cookie: session=..." \
  -d '{
    "provider": "openrouter",
    "force": true
  }'

# Only one category
curl -X POST "http://localhost:8000/api/documents/index" \
  -H "Content-Type: application/json" \
  -H "Cookie: session=..." \
  -d '{
    "category": "crumbforest",
    "provider": "openrouter",
    "force": true
  }'

Option 3: A Single File via Python

# manual_index.py
import sys
sys.path.insert(0, 'app')

from pathlib import Path
from deps import get_db, get_qdrant_client
from config import get_settings
from services.provider_factory import ProviderFactory
from services.document_indexer import DocumentIndexer

# Setup
settings = get_settings()
db_conn = get_db()
qdrant = get_qdrant_client()
provider = ProviderFactory.create_provider("openrouter", settings)

# Indexer
indexer = DocumentIndexer(db_conn, qdrant, provider, "docs")

# Index a single file
file_path = Path("docs/crumbforest/my_updated_file.md")
result = indexer.index_document(file_path, "crumbforest", force=True)

print(f"Status: {result['status']}")
print(f"Chunks: {result.get('chunks', 0)}")

db_conn.close()

🔄 Update Workflow

Code Changes (Python)

# 1. Edit the code
nano app/routers/my_feature.py

# 2. Restart only the app (fast!)
docker compose restart app

# 3. Verify
curl http://localhost:8000/health

Dependencies (requirements.txt)

# 1. Edit requirements.txt
nano app/requirements.txt

# 2. Rebuild
docker compose up --build -d

# 3. Verify
docker compose exec app pip list | grep new-package

Docker Compose Changes

# 1. Edit docker-compose.yml
nano compose/docker-compose.yml

# 2. Recreate the services
docker compose up -d

# 3. Check the status
docker compose ps

New .md Files

# 1. Add the file
cp new_doc.md docs/crumbforest/

# 2. Restart the app (triggers auto-indexing)
docker compose restart app

# 3. Check the logs
docker compose logs app | grep "Document Indexing"

# 4. Verify in Qdrant
curl http://localhost:6333/collections/docs_crumbforest_ | \
  jq '.result.points_count'
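
The same verification can be scripted. The sketch below extracts the fields that the `jq` filters in this document read (`result.status`, `result.points_count`) from a collection-info response and compares two snapshots. The sample payloads are made up and trimmed to those fields, and the helper names are hypothetical. Fetching is left out so the snippet runs without a server.

```python
import json


def collection_summary(payload: dict) -> dict:
    """Pick status and points_count out of a collection-info response."""
    result = payload.get("result") or {}
    return {
        "status": result.get("status"),
        "points_count": result.get("points_count"),
    }


def grew(before: dict, after: dict) -> bool:
    """True if the collection holds more points after indexing than before."""
    count_before = collection_summary(before)["points_count"] or 0
    count_after = collection_summary(after)["points_count"] or 0
    return count_after > count_before


# Made-up sample responses, trimmed to the fields used above
before = json.loads('{"result": {"status": "green", "points_count": 1200}}')
after = json.loads('{"result": {"status": "green", "points_count": 1207}}')
print(collection_summary(after), "grew:", grew(before, after))
```

In practice the payloads would come from the collection endpoint shown in the `curl` call, captured once before and once after the restart.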

Database Schema Changes

# 1. Create the SQL script
nano compose/init/99_my_migration.sql

# 2. Run it manually (init/ only runs when the database is first created!)
docker compose exec -T db sh -c \
  'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE' \
  < compose/init/99_my_migration.sql

# 3. Or recreate the database (⚠️ down -v removes all volumes and their data!)
docker compose down -v
docker compose up -d

🛠️ Quick Reference

# Qdrant dashboard (local)
open http://localhost:6333/dashboard

# Qdrant via SSH tunnel
ssh -L 6333:localhost:6333 user@server

# Check collections
curl http://localhost:6333/collections | jq '.result.collections[].name'

# Search the docs (needs a session)
curl "http://localhost:8000/api/documents/search?q=docker" -H "Cookie: session=..."

# Check status
curl http://localhost:8000/api/documents/status -H "Cookie: session=..."

# Force re-index
curl -X POST http://localhost:8000/api/documents/index \
  -H "Content-Type: application/json" \
  -H "Cookie: session=..." \
  -d '{"force": true}'

# Restart the app (auto-indexing)
docker compose restart app

# Rebuild (for dependency changes, or code that is baked into the image)
docker compose up --build -d

# Follow the logs live
docker compose logs app -f

🔒 Production Checklist

Qdrant bound to localhost only: 127.0.0.1:6333:6333
Nginx reverse proxy with SSL
Basic Auth for the Qdrant dashboard
Firewall rules (only ports 80/443 open, plus SSH for admin access)
SSH key-based auth (no passwords)
Store environment variables securely
Set up a backup cron job
Monitoring (uptime, disk space)
Log rotation
Rate limiting for the API

💡 Tips

Use restart instead of up --build when only code has changed (this assumes the app code is mounted into the container)

docker compose restart app  # fast
# instead of
docker compose up --build   # slow

Make use of file-hash tracking
- Only changed files are re-indexed
- Saves API costs!
SSH tunnel for remote admin
- Needs no extra service besides SSH (simpler than a VPN)
- No firewall changes required

Logs are your friends

# Find errors
docker compose logs app | grep -i error

# Indexing status
docker compose logs app | grep "Document Indexing" -A 20

Session cookie in the browser
- DevTools → Application → Cookies
- Copy it for API tests

Woohoo! Qdrant is now secure! 🦉
