# 🗄️ Qdrant Access & Security

## 🔐 Security Status

### ✅ After the Fix (SECURE)
```yaml
ports:
  - "127.0.0.1:6333:6333"  # localhost only
```

**Access:**
- ✅ Locally (on the server): `http://localhost:6333`
- ✅ Via the Docker network: `http://qdrant:6333`
- ❌ From outside: NOT reachable (secure!)

### ⚠️ Before (INSECURE)
```yaml
ports:
  - "6333:6333"  # Public! No host IP = published on all interfaces
```
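The only difference is the host IP in front of the ports. In Docker Compose's short syntax `[HOST_IP:]HOST_PORT:CONTAINER_PORT[/PROTOCOL]`, an entry without a host IP is published on all interfaces. A minimal Python sketch for auditing such entries; `is_localhost_only` is a hypothetical helper, not part of the project:

```python
import ipaddress

def is_localhost_only(port_spec: str) -> bool:
    """True if a Compose short-syntax port entry binds to a loopback IP only."""
    spec = port_spec.split("/", 1)[0]   # drop an optional "/tcp" or "/udp"
    parts = spec.rsplit(":", 2)         # HOST_IP, HOST_PORT, CONTAINER_PORT
    if len(parts) < 3:
        return False                    # no host IP: published on all interfaces
    host = parts[0].strip("[]")         # allow "[::1]" style IPv6 hosts
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False                    # host part is not an IP address

for entry in ("127.0.0.1:6333:6333", "6333:6333"):
    print(f"{entry:<22} -> {'localhost only' if is_localhost_only(entry) else 'EXPOSED'}")
```

The check uses `ipaddress`, so any `127.x.x.x` address or `::1` counts as local, while `0.0.0.0` does not.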

## 🌐 Access Methods

### 1. Locally on the Server

```bash
# Open the dashboard (on the server; `open` is macOS-only, use `xdg-open` on Linux)
open http://localhost:6333/dashboard

# List collections
curl http://localhost:6333/collections | jq

# Collection Details
curl http://localhost:6333/collections/docs_crumbforest_ | jq
```
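The same `GET /collections` call works from Python with only the standard library, which is handy where `jq` isn't installed. A minimal sketch; the helper names are hypothetical, and the response shape (`result.collections[].name`) is the same one the Quick Reference's `jq` filter relies on:

```python
import json
import urllib.request

def collection_names(payload: dict) -> list:
    """Extract the names from a GET /collections response body."""
    return [c["name"] for c in payload["result"]["collections"]]

def list_collections(base_url: str = "http://localhost:6333") -> list:
    """Fetch GET /collections from Qdrant and return the collection names."""
    with urllib.request.urlopen(f"{base_url}/collections", timeout=5) as resp:
        return collection_names(json.load(resp))
```

On the server, `print(list_collections())` prints the names; from a container on the Compose network, pass `"http://qdrant:6333"` instead.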

### 2. Via Docker Network (FastAPI App)

```python
# app/deps.py - already implemented
def get_qdrant_client():
    from qdrant_client import QdrantClient
    from config import get_settings

    settings = get_settings()
    # Uses the Docker network name "qdrant"
    return QdrantClient(
        host=settings.qdrant_host,  # "qdrant"
        port=settings.qdrant_port   # 6333
    )
```

**Why does this work?**
- Containers on the same Docker network can reach each other by service name
- Docker's internal DNS resolves `qdrant` to the container's internal IP
- No external port exposure needed!
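`config.get_settings()` isn't shown here; presumably `qdrant_host` and `qdrant_port` come from environment variables, which is what lets the same code use `qdrant` inside the Compose network and `localhost` on the host. A minimal sketch, assuming variables named `QDRANT_HOST` and `QDRANT_PORT` (hypothetical names) with the Compose-network values as defaults:

```python
import os
from typing import Mapping, Optional

def qdrant_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the Qdrant base URL from (hypothetical) QDRANT_HOST / QDRANT_PORT.

    Defaults match the Compose network: service name "qdrant", port 6333.
    On the server itself you would set QDRANT_HOST=localhost instead.
    """
    env = os.environ if env is None else env
    host = env.get("QDRANT_HOST", "qdrant")
    port = int(env.get("QDRANT_PORT", "6333"))
    return f"http://{host}:{port}"
```

Inside the app container, `socket.gethostbyname("qdrant")` should return the Qdrant container's internal IP via Docker's embedded DNS; on the host itself that name normally does not resolve.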

### 3. Remote Access via SSH Tunnel

```bash
# From your local machine to the server
ssh -L 6333:localhost:6333 user@your-server.com

# Now open it locally (in a second terminal, while the tunnel is up)
open http://localhost:6333/dashboard

# Or via the API
curl http://localhost:6333/collections | jq
```

**Explanation:**
- `-L 6333:localhost:6333` forwards local port 6333 to port 6333 on the server's `localhost`
- Traffic is encrypted over SSH
- The dashboard appears to run "locally" but shows the server's data
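Both halves of this setup can be checked with a plain TCP connect: with the tunnel open, port 6333 answers on your machine, while connecting to the server's public address on 6333 should fail. A minimal sketch; `port_is_open` is a hypothetical helper:

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or host unreachable
        return False
```

`port_is_open("localhost", 6333)` should be `True` while the tunnel runs; `port_is_open("your-server.com", 6333)` should be `False` with the `127.0.0.1` binding.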

### 4. Production Setup with Nginx

```nginx
# /etc/nginx/sites-available/qdrant
server {
    listen 443 ssl;
    server_name qdrant.crumbforest.de;

    ssl_certificate /etc/letsencrypt/live/qdrant.crumbforest.de/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/qdrant.crumbforest.de/privkey.pem;

    # Basic Auth
    auth_basic "Qdrant Admin";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://127.0.0.1:6333;  # matches the 127.0.0.1 port binding (localhost may also resolve to ::1)
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

```bash
# Create the Basic Auth user (htpasswd is in apache2-utils on Debian/Ubuntu; -c creates/overwrites the file)
sudo htpasswd -c /etc/nginx/.htpasswd admin

# Test the config, then reload Nginx
sudo nginx -t && sudo systemctl reload nginx
```
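Clients then send HTTP Basic credentials, i.e. `Authorization: Basic <base64 of user:password>`, which Nginx checks against `.htpasswd` (curl does this with `-u admin`). A minimal standard-library sketch; the helper names are hypothetical:

```python
import base64
import urllib.request

def basic_auth_header(user: str, password: str) -> str:
    """Value for the Authorization header: 'Basic ' + base64('user:password')."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def get_collections_via_proxy(base_url: str, user: str, password: str) -> bytes:
    """GET /collections through the Nginx proxy with Basic Auth."""
    req = urllib.request.Request(
        f"{base_url}/collections",
        headers={"Authorization": basic_auth_header(user, password)},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()
```

For example `get_collections_via_proxy("https://qdrant.crumbforest.de", "admin", password)`; without valid credentials Nginx answers 401, which `urlopen` raises as `HTTPError`.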

## 🔍 Searching Markdown Files

### Method 1: Via the API (recommended)

```bash
# Search all documents
curl -X GET "http://localhost:8000/api/documents/search?q=docker&limit=10" \
  -H "Cookie: session=YOUR_SESSION_COOKIE"

# Crumbforest docs only
curl -X GET "http://localhost:8000/api/documents/search?q=python&category=crumbforest&limit=5" \
  -H "Cookie: session=YOUR_SESSION_COOKIE"

# Getting the session cookie (after logging in):
# In the browser: DevTools → Application → Cookies → session
```
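When the query contains spaces or umlauts, the query string has to be URL-encoded (curl sends the URL as typed). A minimal sketch that builds the same search request with the standard library; the parameter names `q`, `category` and `limit` come from the curl calls above, the helper name is hypothetical:

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

def build_search_request(base_url: str, query: str, session_cookie: str,
                         category: Optional[str] = None, limit: int = 10) -> Request:
    """GET /api/documents/search with an encoded query string and the session cookie."""
    params = {"q": query, "limit": limit}
    if category:
        params["category"] = category
    return Request(
        f"{base_url}/api/documents/search?{urlencode(params)}",
        headers={"Cookie": f"session={session_cookie}"},
    )
```

Send it with `urllib.request.urlopen(req)` and parse the body with `json.load`.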

**With Python:**
```python
import requests

# Log in (the Session object keeps the session cookie)
session = requests.Session()
response = session.post(
    "http://localhost:8000/de/login",
    data={
        "email": "admin@crumb.local",
        "password": "admin123",
        "csrf": "...",  # CSRF token from the login form
    }
)
response.raise_for_status()

# Search
response = session.get(
    "http://localhost:8000/api/documents/search",
    params={"q": "docker", "limit": 10}
)
response.raise_for_status()
results = response.json()

for result in results["results"]:
    print(f"{result['score']:.3f} - {result['title']}")
    print(f"  → {result['content'][:100]}...")
```

### Method 2: Directly in Qdrant

```bash
# Collection Stats
curl http://localhost:6333/collections/docs_crumbforest_ | jq '.result | {
  status,
  points_count,
  indexed_vectors_count
}'

# Searching points needs a query embedding!
# More complex: better done via the API
```
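Semantic search needs a query vector, but just browsing what is stored does not: Qdrant's scroll endpoint (`POST /collections/{collection_name}/points/scroll`) pages through points without one. A minimal sketch; the helper names are hypothetical, and which payload fields the indexer stores is not documented here:

```python
import json
import urllib.request

def scroll_body(limit: int = 10, offset=None) -> dict:
    """Request body for the scroll endpoint: payloads yes, vectors no."""
    body = {"limit": limit, "with_payload": True, "with_vector": False}
    if offset is not None:
        body["offset"] = offset  # point ID to continue from
    return body

def scroll_points(base_url: str, collection: str, limit: int = 10, offset=None):
    """Return (points, next_page_offset) for one page of the collection."""
    req = urllib.request.Request(
        f"{base_url}/collections/{collection}/points/scroll",
        data=json.dumps(scroll_body(limit, offset)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        result = json.load(resp)["result"]
    return result["points"], result.get("next_page_offset")
```

For example `scroll_points("http://localhost:6333", "docs_crumbforest_", limit=5)`; pass the returned offset back in to fetch the next page, it is `None` after the last one.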

### Method 3: Database Query (Metadata)

```bash
# All indexed documents
docker compose exec -T db sh -c 'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE -e "
  SELECT
    post_id,
    collection_name,
    JSON_EXTRACT(metadata, \"$.file_path\") as file_path,
    JSON_EXTRACT(metadata, \"$.category\") as category,
    chunk_count,
    indexed_at
  FROM post_vectors
  WHERE post_type=\"document\"
  ORDER BY indexed_at DESC
  LIMIT 20;
"'

# Search by file name
docker compose exec -T db sh -c 'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE -e "
  SELECT
    JSON_EXTRACT(metadata, \"$.file_path\") as file_path,
    chunk_count
  FROM post_vectors
  WHERE post_type=\"document\"
    AND JSON_EXTRACT(metadata, \"$.file_path\") LIKE \"%docker%\"
  ORDER BY indexed_at DESC;
"'
```
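Without a TTY (`exec -T`), the client prints results as tab-separated rows with a header line, and `JSON_EXTRACT` keeps the JSON double quotes around string values. A minimal sketch for post-processing that output in Python, assuming the column names from the queries above; `parse_batch_output` is a hypothetical helper:

```python
import csv
import io
import json

JSON_COLUMNS = ("file_path", "category")  # columns produced by JSON_EXTRACT

def parse_batch_output(text: str) -> list:
    """Turn tab-separated client output (header line + rows) into dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        for col in JSON_COLUMNS:
            value = row.get(col)
            if value is not None and value != "NULL":
                row[col] = json.loads(value)  # strip the JSON string quotes
        rows.append(row)
    return rows
```

Redirect the `docker compose exec` output to a file (for example `> rows.tsv`) and pass its text in. Alternatively, wrapping the columns in `JSON_UNQUOTE()` in the SQL avoids the quotes altogether.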

### Method 4: Filesystem

```bash
# Find all .md files
find docs/ -name "*.md" -type f

# Search by content
grep -r "docker" docs/ --include="*.md"

# With context (3 lines before and after)
grep -r -C 3 "docker compose" docs/ --include="*.md"

# Case-insensitive
grep -ri "python" docs/ --include="*.md"
```
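Where GNU grep isn't available, or to post-process the hits, the same search is short in Python. A minimal sketch that mirrors `grep -rn [-i] TERM docs/ --include="*.md"` as a plain substring match (no regular expressions); `search_markdown` is a hypothetical helper:

```python
from pathlib import Path
from typing import Iterator, Tuple, Union

def search_markdown(root: Union[str, Path], term: str,
                    ignore_case: bool = False) -> Iterator[Tuple[Path, int, str]]:
    """Yield (path, line_number, line) for every *.md line containing term."""
    needle = term.lower() if ignore_case else term
    for path in sorted(Path(root).rglob("*.md")):
        if not path.is_file():
            continue  # skip directories whose name ends in .md
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                haystack = line.lower() if ignore_case else line
                if needle in haystack:
                    yield path, lineno, line.rstrip("\n")
```

For example `for path, n, line in search_markdown("docs", "docker compose"): print(f"{path}:{n}: {line}")`.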

## 📝 Registering a New Version

### Scenario: You updated a .md file

#### Option 1: Automatic (recommended)

```bash
# 1. Edit the file
nano docs/crumbforest/my_file.md

# 2. Restart the app (triggers auto-indexing)
cd compose
docker compose restart app

# 3. Check the logs
docker compose logs app | grep -A 10 "Document Indexing"

# Expected output:
# ✓ Using provider: openrouter
# 📚 Indexing documents...
#
# 📁 crumbforest:
#    Files found:    283
#    Indexed:        1      ← only the changed file!
#    Unchanged:      282
#    Errors:         0
```

**How does this work?**
- Each file's hash is compared with the hash stored at the last indexing run
- Only changed files are re-indexed
- Saves time and API (embedding) costs!
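The change detection can be sketched like this. Assumptions: SHA-256 over the raw file bytes and a dict of previously stored hashes keyed by path; which algorithm the real `DocumentIndexer` uses and where it keeps the hashes isn't shown in this document, and both helpers are hypothetical:

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List

def file_hash(path: Path) -> str:
    """SHA-256 of the file's raw bytes (assumed algorithm)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(paths: Iterable[Path], stored: Dict[str, str]) -> List[Path]:
    """Files that are new, or whose content changed since their stored hash."""
    return [p for p in paths if stored.get(str(p)) != file_hash(p)]
```

Only the paths returned here would be re-embedded; after indexing, their new hashes replace the stored ones, so the next run reports them as unchanged.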

#### Option 2: Manually via the API

```bash
# Force re-index all documents
curl -X POST "http://localhost:8000/api/documents/index" \
  -H "Content-Type: application/json" \
  -H "Cookie: session=..." \
  -d '{
    "provider": "openrouter",
    "force": true
  }'

# One category only
curl -X POST "http://localhost:8000/api/documents/index" \
  -H "Content-Type: application/json" \
  -H "Cookie: session=..." \
  -d '{
    "category": "crumbforest",
    "provider": "openrouter",
    "force": true
  }'
```
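The same call from Python, e.g. for scripts or cron jobs. A minimal standard-library sketch; the JSON fields (`provider`, `force`, `category`) are taken from the curl examples above, the helper name is hypothetical:

```python
import json
from typing import Optional
from urllib.request import Request

def build_index_request(base_url: str, session_cookie: str, provider: str = "openrouter",
                        category: Optional[str] = None, force: bool = True) -> Request:
    """POST /api/documents/index with a JSON body and the session cookie."""
    body = {"provider": provider, "force": force}
    if category is not None:
        body["category"] = category  # omit to re-index every category
    return Request(
        f"{base_url}/api/documents/index",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Cookie": f"session={session_cookie}"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(req)`; with `category=None` the body matches the first curl call.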

#### Option 3: Single file via Python

```python
# manual_index.py - run from the directory that contains app/
# (host names such as "qdrant" only resolve inside the Docker network)
import sys
sys.path.insert(0, 'app')

from pathlib import Path
from deps import get_db, get_qdrant_client
from config import get_settings
from services.provider_factory import ProviderFactory
from services.document_indexer import DocumentIndexer

# Setup
settings = get_settings()
db_conn = get_db()
qdrant = get_qdrant_client()
provider = ProviderFactory.create_provider("openrouter", settings)

try:
    # Indexer
    indexer = DocumentIndexer(db_conn, qdrant, provider, "docs")

    # Index a single file
    file_path = Path("docs/crumbforest/my_updated_file.md")
    result = indexer.index_document(file_path, "crumbforest", force=True)

    print(f"Status: {result['status']}")
    print(f"Chunks: {result.get('chunks', 0)}")
finally:
    # Always release the DB connection, even if indexing fails
    db_conn.close()
```

## 🔄 Update Workflow

### Code Changes (Python)

```bash
# 1. Edit the code
nano app/routers/my_feature.py

# 2. Restart only the app (fast! Assumes app/ is mounted into the container; otherwise rebuild)
docker compose restart app

# 3. Verify
curl http://localhost:8000/health
```
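Right after a restart the app may need a few seconds before `/health` answers. A minimal polling sketch; that the endpoint returns HTTP 200 once the app is ready is an assumption, and `wait_for_health` is a hypothetical helper:

```python
import time
import urllib.request

def wait_for_health(url: str = "http://localhost:8000/health",
                    timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll url until it answers HTTP 200; give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError, refused or reset connections and socket
            # timeouts are all OSError subclasses: the app is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Run it right after `docker compose restart app`; it returns `False` if the app isn't up within `timeout` seconds.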

### Dependencies (requirements.txt)

```bash
# 1. Edit requirements.txt
nano app/requirements.txt

# 2. Rebuild
docker compose up --build -d

# 3. Verify
docker compose exec app pip list | grep new-package
```

### Docker Compose Changes

```bash
# 1. Edit docker-compose.yml
nano compose/docker-compose.yml

# 2. Recreate services (only containers whose config changed are recreated)
docker compose up -d

# 3. Check status
docker compose ps
```

### New .md Files

```bash
# 1. Add the file
cp new_doc.md docs/crumbforest/

# 2. Restart the app (triggers auto-indexing)
docker compose restart app

# 3. Check the logs
docker compose logs app | grep "Document Indexing"

# 4. Verify in Qdrant
curl http://localhost:6333/collections/docs_crumbforest_ | \
  jq '.result.points_count'
```
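To check the counters without reading the logs by eye, they can be parsed. A minimal sketch that assumes the log format shown in the expected output under Option 1 (a `📁 <category>:` line followed by `Files found`, `Indexed`, `Unchanged` and `Errors` lines); that format is not a stable interface, and `parse_indexing_stats` is a hypothetical helper:

```python
import re
from typing import Dict

CATEGORY_RE = re.compile(r"📁\s*(\S+):")
STAT_RE = re.compile(r"(Files found|Indexed|Unchanged|Errors):\s+(\d+)")

def parse_indexing_stats(log_text: str) -> Dict[str, Dict[str, int]]:
    """Collect per-category counters from the document indexing log block."""
    stats: Dict[str, Dict[str, int]] = {}
    current = None
    for line in log_text.splitlines():
        category = CATEGORY_RE.search(line)
        if category:
            current = category.group(1)
            stats[current] = {}
            continue
        counter = STAT_RE.search(line)
        if counter and current is not None:
            stats[current][counter.group(1)] = int(counter.group(2))
    return stats
```

Feed it the text of `subprocess.run(["docker", "compose", "logs", "app"], capture_output=True, text=True).stdout`; the service-name prefixes Compose adds to each line don't matter because the patterns use `search`.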

### Database Schema Changes

```bash
# 1. Create the SQL script
nano compose/init/99_my_migration.sql

# 2. Run it manually (scripts in init/ only run when the database is first created!)
docker compose exec -T db sh -c \
  'mariadb -u$MARIADB_USER -p$MARIADB_PASSWORD $MARIADB_DATABASE' \
  < compose/init/99_my_migration.sql

# 3. Or recreate the DB (⚠️ deletes data! -v removes the project's named volumes, possibly Qdrant's too)
docker compose down -v
docker compose up -d
```

## 🛠️ Quick Reference

```bash
# Qdrant dashboard (local)
open http://localhost:6333/dashboard

# Qdrant via SSH Tunnel
ssh -L 6333:localhost:6333 user@server

# Check collections
curl http://localhost:6333/collections | jq '.result.collections[].name'

# Search docs (requires a session)
curl "http://localhost:8000/api/documents/search?q=docker" -H "Cookie: session=..."

# Check status
curl http://localhost:8000/api/documents/status -H "Cookie: session=..."

# Force Re-Index
curl -X POST http://localhost:8000/api/documents/index \
  -H "Content-Type: application/json" \
  -H "Cookie: session=..." \
  -d '{"force": true}'

# Restart the app (auto-indexing)
docker compose restart app

# Rebuild (for dependency changes, or when code is baked into the image)
docker compose up --build -d

# Follow logs live
docker compose logs app -f
```

## 🔒 Production Checklist

- [x] Qdrant bound to localhost only: `127.0.0.1:6333:6333`
- [ ] Nginx reverse proxy with SSL
- [ ] Basic Auth for the Qdrant dashboard
- [ ] Firewall rules (only ports 80/443 plus SSH open)
- [ ] SSH key-based auth (no password login)
- [ ] Store environment variables securely
- [ ] Set up a backup cron job
- [ ] Monitoring (uptime, disk space)
- [ ] Log rotation
- [ ] Rate limiting for the API

## 💡 Tips

1. **Use `restart` instead of `up --build` when only code changed**
   ```bash
   docker compose restart app  # fast
   # instead of
   docker compose up --build   # slow
   ```

2. **Rely on file-hash tracking**
   - Only changed files are re-indexed
   - Saves API costs!

3. **SSH tunnel for remote admin**
   - Encrypted, and nothing besides SSH has to be exposed
   - No firewall changes needed (uses the existing SSH port)

4. **Logs are your friends**
   ```bash
   # Find errors
   docker compose logs app | grep -i error

   # Indexing Status
   docker compose logs app | grep "Document Indexing" -A 20
   ```

5. **Session cookie in the browser**
   - DevTools → Application → Cookies
   - Copy it for API tests

---

**Wuuuuhuuu! Qdrant is secure now! 🦉**