codenuk_backend_mine/services/multi-document-upload-service/README.md
2025-11-17 09:04:49 +05:30

# Multi Document Upload Service
This service accepts large batches of heterogeneous documents, extracts causal
relationships with Claude 3.5 Sonnet, and writes them into Neo4j as a
knowledge graph.
## Features
- Multipart upload endpoint (`POST /jobs`) capable of handling dozens of files
and mixed formats (PDF, DOCX, PPTX, XLSX/CSV, JSON/XML, images, audio/video).
- Content extraction powered by the `unstructured` library with fallbacks.
- Chunking tuned for Claude Sonnet (800-token target, 200-token overlap).
- High-accuracy causal extraction using Anthropic Claude with provenance.
- Neo4j graph writer that upserts `Concept` nodes and `CAUSES` edges.
- Status endpoint (`GET /jobs/{id}`) and graph summary endpoint
(`GET /jobs/{id}/graph`).
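
The chunking figures above (800-token target, 200-token overlap) can be illustrated with a sliding window. This is a minimal sketch, not the service's actual chunker: the function names `chunk_tokens` and `chunk_text` are hypothetical, and whitespace-separated words stand in for model tokens, which a real tokenizer would count differently.

```python
from typing import List


def chunk_tokens(tokens: List[str], target: int = 800, overlap: int = 200) -> List[List[str]]:
    """Split tokens into windows of at most `target` tokens.

    Each window after the first starts `target - overlap` tokens after the
    previous one, so consecutive windows share `overlap` tokens.
    """
    if target <= 0 or not 0 <= overlap < target:
        raise ValueError("need target > 0 and 0 <= overlap < target")
    step = target - overlap
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + target])
        # Stop once the current window reaches the end of the input.
        if start + target >= len(tokens):
            break
        start += step
    return chunks


def chunk_text(text: str, target: int = 800, overlap: int = 200) -> List[str]:
    # Assumption: whitespace-separated words approximate model tokens.
    return [" ".join(chunk) for chunk in chunk_tokens(text.split(), target, overlap)]
```

With the defaults, a 2,000-token input yields three chunks starting at tokens 0, 600, and 1,200. The overlap gives a causal statement that straddles a chunk boundary a chance to appear whole in at least one chunk.
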
## Configuration
Environment variables:
- `ANTHROPIC_API_KEY` (required)
- `MULTI_DOC_CLAUDE_MODEL` (default `claude-3-5-sonnet-20241022`)
- `NEO4J_URI` (default `bolt://localhost:7687`)
- `NEO4J_USER` / `NEO4J_PASSWORD` (default `neo4j` / `neo4j`)
- `MULTI_DOC_STORAGE_ROOT` (default `storage` inside project)
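
As an illustration of how these variables and defaults fit together, the sketch below reads them into a single settings object. The `Settings` class and `load_settings` function are hypothetical names for this example and may not match how the service actually loads its configuration; only the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    claude_model: str
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    storage_root: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    api_key = env.get("ANTHROPIC_API_KEY", "")
    if not api_key:
        # The only variable without a default: fail early with a clear message.
        raise RuntimeError("ANTHROPIC_API_KEY is required")
    return Settings(
        anthropic_api_key=api_key,
        claude_model=env.get("MULTI_DOC_CLAUDE_MODEL", "claude-3-5-sonnet-20241022"),
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "neo4j"),
        storage_root=env.get("MULTI_DOC_STORAGE_ROOT", "storage"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in a test, for example `load_settings({"ANTHROPIC_API_KEY": "test-key"})`.
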
## Run locally
```bash
uvicorn multi_document_upload_service.main:app --reload --host 0.0.0.0 --port 8035
```
Ensure Neo4j is reachable and Anthropic credentials are exported before
starting the service.
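
Both preconditions can be checked with a small script before launching `uvicorn`. The following is a rough sketch under stated assumptions: `preflight` is a hypothetical helper that is not part of the service, and the Neo4j check only confirms that a TCP connection to the Bolt port can be opened. It does not verify the Neo4j credentials or the validity of the Anthropic key.

```python
import os
import socket
from typing import List, Mapping
from urllib.parse import urlparse


def preflight(env: Mapping[str, str] = os.environ, timeout: float = 2.0) -> List[str]:
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems: List[str] = []

    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")

    # Fall back to the documented default URI when NEO4J_URI is absent.
    uri = urlparse(env.get("NEO4J_URI", "bolt://localhost:7687"))
    host = uri.hostname or "localhost"
    port = uri.port or 7687
    try:
        # TCP-level reachability only; no Bolt handshake or authentication.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append(f"Neo4j not reachable at {host}:{port} ({exc})")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```
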