# Multi Document Upload Service

This service accepts large batches of heterogeneous documents, extracts causal
relationships with Claude 3.5 Sonnet, and writes them into Neo4j as a
knowledge graph.

## Features

- Multipart upload endpoint (`POST /jobs`) capable of handling dozens of files
  and mixed formats (PDF, DOCX, PPTX, XLSX/CSV, JSON/XML, images, audio/video).
- Content extraction powered by the `unstructured` library with fallbacks.
- Chunking tuned for Claude Sonnet (800-token target, 200-token overlap).
- High-accuracy causal extraction using Anthropic Claude with provenance.
- Neo4j graph writer that upserts `Concept` nodes and `CAUSES` edges.
- Status endpoint (`GET /jobs/{id}`) and graph summary endpoint
  (`GET /jobs/{id}/graph`).

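
The chunking figures in the list above (800-token target, 200 overlap) describe a sliding window over the token stream. The following is a minimal sketch of that idea, not the service's actual code: the function name `chunk_tokens` is hypothetical, and it assumes the text has already been tokenized into a list by whatever tokenizer the service uses.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], target: int = 800, overlap: int = 200) -> List[List[T]]:
    """Split tokens into windows of at most `target` items.

    Consecutive windows share `overlap` items, so the window start
    advances by `target - overlap` each step. (Hypothetical helper.)
    """
    if target <= 0 or not 0 <= overlap < target:
        raise ValueError("need target > 0 and 0 <= overlap < target")

    step = target - overlap
    chunks: List[List[T]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + target]))
        # Stop once the current window already reaches the end of the input.
        if start + target >= len(tokens):
            break
        start += step
    return chunks
```

With the defaults, a 2000-token document yields three chunks starting at tokens 0, 600, and 1200, and each chunk repeats the last 200 tokens of the previous one. The overlap gives the model some shared context when a causal statement straddles a chunk boundary.
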

## Configuration

Environment variables:

- `ANTHROPIC_API_KEY` (required)
- `MULTI_DOC_CLAUDE_MODEL` (default `claude-3-5-sonnet-20241022`)
- `NEO4J_URI` (default `bolt://localhost:7687`)
- `NEO4J_USER` / `NEO4J_PASSWORD` (default `neo4j` / `neo4j`)
- `MULTI_DOC_STORAGE_ROOT` (default `storage` inside project)

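
To illustrate how these variables and their defaults fit together, here is a minimal sketch of a settings loader. The `Settings` dataclass and `load_settings` function are hypothetical names, not necessarily how the service reads its configuration, and the relative `storage` path is an assumption about what "inside project" resolves to.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    anthropic_api_key: str
    claude_model: str
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    storage_root: Path


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, applying the defaults listed in this README."""
    api_key = env.get("ANTHROPIC_API_KEY")
    if not api_key:
        # The only variable documented as required.
        raise RuntimeError("ANTHROPIC_API_KEY is required")

    return Settings(
        anthropic_api_key=api_key,
        claude_model=env.get("MULTI_DOC_CLAUDE_MODEL", "claude-3-5-sonnet-20241022"),
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "neo4j"),
        # Assumption: the default is a `storage` directory relative to the project.
        storage_root=Path(env.get("MULTI_DOC_STORAGE_ROOT", "storage")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"ANTHROPIC_API_KEY": "test"})`.
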

## Run locally

```bash
uvicorn multi_document_upload_service.main:app --reload --host 0.0.0.0 --port 8035
```

Ensure Neo4j is reachable and Anthropic credentials are exported before
starting the service.

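
The two prerequisites (Neo4j reachable, Anthropic credentials exported) can be checked before launching. The sketch below is an illustrative standalone script, not part of the service: the names `neo4j_host_port` and `preflight` are hypothetical, and it only tests that a TCP connection to the Bolt port can be opened, not that the Neo4j credentials are valid.

```python
import os
import socket
from typing import List, Mapping, Tuple
from urllib.parse import urlparse


def neo4j_host_port(uri: str) -> Tuple[str, int]:
    """Extract host and port from a Bolt URI, falling back to localhost:7687."""
    parsed = urlparse(uri)
    return parsed.hostname or "localhost", parsed.port or 7687


def preflight(env: Mapping[str, str] = os.environ, timeout: float = 2.0) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems: List[str] = []

    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")

    host, port = neo4j_host_port(env.get("NEO4J_URI", "bolt://localhost:7687"))
    try:
        # Only checks that the port accepts TCP connections.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append(f"Neo4j not reachable at {host}:{port}: {exc}")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

Running the script prints one line per unmet prerequisite and nothing when both checks pass.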