# Multi Document Upload Service

This service accepts large batches of heterogeneous documents, extracts causal relationships with Claude 3.5 Sonnet, and writes them into Neo4j as a knowledge graph.

## Features

- Multipart upload endpoint (`POST /jobs`) that handles dozens of files in mixed formats (PDF, DOCX, PPTX, XLSX/CSV, JSON/XML, images, audio/video).
- Content extraction powered by the `unstructured` library, with fallbacks.
- Chunking tuned for Claude Sonnet (800-token target, 200-token overlap).
- High-accuracy causal extraction using Anthropic Claude, with provenance.
- Neo4j graph writer that upserts `Concept` nodes and `CAUSES` edges.
- Status endpoint (`GET /jobs/{id}`) and graph summary endpoint (`GET /jobs/{id}/graph`).

## Configuration

Environment variables:

- `ANTHROPIC_API_KEY` (required)
- `MULTI_DOC_CLAUDE_MODEL` (default `claude-3-5-sonnet-20241022`)
- `NEO4J_URI` (default `bolt://localhost:7687`)
- `NEO4J_USER` / `NEO4J_PASSWORD` (default `neo4j` / `neo4j`)
- `MULTI_DOC_STORAGE_ROOT` (default `storage` inside project)

## Run locally

```bash
uvicorn multi_document_upload_service.main:app --reload --host 0.0.0.0 --port 8035
```

Ensure Neo4j is reachable and Anthropic credentials are exported before starting the service.
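
The check described above can be scripted before launching `uvicorn`. The following is a minimal sketch using only the Python standard library, not the service's actual startup code. The names `Settings`, `load_settings`, `bolt_host_port`, `neo4j_reachable`, and `preflight` are hypothetical helpers introduced for illustration. The variable names and defaults are taken from the Configuration section. The reachability test is only a plain TCP connect to the Bolt port, so it does not verify Neo4j credentials.

```python
import os
import socket
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import urlparse


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    anthropic_api_key: str
    claude_model: str
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    storage_root: Path


def load_settings(env=None) -> Settings:
    """Read the documented variables, applying the defaults listed above."""
    env = os.environ if env is None else env
    api_key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is required but not set")
    return Settings(
        anthropic_api_key=api_key,
        claude_model=env.get("MULTI_DOC_CLAUDE_MODEL", "claude-3-5-sonnet-20241022"),
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "neo4j"),
        # Assumption: a relative path is resolved against the project directory.
        storage_root=Path(env.get("MULTI_DOC_STORAGE_ROOT", "storage")),
    )


def bolt_host_port(uri: str) -> tuple[str, int]:
    """Split a Bolt URI into host and port, falling back to 7687."""
    parsed = urlparse(uri)
    return parsed.hostname or "localhost", parsed.port or 7687


def neo4j_reachable(uri: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the Bolt port succeeds."""
    try:
        with socket.create_connection(bolt_host_port(uri), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(env=None) -> list[str]:
    """Collect human-readable problems; an empty list means ready to start."""
    try:
        settings = load_settings(env)
    except RuntimeError as exc:
        return [str(exc)]
    problems = []
    if not neo4j_reachable(settings.neo4j_uri):
        problems.append(f"Neo4j is not reachable at {settings.neo4j_uri}")
    return problems
```

Calling `preflight()` from a small wrapper script and exiting with a non-zero status when the returned list is non-empty would surface a missing key or an unreachable database before the first upload, rather than partway through a job.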