Multi Document Upload Service
This service accepts large batches of heterogeneous documents, extracts causal relationships with Claude 3.5 Sonnet, and writes them into Neo4j as a knowledge graph.
Features
- Multipart upload endpoint (`POST /jobs`) capable of handling dozens of files and mixed formats (PDF, DOCX, PPTX, XLSX/CSV, JSON/XML, images, audio/video).
- Content extraction powered by the `unstructured` library with fallbacks.
- Chunking tuned for Claude Sonnet (800-token target, 200-token overlap).
- High-accuracy causal extraction using Anthropic Claude with provenance.
- Neo4j graph writer that upserts `Concept` nodes and `CAUSES` edges.
- Status endpoint (`GET /jobs/{id}`) and graph summary endpoint (`GET /jobs/{id}/graph`).
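The chunking parameters above (800-token target, 200-token overlap) can be illustrated with a simple sliding window. This is a minimal sketch, not the service's actual implementation: the function name `chunk_tokens` is hypothetical, and whitespace splitting stands in for whatever tokenizer the service really uses.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence, target: int = 800, overlap: int = 200) -> List[list]:
    """Split tokens into windows of up to `target` items.

    Consecutive windows share `overlap` items, so each window starts
    `target - overlap` items after the previous one.
    """
    if not 0 <= overlap < target:
        raise ValueError("overlap must be non-negative and smaller than target")
    step = target - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + target]))
        # Stop once the current window already reaches the end of the input.
        if start + target >= len(tokens):
            break
    return chunks


# Assumption: whitespace-separated words approximate tokens for illustration only.
words = ("lorem ipsum " * 1000).split()
chunks = chunk_tokens(words)
```

With the defaults, each new chunk starts 600 tokens after the previous one, so a causal statement that straddles a chunk boundary is likely to appear whole in at least one chunk.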
Configuration
Environment variables:
- `ANTHROPIC_API_KEY` (required)
- `MULTI_DOC_CLAUDE_MODEL` (default `claude-3-5-sonnet-20241022`)
- `NEO4J_URI` (default `bolt://localhost:7687`)
- `NEO4J_USER` / `NEO4J_PASSWORD` (default `neo4j` / `neo4j`)
- `MULTI_DOC_STORAGE_ROOT` (default `storage` inside project)
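As a rough illustration of how these variables and their defaults fit together, the sketch below reads them into a single settings object. The `Settings` class and `load_settings` function are hypothetical names chosen for this example; the service's own configuration code may be organized differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    claude_model: str
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    storage_root: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("ANTHROPIC_API_KEY")
    if not api_key:
        # The only variable without a default.
        raise RuntimeError("ANTHROPIC_API_KEY is required")
    return Settings(
        anthropic_api_key=api_key,
        claude_model=env.get("MULTI_DOC_CLAUDE_MODEL", "claude-3-5-sonnet-20241022"),
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "neo4j"),
        storage_root=env.get("MULTI_DOC_STORAGE_ROOT", "storage"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to inspect without touching the real environment, for example `load_settings({"ANTHROPIC_API_KEY": "test-key"})`.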
Run locally
uvicorn multi_document_upload_service.main:app --reload --host 0.0.0.0 --port 8035
Ensure Neo4j is reachable and Anthropic credentials are exported before starting the service.
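One way to check both prerequisites before launching is a small pre-flight script. The sketch below is illustrative only and is not part of the service: the helper names `neo4j_endpoint` and `preflight` are hypothetical, and the check is a plain TCP connection, which confirms that the Bolt port is open but not that the Neo4j credentials are valid.

```python
import os
import socket
from typing import List, Mapping, Optional, Tuple
from urllib.parse import urlparse


def neo4j_endpoint(uri: str) -> Tuple[str, int]:
    """Extract (host, port) from a Bolt URI, falling back to localhost:7687."""
    parsed = urlparse(uri)
    return parsed.hostname or "localhost", parsed.port or 7687


def preflight(env: Optional[Mapping[str, str]] = None, timeout: float = 2.0) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    host, port = neo4j_endpoint(env.get("NEO4J_URI", "bolt://localhost:7687"))
    try:
        # TCP-level check only: this does not authenticate against Neo4j.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append(f"Neo4j not reachable at {host}:{port} ({exc})")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

Running the script before the `uvicorn` command prints nothing when both prerequisites look satisfied.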