codenuk_backend_mine/services/multi-document-upload-service
2025-11-17 09:04:49 +05:30

Multi Document Upload Service

This service accepts large batches of heterogeneous documents, extracts causal relationships from them with Claude 3.5 Sonnet, and writes those relationships into Neo4j as a knowledge graph.
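
As a rough illustration of the final step, the sketch below shows one way a single extracted relationship could be upserted as two Concept nodes joined by a CAUSES edge. It is not the service's actual writer: the function name `upsert_relation`, the Cypher text, and the property names `name` and `source_file` are assumptions made for this example. The `session` argument is expected to behave like a session from the official `neo4j` Python driver, whose `run` method accepts a query and a parameter dictionary.

```python
from typing import Any, Dict

# Hypothetical Cypher for one cause -> effect pair. MERGE makes the write
# idempotent: re-running it reuses existing nodes and edges instead of
# creating duplicates. Property names are assumptions, not the real schema.
UPSERT_CAUSES = """
MERGE (cause:Concept {name: $cause})
MERGE (effect:Concept {name: $effect})
MERGE (cause)-[r:CAUSES]->(effect)
SET r.source_file = $source_file
"""


def upsert_relation(session: Any, cause: str, effect: str, source_file: str) -> Dict[str, str]:
    """Upsert one causal relationship and return the parameters that were sent."""
    params = {
        "cause": cause.strip(),
        "effect": effect.strip(),
        "source_file": source_file,
    }
    # Parameters are passed separately from the query text, so values are
    # never interpolated into the Cypher string.
    session.run(UPSERT_CAUSES, params)
    return params
```

With the real driver, a session would typically come from `GraphDatabase.driver(uri, auth=(user, password)).session()`.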

Features

  • Multipart upload endpoint (POST /jobs) capable of handling dozens of files and mixed formats (PDF, DOCX, PPTX, XLSX/CSV, JSON/XML, images, audio/video).
  • Content extraction powered by the unstructured library with fallbacks.
  • Chunking tuned for Claude Sonnet (800-token target, 200-token overlap).
  • High-accuracy causal extraction using Anthropic Claude with provenance.
  • Neo4j graph writer that upserts Concept nodes and CAUSES edges.
  • Status endpoint (GET /jobs/{id}) and graph summary endpoint (GET /jobs/{id}/graph).
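
The chunking parameters above (800-token target, 200-token overlap) can be illustrated with a minimal sliding-window sketch. This is an assumption about the general technique, not the service's code: the function name `chunk_tokens` is hypothetical, and the input is an already tokenized list, whereas the real service presumably counts tokens with a model-appropriate tokenizer.

```python
from typing import List


def chunk_tokens(tokens: List[str], target: int = 800, overlap: int = 200) -> List[List[str]]:
    """Split tokens into windows of at most `target` tokens.

    Each window after the first starts `target - overlap` tokens after the
    previous one, so consecutive windows share `overlap` tokens.
    """
    if target <= 0 or not 0 <= overlap < target:
        raise ValueError("require target > 0 and 0 <= overlap < target")

    step = target - overlap
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + target])
        # Stop once the current window already reaches the end of the input,
        # so no trailing chunk consists only of overlap.
        if start + target >= len(tokens):
            break
        start += step
    return chunks
```

The overlap means a causal statement that straddles a chunk boundary still appears whole in at least one chunk, at the cost of sending some text to the model twice.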

Configuration

Environment variables:

  • ANTHROPIC_API_KEY (required)
  • MULTI_DOC_CLAUDE_MODEL (default claude-3-5-sonnet-20241022)
  • NEO4J_URI (default bolt://localhost:7687)
  • NEO4J_USER / NEO4J_PASSWORD (default neo4j / neo4j)
  • MULTI_DOC_STORAGE_ROOT (default storage inside project)
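
One way to read these variables with the documented defaults is sketched below. The `Settings` class and `load_settings` function are hypothetical names for this example, and the fallback value `"storage"` for the storage root is an assumption about what "storage inside project" resolves to.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    claude_model: str
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    storage_root: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, applying the README defaults."""
    env = os.environ if env is None else env

    api_key = env.get("ANTHROPIC_API_KEY", "")
    if not api_key:
        # The only variable without a default: fail fast if it is missing.
        raise RuntimeError("ANTHROPIC_API_KEY is required")

    return Settings(
        anthropic_api_key=api_key,
        claude_model=env.get("MULTI_DOC_CLAUDE_MODEL", "claude-3-5-sonnet-20241022"),
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "neo4j"),
        # Assumed default path; the README only says "storage inside project".
        storage_root=env.get("MULTI_DOC_STORAGE_ROOT", "storage"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in tests.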

Run locally

uvicorn multi_document_upload_service.main:app --reload --host 0.0.0.0 --port 8035

Ensure Neo4j is reachable and ANTHROPIC_API_KEY is exported before starting the service.
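
Both preconditions can be checked with a small standard-library script before launching uvicorn. The sketch below is only a suggestion: `bolt_host_port` and `preflight_problems` are hypothetical helpers, and the check confirms only that a TCP connection to the Neo4j port can be opened, not that the credentials are valid.

```python
import os
import socket
from typing import List, Mapping, Tuple
from urllib.parse import urlparse


def bolt_host_port(uri: str) -> Tuple[str, int]:
    """Extract host and port from a Neo4j URI, falling back to localhost:7687."""
    parsed = urlparse(uri)
    return parsed.hostname or "localhost", parsed.port or 7687


def preflight_problems(env: Mapping[str, str] = os.environ, timeout: float = 3.0) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems: List[str] = []

    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")

    host, port = bolt_host_port(env.get("NEO4J_URI", "bolt://localhost:7687"))
    try:
        # Opening and immediately closing a TCP connection is enough to tell
        # whether something is listening on the Bolt port.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append(f"Neo4j not reachable at {host}:{port}: {exc}")

    return problems
```

Calling `preflight_problems()` and printing the result before starting the service gives an early, readable failure instead of an error on the first upload.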