codenuk_backend_mine/services/multi-document-upload-service/README.md
2025-11-17 09:04:49 +05:30


Multi Document Upload Service

This service accepts large batches of heterogeneous documents, extracts causal relationships from them with Claude 3.5 Sonnet, and writes those relationships into Neo4j as a knowledge graph.

Features

  • Multipart upload endpoint (POST /jobs) capable of handling dozens of files and mixed formats (PDF, DOCX, PPTX, XLSX/CSV, JSON/XML, images, audio/video).
  • Content extraction powered by the unstructured library with fallbacks.
  • Chunking tuned for Claude Sonnet (800-token target, 200-token overlap).
  • High-accuracy causal extraction using Anthropic Claude with provenance.
  • Neo4j graph writer that upserts Concept nodes and CAUSES edges.
  • Status endpoint (GET /jobs/{id}) and graph summary endpoint (GET /jobs/{id}/graph).
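
The chunking parameters above (800-token target, 200-token overlap) can be illustrated with a sliding window. This is only a sketch, not the service's actual implementation: the function name `chunk_tokens` is hypothetical, and the caller is assumed to supply an already tokenized list (for example, whitespace-split words as a rough stand-in for model tokens).

```python
from typing import List


def chunk_tokens(tokens: List[str], target: int = 800, overlap: int = 200) -> List[List[str]]:
    """Split tokens into windows of at most `target` items.

    Consecutive windows share `overlap` tokens, so a causal statement that
    straddles a boundary still appears whole in at least one chunk.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if not 0 <= overlap < target:
        raise ValueError("overlap must satisfy 0 <= overlap < target")

    step = target - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + target])
        if start + target >= len(tokens):
            break  # this window already reaches the end of the input
        start += step
    return chunks


if __name__ == "__main__":
    words = "smoking causes lung damage which increases cancer risk".split()
    for chunk in chunk_tokens(words, target=4, overlap=2):
        print(chunk)
```

With the defaults, each window starts 600 tokens after the previous one, so a 2,000-token document yields three chunks.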

Configuration

Environment variables:

  • ANTHROPIC_API_KEY (required)
  • MULTI_DOC_CLAUDE_MODEL (default claude-3-5-sonnet-20241022)
  • NEO4J_URI (default bolt://localhost:7687)
  • NEO4J_USER / NEO4J_PASSWORD (default neo4j / neo4j)
  • MULTI_DOC_STORAGE_ROOT (defaults to a storage directory inside the project)
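
One way these variables might be read at startup is sketched below. The `Settings` class and `load_settings` function are hypothetical names, not the service's own code, and the `"storage"` fallback path is a placeholder assumption for "storage inside project". The other defaults are the ones listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    claude_model: str
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    storage_root: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env

    api_key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not api_key:
        # The only variable without a default: fail early with a clear message.
        raise RuntimeError("ANTHROPIC_API_KEY is required")

    return Settings(
        anthropic_api_key=api_key,
        claude_model=env.get("MULTI_DOC_CLAUDE_MODEL", "claude-3-5-sonnet-20241022"),
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "neo4j"),
        # Placeholder: the real default location inside the project is an assumption.
        storage_root=env.get("MULTI_DOC_STORAGE_ROOT", "storage"),
    )


if __name__ == "__main__":
    settings = load_settings({"ANTHROPIC_API_KEY": "example-key"})
    print(settings.neo4j_uri, settings.claude_model)
```

Passing a mapping instead of reading `os.environ` directly keeps the loader easy to exercise in tests.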

Run locally

uvicorn multi_document_upload_service.main:app --reload --host 0.0.0.0 --port 8035

Ensure Neo4j is reachable and Anthropic credentials are exported before starting the service.
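
The check above can be scripted before launching uvicorn. The following is a minimal sketch, assuming a plain TCP connection is an acceptable reachability probe. It does not perform a Bolt handshake or verify the Neo4j credentials, and the helper names `neo4j_host_port` and `preflight` are hypothetical.

```python
import os
import socket
from typing import List, Mapping, Optional, Tuple
from urllib.parse import urlparse


def neo4j_host_port(uri: str) -> Tuple[str, int]:
    """Extract host and port from a Neo4j URI, falling back to localhost:7687."""
    parsed = urlparse(uri)
    return parsed.hostname or "localhost", parsed.port or 7687


def preflight(env: Optional[Mapping[str, str]] = None, timeout: float = 2.0) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    env = os.environ if env is None else env
    problems: List[str] = []

    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")

    host, port = neo4j_host_port(env.get("NEO4J_URI", "bolt://localhost:7687"))
    try:
        # Opening and immediately closing a TCP connection only shows the port is open.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append(f"Neo4j not reachable at {host}:{port} ({exc})")

    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("preflight:", issue)
    raise SystemExit(1 if issues else 0)
```

Saved as a script, this exits non-zero when either check fails, so it can be chained in front of the uvicorn command with `&&`.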