codenuk_backend_mine/services/multi-document-upload-service/REBUILD_INSTRUCTIONS.md
2025-11-17 09:04:49 +05:30

Rebuild Instructions - Multi-Document Upload Service

Issue: Empty Graph in Neo4j

Problem: The verification query returns "(no changes, no records)" because the upload job completed with 0 relations, so nothing was written to the graph.

Root Cause: PDF extraction failed because the PDF dependencies (the unstructured[pdf] extra) were missing.

Fixes Applied

  1. Added document-parsing dependencies (unstructured[pdf], unstructured[docx], etc.)
  2. Added fallback extractors (pdfplumber, python-docx, python-pptx)
  3. Improved error handling and logging
  4. Fixed Neo4j query syntax
  5. Better status messages

Rebuild Steps

Step 1: Rebuild the Service

cd /home/tech4biz/Desktop/prakash/codenuk/backend_new1/codenuk_backend_mine

# Stop the service
docker-compose stop multi-document-upload-service

# Rebuild with new dependencies
docker-compose build --no-cache multi-document-upload-service

# Start the service
docker-compose up -d multi-document-upload-service

# Check logs to verify it's starting correctly
docker-compose logs -f multi-document-upload-service
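
The stop, build, and start commands above can also be wrapped in a small function so that a failed build stops the sequence before the restart. This is a minimal sketch: the function name rebuild_service and the DOCKER_COMPOSE override variable are illustrative assumptions, not part of the service.

```shell
# Sketch: stop, rebuild, and restart one Compose service, aborting on the first failure.
# DOCKER_COMPOSE is a hypothetical override (e.g. "docker compose" for Compose v2).
rebuild_service() {
  local svc="${1:-multi-document-upload-service}"
  local dc="${DOCKER_COMPOSE:-docker-compose}"

  # $dc is intentionally unquoted so that "docker compose" splits into two words.
  $dc stop "$svc" &&
    $dc build --no-cache "$svc" &&
    $dc up -d "$svc"
}

# Usage (run from the directory that contains docker-compose.yml):
#   rebuild_service
#   DOCKER_COMPOSE="docker compose" rebuild_service multi-document-upload-service
```

Chaining with && means the old container is not replaced if the image fails to build, which keeps the failure visible instead of restarting a stale image.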

Step 2: Verify Dependencies

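# Check that unstructured is installed
docker-compose exec multi-document-upload-service pip list | grep unstructured

# You should see the base package:
# unstructured
# Extras such as [pdf] and [docx] are not listed as separate packages.
# pip installs their dependencies instead (typically pdfminer.six for [pdf]
# and python-docx for [docx]), so check for those if in doubt.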
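
Because pip extras do not appear under their own names, it can be more reliable to check for the concrete packages the fixes rely on, such as the fallback extractors listed under Fixes Applied. The sketch below reads `pip list` output on standard input and reports which of the named packages are present. The helper name check_packages is an assumption made for illustration.

```shell
# Sketch: report which expected packages appear in `pip list` output read from stdin.
# Returns non-zero if at least one package is missing.
check_packages() {
  local listing pkg missing=0
  listing="$(cat)"
  for pkg in "$@"; do
    # Match the package name at the start of a line, followed by whitespace,
    # so that "unstructured" does not match "unstructured-inference".
    if printf '%s\n' "$listing" | grep -qi "^${pkg}[[:space:]]"; then
      echo "ok       $pkg"
    else
      echo "MISSING  $pkg"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (-T disables TTY allocation so the output can be piped):
#   docker-compose exec -T multi-document-upload-service pip list \
#     | check_packages unstructured pdfplumber python-docx python-pptx
```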

Step 3: Test the Service

# Check health endpoint
curl http://localhost:8024/health

# Should return:
# {
#   "status": "ok",
#   "claude_model": "claude-3-5-haiku-latest",
#   ...
# }
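
Right after a rebuild the service may need a few seconds before the health endpoint answers. A simple polling loop avoids testing too early. This is a sketch only: the function name wait_for_health, the default of 30 attempts, and the 2-second delay are assumptions, and the URL is the one used above.

```shell
# Sketch: poll a health URL until it answers with a successful HTTP status
# or the attempts run out.
wait_for_health() {
  local url="${1:-http://localhost:8024/health}"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=1

  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP error statuses; --max-time bounds each try.
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done

  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}

# Usage:
#   wait_for_health && curl http://localhost:8024/health
```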

Step 4: Re-upload Documents

  1. Open frontend: http://localhost:3001/project-builder
  2. Go to Step 1: Project Type
  3. Find "Upload Documents for Knowledge Graph" section
  4. Upload a PDF or other document
  5. Wait for processing to complete
  6. Check the status; it should show a relation count greater than 0

Step 5: Verify in Neo4j

Run these queries in Neo4j Browser (http://localhost:7474):

// Check if any nodes exist (run each query separately)
MATCH (n)
RETURN count(n) as node_count;

// Check for CAUSES relationships
MATCH (n:Concept)-[r:CAUSES]->(m:Concept)
RETURN n.name as cause,
       m.name as effect,
       r.confidence as confidence,
       r.job_id as job_id
LIMIT 50;
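
To confirm that a specific upload wrote relations, it may help to count CAUSES relationships per job_id instead of listing them. The sketch below saves such a query to a file that can be pasted into Neo4j Browser or piped to cypher-shell. The file name count_by_job.cypher, the Compose service name neo4j, and the credentials in the usage comment are placeholders, not values taken from this project.

```shell
# Sketch: save a per-job relationship count query to a file.
cat > count_by_job.cypher <<'EOF'
// Count CAUSES relationships per upload job
MATCH (:Concept)-[r:CAUSES]->(:Concept)
RETURN r.job_id AS job_id, count(r) AS relations
ORDER BY relations DESC;
EOF

# Assumed invocation; service name, user, and password are placeholders:
#   docker-compose exec -T neo4j cypher-shell -u neo4j -p '<password>' < count_by_job.cypher
```

A job that is missing from the result wrote no CAUSES relationships, which points back to extraction or analysis rather than to the query.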

Expected Results

After rebuilding and re-uploading:

  1. PDF extraction succeeds
  2. Text is extracted
  3. Relations are extracted
  4. Relations are written to Neo4j
  5. Query returns results

Troubleshooting

If you still see 0 relations:

  1. Check service logs:

    docker-compose logs multi-document-upload-service | tail -50
    
  2. Check extraction logs:

    docker-compose logs multi-document-upload-service | grep -i "extract\|pdf"
    
  3. Check Claude analysis:

    docker-compose logs multi-document-upload-service | grep -i "claude\|analyze\|relation"
    
  4. Check Neo4j connection:

    docker-compose logs multi-document-upload-service | grep -i "neo4j\|graph\|write"
    
  5. Verify document has causal language:

    • Not all documents contain causal relationships
    • Try uploading a document with clear cause-effect statements
    • Example: "Smoking causes lung cancer"
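
The log checks above can be combined into one pass that counts matching lines per stage, which shows at a glance where the pipeline goes quiet. This is a rough sketch: the helper name summarize_logs is an assumption, and the patterns are the same keywords used in the grep commands above.

```shell
# Sketch: count log lines per pipeline stage from logs read on stdin.
summarize_logs() {
  local logs entry label pattern n
  logs="$(cat)"
  for entry in \
    "extraction:extract|pdf" \
    "claude:claude|analyze|relation" \
    "neo4j:neo4j|graph|write"; do
    label="${entry%%:*}"
    pattern="${entry#*:}"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    n="$(printf '%s\n' "$logs" | grep -ciE "$pattern" || true)"
    printf '%-12s %s\n' "$label" "$n"
  done
}

# Usage:
#   docker-compose logs --no-color multi-document-upload-service | summarize_logs
```

A stage with a count of 0 is a reasonable place to start reading the full logs, since it suggests that stage never ran or never logged.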

Quick Test

Test with a simple text file:

  1. Create a test file test_causal.txt:

    Smoking cigarettes causes lung cancer.
    Heavy rain causes flooding.
    Exercise improves health.
    
  2. Upload it via the frontend

  3. Check Neo4j for relationships

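  4. You should see about 3 causal relationships, one per sentence; the exact count depends on what the Claude analysis extracts (the third sentence uses "improves" rather than "causes")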
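
The test file from step 1 can be created from the command line with a heredoc. The sketch below writes the three sentences exactly as listed and then prints two quick counts as a sanity check before uploading.

```shell
# Create the test file with the three sample sentences.
cat > test_causal.txt <<'EOF'
Smoking cigarettes causes lung cancer.
Heavy rain causes flooding.
Exercise improves health.
EOF

# Total number of sentences (one per line).
wc -l < test_causal.txt

# Number of lines that use the word "causes" explicitly.
grep -c 'causes' test_causal.txt
```

Only two of the three sentences contain the word "causes", so seeing two relationships instead of three in Neo4j does not necessarily indicate a failure.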

Next Steps

  1. Rebuild the service
  2. Re-upload documents
  3. Check Neo4j for relationships
  4. If still no results, check service logs
  5. Verify the document contains causal language