codenuk_backend_mine/services/multi-document-upload-service/FIX_EMPTY_GRAPH.md
2025-11-17 09:04:49 +05:30

Fix: Empty Graph in Neo4j (No Relationships Found)

Problem

Querying Neo4j for CAUSES relationships returns "(no changes, no records)". The graph is empty because each stage of the pipeline depends on the one before it:

  1. PDF extraction failed - Missing dependencies (unstructured[pdf])
  2. 0 relations extracted - No text was extracted, so no analysis happened
  3. 0 relations written - With nothing to write, leaving Neo4j empty is the correct behavior

Root Cause

The service completed with 0 relations because:

  • PDF extraction failed with the error "partition_pdf() is not available because one or more dependencies are not installed"
  • No text was extracted from the PDF
  • No chunks were created
  • No Claude analysis happened
  • 0 relations were extracted
  • 0 relations were written to Neo4j

Solution

Step 1: Update Dependencies

The requirements.txt has been updated to include:

unstructured[pdf]>=0.15.0
unstructured[docx]>=0.15.0
unstructured[pptx]>=0.15.0
unstructured[xlsx]>=0.15.0
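
To confirm from inside the container that the installed package satisfies the `>=0.15.0` constraint, a minimal sketch follows. The helper names (`parse_version`, `meets_minimum`, `installed_unstructured_version`) are illustrative and not part of the service. The parsing assumes plain release versions such as `0.15.0` and ignores pre-release suffixes.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.15.0' into (0, 15, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum="0.15.0"):
    """True if the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '1.0' compares like '1.0.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_unstructured_version():
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version("unstructured")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_unstructured_version()
    if version is None:
        print("unstructured is not installed")
    else:
        print(f"unstructured {version}, meets >=0.15.0: {meets_minimum(version)}")
```

This only checks the base package version. Whether the `[pdf]` extra's own dependencies are present is a separate question.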

Step 2: Rebuild the Service

# Path on the original development machine; adjust it to your own checkout
cd /home/tech4biz/Desktop/prakash/codenuk/backend_new1/codenuk_backend_mine

# Rebuild the service with new dependencies
docker-compose build multi-document-upload-service

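# Recreate the container from the rebuilt image
# (restart alone would keep running the old container with the old dependencies)
docker-compose up -d multi-document-upload-service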

# Check logs to verify it's working
docker-compose logs -f multi-document-upload-service

Step 3: Verify Dependencies

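# List the installed unstructured packages
# (pip does not display extras such as [pdf]; look for the packages the extra pulls in, typically unstructured-inference)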
docker-compose exec multi-document-upload-service pip list | grep unstructured
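
Because `pip list` cannot show extras directly, a more direct check is to try the import that failed. The sketch below assumes `partition_pdf` lives in `unstructured.partition.pdf`, which is where the library exposes it. The helper name `can_import` is illustrative. A failed import is a strong hint that dependencies are missing. A successful import does not guarantee that every optional dependency (for example OCR tooling) is present.

```python
import importlib


def can_import(module_name, attribute):
    """Return (ok, detail): whether `attribute` can be loaded from `module_name`."""
    try:
        module = importlib.import_module(module_name)
        getattr(module, attribute)
    except Exception as exc:  # ImportError, AttributeError, or errors from native deps
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    ok, detail = can_import("unstructured.partition.pdf", "partition_pdf")
    print("partition_pdf importable:", ok, "-", detail)
```

Run it with the container's Python interpreter so that it sees the same environment as the service.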

Step 4: Re-upload Documents

  1. Go to Project Builder in the frontend
  2. Click on "Upload Documents for Knowledge Graph"
  3. Upload a PDF or other document
  4. Wait for processing to complete
  5. Check Neo4j for relationships

Step 5: Check Neo4j

Run these queries in Neo4j Browser:

// Check if any nodes exist
MATCH (n)
RETURN count(n) as node_count;

// Check for CAUSES relationships
MATCH (n:Concept)-[r:CAUSES]->(m:Concept)
RETURN n.name as cause, m.name as effect, r.confidence as confidence
LIMIT 50
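
The same checks can be scripted with the official `neo4j` Python driver, which may be handy for automated verification. This is a sketch under assumptions: the environment variable names `NEO4J_USER` and `NEO4J_PASSWORD` and the helper `graph_summary` are hypothetical, and the connection URI is passed on the command line because the service's actual Neo4j settings are not shown here.

```python
import os
import sys

NODE_COUNT_QUERY = "MATCH (n) RETURN count(n) AS node_count"
CAUSES_COUNT_QUERY = (
    "MATCH (:Concept)-[r:CAUSES]->(:Concept) RETURN count(r) AS relation_count"
)


def graph_summary(session):
    """Run both count queries on a session-like object with a .run(query) method."""
    nodes = session.run(NODE_COUNT_QUERY).single()["node_count"]
    relations = session.run(CAUSES_COUNT_QUERY).single()["relation_count"]
    return {"node_count": nodes, "relation_count": relations}


def main(uri):
    # Imported here so graph_summary can be used without the driver installed.
    from neo4j import GraphDatabase

    # Hypothetical variable names; use whatever your deployment defines.
    user = os.environ.get("NEO4J_USER", "neo4j")
    password = os.environ.get("NEO4J_PASSWORD", "")
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            print(graph_summary(session))


# Only connects when a URI such as bolt://localhost:7687 is given as an argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

A `node_count` of 0 means nothing was written at all. A non-zero `node_count` with a `relation_count` of 0 suggests nodes exist but no CAUSES edges were created.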

Expected Behavior After Fix

  1. PDF extraction succeeds - Text is extracted from PDF files
  2. Text is chunked - Document is split into manageable chunks
  3. Claude analyzes - Causal relationships are extracted
  4. Relations are written - Relationships are stored in Neo4j
  5. Query returns results - Neo4j query shows relationships

Verification Steps

  1. Check service logs:

    docker-compose logs multi-document-upload-service | grep -i "extracted\|relation\|neo4j"
    
  2. Check job status:

    curl http://localhost:8000/api/multi-docs/jobs/{job_id}
    

    Replace {job_id} with the ID of your upload job. The response should show "processed_files": 1 and a relation count greater than 0.

  3. Check Neo4j:

    MATCH (n:Concept)-[r:CAUSES]->(m:Concept)
    RETURN count(r) as relation_count
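
The job status check from step 2 can also be scripted. In the sketch below the URL and the `processed_files` field come from this document, while the relation count field name (`relations_count`) is a guess and must be matched to the actual response. The helper names are illustrative.

```python
import json
import sys
import urllib.request

BASE_URL = "http://localhost:8000/api/multi-docs/jobs"  # from the curl command in step 2
RELATION_FIELD = "relations_count"  # hypothetical field name; check your actual response


def fetch_job(job_id):
    """Fetch the job status JSON for one job ID."""
    with urllib.request.urlopen(f"{BASE_URL}/{job_id}", timeout=10) as response:
        return json.load(response)


def job_looks_healthy(payload, relation_field=RELATION_FIELD):
    """True if at least one file was processed and at least one relation was found."""
    processed = payload.get("processed_files", 0) or 0
    relations = payload.get(relation_field, 0) or 0
    return processed >= 1 and relations > 0


# Only contacts the service when a job ID is given as an argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    job = fetch_job(sys.argv[1])
    print(json.dumps(job, indent=2))
    print("healthy:", job_looks_healthy(job))
```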
    

Improvements Made

  1. Added document dependencies - unstructured[pdf], unstructured[docx], unstructured[pptx], unstructured[xlsx]
  2. Added fallback extractors - Uses pdfplumber if unstructured fails
  3. Better error handling - Shows actual errors in job status
  4. Improved logging - More detailed logs for debugging
  5. Safer Neo4j writes - Data is validated before it is written
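
The fallback idea in item 2 can be pictured as follows. This is not the service's actual code, only a sketch of the pattern: try `partition_pdf` from unstructured first, then fall back to pdfplumber. The function names are illustrative. The library calls used (`partition_pdf(filename=...)`, `pdfplumber.open`, `page.extract_text()`) are the public APIs of those packages.

```python
def extract_with_unstructured(path):
    """Primary extractor: unstructured's partition_pdf."""
    from unstructured.partition.pdf import partition_pdf

    elements = partition_pdf(filename=path)
    return "\n".join(el.text for el in elements if el.text)


def extract_with_pdfplumber(path):
    """Fallback extractor: pdfplumber, page by page."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages without a text layer.
        pages = [page.extract_text() or "" for page in pdf.pages]
    return "\n".join(pages)


def extract_text(path, extractors=(extract_with_unstructured, extract_with_pdfplumber)):
    """Try each extractor in order and return the first non-empty text."""
    errors = []
    for extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # missing dependency, corrupt file, ...
            errors.append(f"{extractor.__name__}: {exc}")
            continue
        if text and text.strip():
            return text
        errors.append(f"{extractor.__name__}: returned no text")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))
```

Collecting the per-extractor errors and raising them together fits item 3 as well: the real failure reason can be surfaced in the job status instead of a silent result of 0 relations.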

Troubleshooting

If you still see 0 relations after rebuilding:

  1. Check extraction logs:

    docker-compose logs multi-document-upload-service | grep -i "extract"
    
  2. Check Claude analysis:

    docker-compose logs multi-document-upload-service | grep -i "claude\|analyze"
    
  3. Check Neo4j connection:

    docker-compose logs multi-document-upload-service | grep -i "neo4j\|graph"
    
  4. Verify document has causal language:

    • Not all documents contain causal relationships
    • Try uploading a document with clear cause-effect statements
    • Example: "Smoking causes lung cancer" or "Rain causes flooding"
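
Before uploading, a rough pre-check can tell whether a document contains any obvious causal wording. The cue list below is purely illustrative and is not what the service or Claude uses, so treat a negative result as a hint, not proof.

```python
import re

# Illustrative cue list; not the patterns the service actually relies on.
CAUSAL_CUES = re.compile(
    r"\b(causes?|caused by|leads? to|results? in|due to|because|triggers?)\b",
    re.IGNORECASE,
)


def causal_sentences(text):
    """Return the sentences that contain at least one causal cue."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if CAUSAL_CUES.search(s)]


if __name__ == "__main__":
    sample = "Smoking causes lung cancer. The sky is blue. Rain causes flooding."
    for sentence in causal_sentences(sample):
        print(sentence)
```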

Next Steps

  1. Rebuild the service with new dependencies
  2. Re-upload documents
  3. Check Neo4j for relationships
  4. If still no results, check service logs for errors
  5. Verify the document contains causal language