Fix: Empty Graph in Neo4j (No Relationships Found)
Problem
When you query Neo4j for CAUSES relationships, you get "(no changes, no records)" because:
- PDF extraction failed: missing dependencies (`unstructured[pdf]`)
- 0 relations extracted: no text was extracted, so no analysis happened
- 0 relations written: nothing was written to Neo4j (correct behavior, since there was nothing to write)
Root Cause
The service completed with 0 relations because each stage depends on the one before it:
- PDF extraction failed with: `partition_pdf() is not available because one or more dependencies are not installed`
- No text was extracted from the PDF
- No chunks were created
- No Claude analysis happened
- 0 relations were extracted
- 0 relations were written to Neo4j
Solution
Step 1: Update Dependencies
`requirements.txt` has been updated to include:
```text
unstructured[pdf]>=0.15.0
unstructured[docx]>=0.15.0
unstructured[pptx]>=0.15.0
unstructured[xlsx]>=0.15.0
```
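Before rebuilding, it can help to confirm that `requirements.txt` really lists every extra. A minimal sketch, assuming the file is named `requirements.txt` and sits in the current directory; `check_requirements` is a hypothetical helper that accepts both the one-extra-per-line form shown above and a combined form such as `unstructured[pdf,docx]`.

```bash
# Hypothetical helper: verify that each unstructured extra from Step 1 is listed.
# Exit status: 0 = all present, 1 = at least one missing, 2 = file not found.
check_requirements() {
  local file missing extra
  file="${1:-requirements.txt}"   # default path is an assumption
  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 2
  fi
  missing=0
  for extra in pdf docx pptx xlsx; do
    # Matches "unstructured[pdf]..." as well as "unstructured[docx,pdf]...".
    if ! grep -Eq "^[[:space:]]*unstructured\[([^]]*,)?[[:space:]]*${extra}[[:space:]]*(,[^]]*)?]" "$file"; then
      echo "missing: unstructured[${extra}]"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as `check_requirements path/to/requirements.txt` from the project directory; each missing extra is printed on its own line.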
Step 2: Rebuild the Service
```bash
cd /home/tech4biz/Desktop/prakash/codenuk/backend_new1/codenuk_backend_mine
# Rebuild the service image with the new dependencies
docker-compose build multi-document-upload-service
# Recreate the container from the new image ("restart" would keep running the old image)
docker-compose up -d multi-document-upload-service
# Check logs to verify it's working
docker-compose logs -f multi-document-upload-service
```
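The build-and-recreate sequence can be wrapped in one function. A sketch, assuming the service name used in this guide; `rebuild_service` and the `COMPOSE` override are hypothetical. It uses `up -d` rather than `restart` because `restart` restarts the existing container and does not switch it to a newly built image.

```bash
# COMPOSE may be overridden, e.g. COMPOSE="docker compose" for the Compose v2 plugin.
COMPOSE="${COMPOSE:-docker-compose}"
SERVICE="multi-document-upload-service"

rebuild_service() {
  # Build a fresh image so the updated requirements.txt gets installed.
  $COMPOSE build "$SERVICE" || return 1
  # "up -d" recreates the container when its image has changed.
  $COMPOSE up -d "$SERVICE" || return 1
}
```

For reference, `docker-compose up -d --build multi-document-upload-service` performs the build and the recreate in a single command.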
Step 3: Verify Dependencies
```bash
# List installed unstructured packages (pip list does not show which extras were requested)
docker-compose exec multi-document-upload-service pip list | grep unstructured
```
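`pip list | grep unstructured` only confirms that the base package is present, not that the PDF extra's dependencies were installed. A sketch that checks for a few PDF-related packages as well; the list (`unstructured`, `pdfminer.six`, `pdf2image`) is an assumption about what `unstructured[pdf]` pulls in, so adjust it for your version. `missing_pdf_packages` is a hypothetical helper, and names are normalized PEP 503-style so `pdfminer.six` and `pdfminer-six` compare equal.

```bash
# Hypothetical helper: read `pip list` output on stdin, print each expected
# package that is absent, and return 1 if any are missing.
missing_pdf_packages() {
  local installed pkg missing
  # Lowercase the first column and collapse runs of "-", "_", "." into "-".
  installed=$(awk '{ n = tolower($1); gsub(/[-_.]+/, "-", n); print n }')
  missing=0
  # Assumed subset of the unstructured[pdf] dependencies, already normalized.
  for pkg in unstructured pdfminer-six pdf2image; do
    if ! printf '%s\n' "$installed" | grep -qxF "$pkg"; then
      echo "$pkg"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `docker-compose exec -T multi-document-upload-service pip list | missing_pdf_packages` (`-T` disables pseudo-TTY allocation, which keeps piped output clean).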
Step 4: Re-upload Documents
- Go to Project Builder in the frontend
- Click on "Upload Documents for Knowledge Graph"
- Upload a PDF or other document
- Wait for processing to complete
- Check Neo4j for relationships
Step 5: Check Neo4j
Run these queries in Neo4j Browser:
```cypher
// Check if any nodes exist
MATCH (n)
RETURN count(n) as node_count

// Check for CAUSES relationships
MATCH (n:Concept)-[r:CAUSES]->(m:Concept)
RETURN n.name as cause, m.name as effect, r.confidence as confidence
LIMIT 50
```
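The relationship count can also be checked from a terminal instead of Neo4j Browser. A sketch, assuming `cypher-shell` is on your PATH (it ships with Neo4j), Bolt listens on `bolt://localhost:7687`, and credentials come from the hypothetical `NEO4J_USER`, `NEO4J_PASSWORD`, and `NEO4J_URI` variables; `count_causes` is a hypothetical helper.

```bash
CYPHER_SHELL="${CYPHER_SHELL:-cypher-shell}"

# Hypothetical helper: print the number of CAUSES relationships between Concept nodes.
count_causes() {
  local query out
  if [ -z "${NEO4J_PASSWORD:-}" ]; then
    echo "NEO4J_PASSWORD is not set" >&2
    return 2
  fi
  query='MATCH (n:Concept)-[r:CAUSES]->(m:Concept) RETURN count(r) AS relation_count'
  # --format plain prints a header row followed by the value row.
  out=$("$CYPHER_SHELL" -a "${NEO4J_URI:-bolt://localhost:7687}" \
    -u "${NEO4J_USER:-neo4j}" -p "$NEO4J_PASSWORD" \
    --format plain "$query") || return 1
  # Keep only the last line, i.e. the count itself.
  printf '%s\n' "$out" | tail -n 1
}
```

If `cypher-shell` only exists inside the Neo4j container, point `CYPHER_SHELL` at a small wrapper script that runs it through `docker-compose exec -T` for that container (its service name is not given in this guide).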
Expected Behavior After Fix
- PDF extraction succeeds: text is extracted from PDF files
- Text is chunked: the document is split into manageable chunks
- Claude analyzes the chunks: causal relationships are extracted
- Relations are written: relationships are stored in Neo4j
- Query returns results: the Neo4j query shows relationships
Verification Steps
- Check service logs:
  ```bash
  docker-compose logs multi-document-upload-service | grep -i "extracted\|relation\|neo4j"
  ```
- Check job status:
  ```bash
  curl http://localhost:8000/api/multi-docs/jobs/{job_id}
  ```
  It should show `"processed_files": 1` and a relations count greater than 0.
- Check Neo4j:
  ```cypher
  MATCH (n:Concept)-[r:CAUSES]->(m:Concept) RETURN count(r) as relation_count
  ```
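The job-status check can be scripted. A sketch, assuming the endpoint returns JSON with a top-level integer `processed_files`; the relations-count field is not named in this guide, so it is an optional, hypothetical second argument. `check_job`, `json_int`, `CURL`, and `API_BASE` are illustrative names, and `json_int` is a line-based regex, not a JSON parser.

```bash
CURL="${CURL:-curl}"
API_BASE="${API_BASE:-http://localhost:8000/api/multi-docs}"

# Print the first integer value of a JSON field read from stdin.
# Works only for flat numeric fields; use jq for anything more complex.
json_int() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Hypothetical helper: print "ok" if the job processed at least one file
# (and, when a field name is given, wrote at least one relation).
check_job() {
  local job_id relations_field body processed relations
  job_id="$1"
  relations_field="${2:-}"   # hypothetical field name, e.g. whatever the job JSON uses
  body=$("$CURL" -sf "$API_BASE/jobs/$job_id") || { echo "request failed" >&2; return 2; }
  processed=$(printf '%s\n' "$body" | json_int processed_files)
  if [ "${processed:-0}" -lt 1 ]; then
    echo "processed_files=${processed:-missing}"
    return 1
  fi
  if [ -n "$relations_field" ]; then
    relations=$(printf '%s\n' "$body" | json_int "$relations_field")
    if [ "${relations:-0}" -lt 1 ]; then
      echo "$relations_field=${relations:-missing}"
      return 1
    fi
  fi
  echo "ok"
}
```

With `jq` installed, `curl -s http://localhost:8000/api/multi-docs/jobs/{job_id} | jq .processed_files` is a more robust way to read the same field.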
Improvements Made
- ✅ Added PDF dependencies: `unstructured[pdf]`, `unstructured[docx]`, etc.
- ✅ Added fallback extractors: uses `pdfplumber` if unstructured fails
- ✅ Better error handling: shows actual errors in job status
- ✅ Improved logging: more detailed logs for debugging
- ✅ Better Neo4j query: validates data before writing
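The `pdfplumber` fallback lives in the Python service; the shell function below only illustrates the ordering it describes: try each extractor in turn and keep the first one that exits successfully with non-blank output. `extract_with_fallback` and the extractor commands passed to it are hypothetical stand-ins for the service's unstructured-based and pdfplumber-based paths.

```bash
# Hypothetical sketch of a fallback chain: extract_with_fallback FILE CMD1 CMD2 ...
# Each CMD is called as `CMD FILE` and should print the extracted text.
extract_with_fallback() {
  local file extractor text
  file="$1"; shift
  for extractor in "$@"; do
    # A non-zero exit or whitespace-only output counts as a failure.
    if text=$("$extractor" "$file" 2>/dev/null) &&
       [ -n "$(printf '%s' "$text" | tr -d '[:space:]')" ]; then
      echo "used: $extractor" >&2
      printf '%s\n' "$text"
      return 0
    fi
  done
  echo "all extractors failed for $file" >&2
  return 1
}
```

Ordering matters: the primary extractor goes first, and later entries are only tried when earlier ones error out or return nothing usable.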
Troubleshooting
If you still see 0 relations after rebuilding:
- Check extraction logs:
  ```bash
  docker-compose logs multi-document-upload-service | grep -i "extract"
  ```
- Check Claude analysis:
  ```bash
  docker-compose logs multi-document-upload-service | grep -i "claude\|analyze"
  ```
- Check Neo4j connection:
  ```bash
  docker-compose logs multi-document-upload-service | grep -i "neo4j\|graph"
  ```
- Verify the document contains causal language:
  - Not all documents contain causal relationships
  - Try uploading a document with clear cause-effect statements
  - Example: "Smoking causes lung cancer" or "Rain causes flooding"
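The three log checks can be combined into one pass that names the first pipeline stage with no matching log lines. A sketch using the same grep patterns as the troubleshooting list (GNU grep syntax for `\|`); the service's actual log wording is not documented here, so treat the result as a hint rather than proof. `first_silent_stage` is a hypothetical helper.

```bash
# Hypothetical helper: read service logs on stdin and print the first stage
# (in pipeline order) whose pattern never appears; return 1 in that case.
first_silent_stage() {
  local logs stage name pattern
  logs=$(cat)
  for stage in 'extraction:extract' 'analysis:claude\|analyze' 'neo4j:neo4j\|graph'; do
    name=${stage%%:*}      # text before the first colon
    pattern=${stage#*:}    # text after the first colon
    if ! printf '%s\n' "$logs" | grep -qi "$pattern"; then
      echo "$name"
      return 1
    fi
  done
  echo "all stages logged"
}
```

Usage: `docker-compose logs multi-document-upload-service | first_silent_stage`.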
Next Steps
- Rebuild the service with new dependencies
- Re-upload documents
- Check Neo4j for relationships
- If still no results, check service logs for errors
- Verify the document contains causal language