codenuk_backend_mine/services/multi-document-upload-service/FIX_EMPTY_GRAPH.md
2025-11-17 09:04:49 +05:30

# Fix: Empty Graph in Neo4j (No Relationships Found)
## Problem
When querying Neo4j for `CAUSES` relationships, you get "(no changes, no records)" because:
1. **PDF extraction failed** - Missing dependencies (`unstructured[pdf]`)
2. **0 relations extracted** - No text was extracted, so no analysis happened
3. **0 relations written** - Nothing was written to Neo4j (correct behavior)
## Root Cause
The service completed with 0 relations because:
- PDF file extraction failed: `partition_pdf() is not available because one or more dependencies are not installed`
- No text was extracted from the PDF
- No chunks were created
- No Claude analysis happened
- 0 relations were extracted
- 0 relations were written to Neo4j
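
The cascade above can be illustrated with a toy pipeline. This is a minimal sketch, not the service's actual code: `chunk_text`, `run_pipeline`, and `fake_analyze` are hypothetical names, and the fixed-size chunking is an assumption made for illustration. It shows why an empty extraction result quietly ends in zero relations rather than an error.

```python
from typing import Callable, List, Tuple

Relation = Tuple[str, str, str]


def chunk_text(text: str, size: int = 1000) -> List[str]:
    """Split text into fixed-size chunks (hypothetical chunking strategy)."""
    text = text.strip()
    return [text[i:i + size] for i in range(0, len(text), size)]


def run_pipeline(text: str, analyze: Callable[[str], List[Relation]]) -> List[Relation]:
    """Chunk the text and collect the relations found in each chunk."""
    relations: List[Relation] = []
    for chunk in chunk_text(text):
        # With no extracted text there are no chunks, so analyze() never runs.
        relations.extend(analyze(chunk))
    return relations


def fake_analyze(chunk: str) -> List[Relation]:
    """Stand-in analyzer (assumption); the real service calls Claude here."""
    return [("rain", "CAUSES", "flooding")] if "causes" in chunk else []


print(run_pipeline("", fake_analyze))                       # failed extraction
print(run_pipeline("Rain causes flooding.", fake_analyze))  # successful extraction
```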
## Solution
### Step 1: Update Dependencies
The `requirements.txt` has been updated to include:
```
unstructured[pdf]>=0.15.0
unstructured[docx]>=0.15.0
unstructured[pptx]>=0.15.0
unstructured[xlsx]>=0.15.0
```
### Step 2: Rebuild the Service
```bash
cd /home/tech4biz/Desktop/prakash/codenuk/backend_new1/codenuk_backend_mine
# Rebuild the service with new dependencies
docker-compose build multi-document-upload-service
# Recreate the container so it runs the newly built image
# (`restart` would keep the old container and its old image)
docker-compose up -d multi-document-upload-service
# Check logs to verify it's working
docker-compose logs -f multi-document-upload-service
```
### Step 3: Verify Dependencies
```bash
# Check that unstructured is installed (pip list shows the package, not its extras)
docker-compose exec multi-document-upload-service pip list | grep unstructured
```
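
Because `pip list` does not report which extras were installed, it can help to test whether the modules behind the PDF path are importable. The following is a minimal sketch: `missing_modules` is a hypothetical helper, and the module list (`unstructured`, `pdfminer`, `pdfplumber`) is an assumption about what the extractors import, so adjust it to match your code.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or the name is malformed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed module list; adjust it to whatever your extractors import.
    required = ["unstructured", "pdfminer", "pdfplumber"]
    absent = missing_modules(required)
    print("missing:", absent if absent else "none")
```

Run it inside the service container so that it inspects the same environment the service uses.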
### Step 4: Re-upload Documents
1. Go to Project Builder in the frontend
2. Click on "Upload Documents for Knowledge Graph"
3. Upload a PDF or other document
4. Wait for processing to complete
5. Check Neo4j for relationships
### Step 5: Check Neo4j
Run these queries in Neo4j Browser:
```cypher
// Check if any nodes exist
MATCH (n)
RETURN count(n) AS node_count;

// Check for CAUSES relationships
MATCH (n:Concept)-[r:CAUSES]->(m:Concept)
RETURN n.name as cause, m.name as effect, r.confidence as confidence
LIMIT 50
```
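
If you prefer to script this check, the relationship count can also be read with the official `neo4j` Python driver. This is a sketch under assumptions: `count_causes` is a hypothetical helper, and the URI, user, and password you pass in depend on your deployment and are not taken from this project.

```python
COUNT_CAUSES = (
    "MATCH (:Concept)-[r:CAUSES]->(:Concept) "
    "RETURN count(r) AS relation_count"
)


def count_causes(uri: str, user: str, password: str) -> int:
    """Return the number of CAUSES relationships between Concept nodes."""
    # Imported lazily so this module loads even where the driver is absent.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            record = session.run(COUNT_CAUSES).single()
            return record["relation_count"]
    finally:
        driver.close()
```

A call might look like `count_causes("bolt://localhost:7687", "neo4j", "<password>")`, where all three values are placeholders. A result of 0 means the graph is still empty.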
## Expected Behavior After Fix
1. **PDF extraction succeeds** - Text is extracted from PDF files
2. **Text is chunked** - Document is split into manageable chunks
3. **Claude analyzes** - Causal relationships are extracted
4. **Relations are written** - Relationships are stored in Neo4j
5. **Query returns results** - Neo4j query shows relationships
## Verification Steps
1. **Check service logs**:
```bash
docker-compose logs multi-document-upload-service | grep -i "extracted\|relation\|neo4j"
```
2. **Check job status**:
```bash
# Replace {job_id} with the ID of your upload job
curl "http://localhost:8000/api/multi-docs/jobs/{job_id}"
```
The response should show `"processed_files": 1` and a relation count greater than 0.
3. **Check Neo4j**:
```cypher
MATCH (n:Concept)-[r:CAUSES]->(m:Concept)
RETURN count(r) as relation_count
```
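
The job status check can also be scripted with the standard library. This is a minimal sketch: the helper names are hypothetical, and it assumes the endpoint returns a JSON object. Only the `processed_files` field is taken from this document; the name of the relation count field is not specified here, so inspect the returned dictionary for it.

```python
import json
import urllib.request


def job_status_url(base_url: str, job_id: str) -> str:
    """Build the job status URL used in the curl example."""
    return f"{base_url.rstrip('/')}/api/multi-docs/jobs/{job_id}"


def fetch_job_status(base_url: str, job_id: str, timeout: float = 10.0) -> dict:
    """GET the job status and decode the JSON body (assumed to be an object)."""
    url = job_status_url(base_url, job_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def files_were_processed(status: dict) -> bool:
    """True when the status reports at least one processed file."""
    return int(status.get("processed_files", 0)) > 0
```

For example, `files_were_processed(fetch_job_status("http://localhost:8000", job_id))` should be true once a document has been processed.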
## Improvements Made
1. **Added PDF dependencies** - `unstructured[pdf]`, `unstructured[docx]`, etc.
2. **Added fallback extractors** - Uses `pdfplumber` if unstructured fails
3. **Better error handling** - Shows actual errors in job status
4. **Improved logging** - More detailed logs for debugging
5. **Better Neo4j query** - Validates data before writing
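
The fallback idea in item 2 can be sketched as follows. This is an illustration under assumptions, not the service's implementation: the function names are hypothetical, and treating an empty result as a failure is a design choice made here so that the error list explains why nothing was extracted.

```python
from typing import Callable, List, Sequence, Tuple


def extract_with_unstructured(path: str) -> str:
    """Extract text with unstructured (requires the unstructured[pdf] extra)."""
    from unstructured.partition.pdf import partition_pdf

    elements = partition_pdf(filename=path)
    return "\n".join(el.text for el in elements if el.text)


def extract_with_pdfplumber(path: str) -> str:
    """Extract text page by page with pdfplumber."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_text(
    path: str, extractors: Sequence[Callable[[str], str]]
) -> Tuple[str, List[str]]:
    """Try each extractor in order; return the first non-empty text and all errors."""
    errors: List[str] = []
    for extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # includes ImportError for missing dependencies
            errors.append(f"{extractor.__name__}: {exc}")
            continue
        if text.strip():
            return text, errors
        errors.append(f"{extractor.__name__}: no text extracted")
    return "", errors
```

A call such as `extract_text("report.pdf", [extract_with_unstructured, extract_with_pdfplumber])` returns the text together with any errors collected along the way, which can then be surfaced in the job status.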
## Troubleshooting
If you still see 0 relations after rebuilding:
1. **Check extraction logs**:
```bash
docker-compose logs multi-document-upload-service | grep -i "extract"
```
2. **Check Claude analysis**:
```bash
docker-compose logs multi-document-upload-service | grep -i "claude\|analyze"
```
3. **Check Neo4j connection**:
```bash
docker-compose logs multi-document-upload-service | grep -i "neo4j\|graph"
```
4. **Verify document has causal language**:
- Not all documents contain causal relationships
- Try uploading a document with clear cause-effect statements
- Example: "Smoking causes lung cancer" or "Rain causes flooding"
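
As a quick pre-check before uploading, you can scan a document's text for common causal cue phrases. This is a rough heuristic sketch: the cue list and the `causal_sentences` helper are assumptions made for illustration and do not reflect how the service or Claude identifies causal relationships.

```python
import re

# Assumed cue list for illustration; it is not the service's extraction logic.
CAUSAL_CUES = re.compile(
    r"\b(causes?|caused by|leads? to|results? in|due to|because|triggers?)\b",
    re.IGNORECASE,
)


def causal_sentences(text: str) -> list:
    """Return the sentences that contain at least one causal cue phrase."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if CAUSAL_CUES.search(s)]


sample = "Smoking causes lung cancer. The sky is blue."
print(causal_sentences(sample))
```

An empty result suggests the document may not contain explicit cause-effect statements, although implicit causality will not be caught by a keyword scan like this.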
## Next Steps
1. Rebuild the service with new dependencies
2. Re-upload documents
3. Check Neo4j for relationships
4. If still no results, check service logs for errors
5. Verify the document contains causal language