# Fix: Empty Graph in Neo4j (No Relationships Found)

## Problem

When querying Neo4j for `CAUSES` relationships, you get "(no changes, no records)" because:

1. **PDF extraction failed** - Missing dependencies (`unstructured[pdf]`)
2. **0 relations extracted** - No text was extracted, so no analysis happened
3. **0 relations written** - Nothing was written to Neo4j (correct behavior)

## Root Cause

The service completed with 0 relations because:

- PDF file extraction failed: `partition_pdf() is not available because one or more dependencies are not installed`
- No text was extracted from the PDF
- No chunks were created
- No Claude analysis happened
- 0 relations were extracted
- 0 relations were written to Neo4j

## Solution

### Step 1: Update Dependencies

The `requirements.txt` has been updated to include:

```
unstructured[pdf]>=0.15.0
unstructured[docx]>=0.15.0
unstructured[pptx]>=0.15.0
unstructured[xlsx]>=0.15.0
```

### Step 2: Rebuild the Service

```bash
cd /home/tech4biz/Desktop/prakash/codenuk/backend_new1/codenuk_backend_mine

# Rebuild the service image with the new dependencies
docker-compose build multi-document-upload-service

# Recreate the container from the rebuilt image
# (`restart` alone would keep running the old image)
docker-compose up -d multi-document-upload-service

# Check logs to verify it's working
docker-compose logs -f multi-document-upload-service
```

### Step 3: Verify Dependencies

```bash
# Check that unstructured is installed
# (pip list shows the package and its version, not which extras were requested)
docker-compose exec multi-document-upload-service pip list | grep unstructured
```

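Because `pip list` does not show extras, a direct import check can confirm that the PDF partitioner is usable. The following is a minimal sketch, not part of the service: `can_import` is a hypothetical helper, and the module names it checks are assumptions based on the error message and the fallback extractor mentioned in this document.

```python
import importlib


def can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly in this environment."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError too (missing package or missing extra).
        return False
    return True


if __name__ == "__main__":
    # Assumed module names; adjust to what the service actually imports.
    for name in ("unstructured", "unstructured.partition.pdf", "pdfplumber"):
        print(f"{name}: {'ok' if can_import(name) else 'MISSING'}")
```

To be meaningful, the script has to run with the container's Python interpreter (for example through `docker-compose exec`), not on the host.
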
### Step 4: Re-upload Documents

1. Go to Project Builder in the frontend
2. Click on "Upload Documents for Knowledge Graph"
3. Upload a PDF or other document
4. Wait for processing to complete
5. Check Neo4j for relationships

### Step 5: Check Neo4j

Run these queries in Neo4j Browser:

```cypher
// Check if any nodes exist
MATCH (n)
RETURN count(n) as node_count;

// Check for CAUSES relationships
MATCH (n:Concept)-[r:CAUSES]->(m:Concept)
RETURN n.name as cause, m.name as effect, r.confidence as confidence
LIMIT 50;
```

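The same check can be scripted with the official `neo4j` Python driver. This is a sketch under stated assumptions: the Bolt URI and credentials are placeholders, and `count_causes` is a hypothetical helper, not part of the service.

```python
CAUSES_COUNT_QUERY = (
    "MATCH (:Concept)-[r:CAUSES]->(:Concept) "
    "RETURN count(r) AS relation_count"
)


def count_causes(session) -> int:
    """Run the count query on an open session and return the number of CAUSES relationships."""
    record = session.run(CAUSES_COUNT_QUERY).single()
    return record["relation_count"] if record is not None else 0


def main() -> None:
    # Imported lazily so count_causes can be exercised without the driver installed.
    from neo4j import GraphDatabase

    # Placeholder URI and credentials; use your deployment's values.
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
    try:
        with driver.session() as session:
            print("CAUSES relationships:", count_causes(session))
    finally:
        driver.close()
```

Call `main()` once the connection details are filled in. A result of 0 after a successful upload points back to the extraction or analysis stages rather than to Neo4j itself.
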
## Expected Behavior After Fix

1. **PDF extraction succeeds** - Text is extracted from PDF files
2. **Text is chunked** - Document is split into manageable chunks
3. **Claude analyzes** - Causal relationships are extracted
4. **Relations are written** - Relationships are stored in Neo4j
5. **Query returns results** - Neo4j query shows relationships

## Verification Steps

1. **Check service logs**:
   ```bash
   docker-compose logs multi-document-upload-service | grep -i "extracted\|relation\|neo4j"
   ```

2. **Check job status**:
   ```bash
   # Replace {job_id} with the ID of your upload job
   curl http://localhost:8000/api/multi-docs/jobs/{job_id}
   ```
   The response should show `"processed_files": 1` and a relations count greater than 0.

3. **Check Neo4j**:
   ```cypher
   MATCH (n:Concept)-[r:CAUSES]->(m:Concept)
   RETURN count(r) as relation_count
   ```

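The job-status check in step 2 can be automated once the response has been parsed as JSON. The sketch below is illustrative only: `processed_files` comes from this document, but the name of the relations field is not given here, so it is passed in as a parameter and the `relations_count` key in the sample is hypothetical.

```python
import json


def job_succeeded(status: dict, relations_key: str) -> bool:
    """True when at least one file was processed and at least one relation was extracted."""
    processed = status.get("processed_files", 0) or 0
    relations = status.get(relations_key, 0) or 0
    return processed >= 1 and relations > 0


# Sample payload for illustration only; "relations_count" is a hypothetical field name.
sample = json.loads('{"processed_files": 1, "relations_count": 12}')
print(job_succeeded(sample, relations_key="relations_count"))
```

Check the actual response body of the job endpoint to find the real field name before relying on this.
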
## Improvements Made

1. ✅ **Added PDF dependencies** - `unstructured[pdf]`, `unstructured[docx]`, etc.
2. ✅ **Added fallback extractors** - Uses `pdfplumber` if unstructured fails
3. ✅ **Better error handling** - Shows actual errors in job status
4. ✅ **Improved logging** - More detailed logs for debugging
5. ✅ **Better Neo4j query** - Validates data before writing

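The fallback and error-handling items above can be pictured roughly as follows. This is a minimal sketch of the pattern, not the service's actual code: the function names are hypothetical, and the real extractor may differ. It tries `unstructured` first, falls back to `pdfplumber`, and raises instead of silently returning empty text, so a failed extraction cannot end as a "successful" job with 0 relations.

```python
from typing import Callable, Iterable


def extract_with_unstructured(path: str) -> str:
    # Lazy import: raises ImportError when the unstructured[pdf] extra is missing.
    from unstructured.partition.pdf import partition_pdf

    return "\n".join(str(element) for element in partition_pdf(filename=path))


def extract_with_pdfplumber(path: str) -> str:
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() can yield no text for image-only pages.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_text(path: str, extractors: Iterable[Callable[[str], str]]) -> str:
    """Try each extractor in order; raise if none yields non-empty text."""
    errors = []
    for extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # record the failure and try the next extractor
            errors.append(f"{extractor.__name__}: {exc}")
            continue
        if text.strip():
            return text
        errors.append(f"{extractor.__name__}: no text extracted")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))
```

A call such as `extract_text("report.pdf", [extract_with_unstructured, extract_with_pdfplumber])` returns the first non-empty result. The collected error messages are the kind of detail that is useful to surface in the job status.
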
## Troubleshooting

If you still see 0 relations after rebuilding:

1. **Check extraction logs**:
   ```bash
   docker-compose logs multi-document-upload-service | grep -i "extract"
   ```

2. **Check Claude analysis**:
   ```bash
   docker-compose logs multi-document-upload-service | grep -i "claude\|analyze"
   ```

3. **Check Neo4j connection**:
   ```bash
   docker-compose logs multi-document-upload-service | grep -i "neo4j\|graph"
   ```

4. **Verify document has causal language**:
   - Not all documents contain causal relationships
   - Try uploading a document with clear cause-effect statements
   - Example: "Smoking causes lung cancer" or "Rain causes flooding"

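For the last check, a quick scan of the extracted text can show whether a document contains any obvious cause-effect wording. The sketch below is a crude keyword heuristic with an assumed, non-exhaustive cue list. It is not how the service or Claude identifies relationships, and a document can pass or fail it without that predicting the extraction result.

```python
import re

# A small, non-exhaustive list of causal cue phrases (an assumption, not the service's list).
CAUSAL_CUES = re.compile(
    r"\b(causes?|caused by|leads? to|results? in|due to|because of|triggers?)\b",
    re.IGNORECASE,
)


def find_causal_sentences(text: str) -> list[str]:
    """Return the sentences that contain at least one causal cue phrase."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if CAUSAL_CUES.search(s)]


sample = "Smoking causes lung cancer. The report has 40 pages. Rain causes flooding."
print(find_causal_sentences(sample))
```

An empty result on a test document suggests trying one with more explicit cause-effect statements before digging further into the pipeline.
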
## Next Steps

1. Rebuild the service with new dependencies
2. Re-upload documents
3. Check Neo4j for relationships
4. If still no results, check service logs for errors
5. Verify the document contains causal language