# Rebuild Instructions - Multi-Document Upload Service

## Issue: Empty Graph in Neo4j

**Problem**: Query returns "(no changes, no records)" because the job completed with 0 relations.

**Root Cause**: PDF extraction failed due to missing dependencies (`unstructured[pdf]`).

## Fixes Applied

1. ✅ Added PDF dependencies (`unstructured[pdf]`, `unstructured[docx]`, etc.)
2. ✅ Added fallback extractors (pdfplumber, python-docx, python-pptx)
3. ✅ Improved error handling and logging
4. ✅ Fixed Neo4j query syntax
5. ✅ Better status messages

## Rebuild Steps

### Step 1: Rebuild the Service

```bash
cd /home/tech4biz/Desktop/prakash/codenuk/backend_new1/codenuk_backend_mine

# Stop the service
docker-compose stop multi-document-upload-service

# Rebuild with new dependencies
docker-compose build --no-cache multi-document-upload-service

# Start the service
docker-compose up -d multi-document-upload-service

# Check logs to verify it's starting correctly
docker-compose logs -f multi-document-upload-service
```

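The stop, build, and start commands above can be chained so that a failed build never starts a stale image. This is a minimal sketch: the function name `rebuild_service` and the `COMPOSE` override variable are illustrative assumptions, not part of the service.

```bash
# Stop, rebuild, and start one Compose service; each step runs only if the
# previous one succeeded. COMPOSE is a hypothetical override for the binary.
rebuild_service() {
  local service="${1:-multi-document-upload-service}"
  local compose="${COMPOSE:-docker-compose}"
  # $compose is intentionally unquoted so "docker compose" splits into two words.
  $compose stop "$service" &&
    $compose build --no-cache "$service" &&
    $compose up -d "$service"
}
```

Run it from the project directory as `rebuild_service`, or as `COMPOSE="docker compose" rebuild_service` if you use the Compose v2 plugin instead of the standalone binary.
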
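### Step 2: Verify Dependencies

```bash
# Check that unstructured and the packages pulled in by its extras are installed
docker-compose exec multi-document-upload-service pip list | grep -i unstructured

# You should see at least:
# unstructured
# Extras such as [pdf] and [docx] are not listed as separate packages.
# They install additional dependencies instead (for example
# unstructured-inference for the pdf extra, depending on the version).
```
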
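Because extras do not show up by name, it can be easier to check for the concrete packages the fixes rely on. The sketch below filters `pip list` output for a set of expected package names; the function name `missing_packages` is hypothetical, and the package names are the ones listed under Fixes Applied.

```bash
# Print each expected package that is absent from `pip list` output on stdin.
missing_packages() {
  local listing pkg
  listing="$(cat)"
  for pkg in "$@"; do
    # pip list prints "name  version"; match the name at the start of a line.
    if ! printf '%s\n' "$listing" | grep -qiE "^${pkg}[[:space:]]"; then
      printf '%s\n' "$pkg"
    fi
  done
}
```

For example, `docker-compose exec -T multi-document-upload-service pip list | missing_packages unstructured pdfplumber python-docx python-pptx` prints nothing when all four packages are present.
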
### Step 3: Test the Service

```bash
# Check health endpoint
curl http://localhost:8024/health

# Should return:
# {
#   "status": "ok",
#   "claude_model": "claude-3-5-haiku-latest",
#   ...
# }
```

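Right after a rebuild the container may need a few seconds before the health endpoint responds. The following sketch polls the endpoint until it reports `"status": "ok"`; the function names, the retry count, and the two-second delay are assumptions chosen for illustration.

```bash
# Extract the "status" value from a health JSON document on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Poll the health endpoint until it reports "ok" or the attempts run out.
wait_for_health() {
  local url="${1:-http://localhost:8024/health}" attempts="${2:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    if [ "$(curl -fsS --max-time 5 "$url" 2>/dev/null | health_status)" = "ok" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep 2
  done
  echo "service did not report status ok after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_health` with no arguments uses the URL from the step above. The `sed` expression is a deliberately simple parser; if `jq` is available, `jq -r .status` is a more robust replacement for `health_status`.
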
### Step 4: Re-upload Documents

1. Open frontend: `http://localhost:3001/project-builder`
2. Go to Step 1: Project Type
3. Find "Upload Documents for Knowledge Graph" section
4. Upload a PDF or other document
5. Wait for processing to complete
6. Check status - should show relation count > 0

### Step 5: Verify in Neo4j

Run these queries in Neo4j Browser (`http://localhost:7474`). Each statement ends with a semicolon so the two can be run together:

```cypher
// Check if any nodes exist
MATCH (n)
RETURN count(n) AS node_count;

// Check for CAUSES relationships
MATCH (n:Concept)-[r:CAUSES]->(m:Concept)
RETURN n.name AS cause,
       m.name AS effect,
       r.confidence AS confidence,
       r.job_id AS job_id
LIMIT 50;
```

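Since each relationship carries a `job_id`, you can also count what a single upload produced. The sketch below only builds the query text; the function name and the assumption that job IDs consist of letters, digits, hyphens, and underscores are illustrative and should be adjusted to the real ID format.

```bash
# Print a Cypher query that counts CAUSES relationships written by one job.
# The job ID is validated first because it is interpolated into the query text.
job_relation_count_query() {
  local job_id="$1"
  if [[ ! "$job_id" =~ ^[A-Za-z0-9_-]+$ ]]; then
    echo "invalid job id: $job_id" >&2
    return 1
  fi
  printf "MATCH (:Concept)-[r:CAUSES]->(:Concept) WHERE r.job_id = '%s' RETURN count(r) AS relation_count;\n" "$job_id"
}
```

Paste the printed query into Neo4j Browser. A count of 0 for a job that reported success points at the write step rather than at extraction.
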
## Expected Results

After rebuilding and re-uploading:

1. **PDF extraction succeeds** ✅
2. **Text is extracted** ✅
3. **Relations are extracted** ✅
4. **Relations are written to Neo4j** ✅
5. **Query returns results** ✅

## Troubleshooting

If you still see 0 relations:

1. **Check service logs**:

   ```bash
   docker-compose logs multi-document-upload-service | tail -50
   ```

2. **Check extraction logs**:

   ```bash
   docker-compose logs multi-document-upload-service | grep -i "extract\|pdf"
   ```

3. **Check Claude analysis**:

   ```bash
   docker-compose logs multi-document-upload-service | grep -i "claude\|analyze\|relation"
   ```

4. **Check Neo4j connection**:

   ```bash
   docker-compose logs multi-document-upload-service | grep -i "neo4j\|graph\|write"
   ```

5. **Verify document has causal language**:
   - Not all documents contain causal relationships
   - Try uploading a document with clear cause-effect statements
   - Example: "Smoking causes lung cancer"

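The three `grep` checks above can be combined into one pass that shows how far the pipeline got. This is a sketch using the same patterns; the function name `log_stage_counts` and the stage labels are made up for illustration.

```bash
# Count log lines per pipeline stage, reading the service logs from stdin.
# Patterns mirror the grep commands in the troubleshooting list.
log_stage_counts() {
  local logs stage pattern
  logs="$(cat)"
  while IFS='=' read -r stage pattern; do
    printf '%s: %s\n' "$stage" "$(printf '%s\n' "$logs" | grep -ciE "$pattern")"
  done <<'EOF'
extraction=extract|pdf
claude=claude|analyze|relation
neo4j=neo4j|graph|write
EOF
}
```

Use it as `docker-compose logs multi-document-upload-service | log_stage_counts`. A stage with a count of 0 suggests the job never reached it, which narrows down where to read the full logs.
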
## Quick Test

Test with a simple text file:

1. Create a test file `test_causal.txt`:

   ```
   Smoking cigarettes causes lung cancer.
   Heavy rain causes flooding.
   Exercise improves health.
   ```

2. Upload it via the frontend
3. Check Neo4j for relationships
4. Should see 3 causal relationships

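If you prefer to create the test file from the terminal, a heredoc works. The function name `make_causal_test_file` is hypothetical; the file contents are the three sentences from step 1.

```bash
# Write the three-sentence causal test document to the given path
# (defaults to test_causal.txt in the current directory).
make_causal_test_file() {
  local path="${1:-test_causal.txt}"
  cat > "$path" <<'EOF'
Smoking cigarettes causes lung cancer.
Heavy rain causes flooding.
Exercise improves health.
EOF
}
```

Run `make_causal_test_file` and then upload the resulting file through the frontend as described above.
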
## Next Steps

1. Rebuild the service
2. Re-upload documents
3. Check Neo4j for relationships
4. If still no results, check service logs
5. Verify the document contains causal language