# Rebuild Instructions - Multi-Document Upload Service

## Issue: Empty Graph in Neo4j

**Problem**: The query returns "(no changes, no records)" because the job completed with 0 relations.

**Root Cause**: PDF extraction failed due to missing dependencies (`unstructured[pdf]`).

## Fixes Applied

1. ✅ Added PDF dependencies (`unstructured[pdf]`, `unstructured[docx]`, etc.)
2. ✅ Added fallback extractors (pdfplumber, python-docx, python-pptx)
3. ✅ Improved error handling and logging
4. ✅ Fixed Neo4j query syntax
5. ✅ Improved status messages

## Rebuild Steps

### Step 1: Rebuild the Service

```bash
cd /home/tech4biz/Desktop/prakash/codenuk/backend_new1/codenuk_backend_mine

# Stop the service
docker-compose stop multi-document-upload-service

# Rebuild with new dependencies
docker-compose build --no-cache multi-document-upload-service

# Start the service
docker-compose up -d multi-document-upload-service

# Check logs to verify it's starting correctly
docker-compose logs -f multi-document-upload-service
```

### Step 2: Verify Dependencies

```bash
# Check that unstructured and its extras' dependencies are installed
docker-compose exec multi-document-upload-service pip list | grep unstructured

# You should see `unstructured` in the output.
# Note: extras such as [pdf] and [docx] are not listed as separate packages;
# pip shows the dependencies they pull in instead (for example,
# unstructured-inference for the PDF extra).
```

### Step 3: Test the Service

```bash
# Check health endpoint
curl http://localhost:8024/health

# Should return:
# {
#   "status": "ok",
#   "claude_model": "claude-3-5-haiku-latest",
#   ...
# }
```

### Step 4: Re-upload Documents

1. Open the frontend: `http://localhost:3001/project-builder`
2. Go to Step 1: Project Type
3. Find the "Upload Documents for Knowledge Graph" section
4. Upload a PDF or other document
5. Wait for processing to complete
6. Check the status; it should show a relation count > 0

### Step 5: Verify in Neo4j

Run these queries in Neo4j Browser (`http://localhost:7474`):

```cypher
// Check if any nodes exist
MATCH (n) RETURN count(n) as node_count

// Check for CAUSES relationships
MATCH (n:Concept)-[r:CAUSES]->(m:Concept)
RETURN n.name as cause, m.name as effect, r.confidence as confidence, r.job_id as job_id
LIMIT 50
```

## Expected Results

After rebuilding and re-uploading:

1. **PDF extraction succeeds** ✅
2. **Text is extracted** ✅
3. **Relations are extracted** ✅
4. **Relations are written to Neo4j** ✅
5. **Query returns results** ✅

## Troubleshooting

If you still see 0 relations:

1. **Check service logs**:

   ```bash
   docker-compose logs multi-document-upload-service | tail -50
   ```

2. **Check extraction logs**:

   ```bash
   docker-compose logs multi-document-upload-service | grep -i "extract\|pdf"
   ```

3. **Check Claude analysis**:

   ```bash
   docker-compose logs multi-document-upload-service | grep -i "claude\|analyze\|relation"
   ```

4. **Check Neo4j connection**:

   ```bash
   docker-compose logs multi-document-upload-service | grep -i "neo4j\|graph\|write"
   ```

5. **Verify the document has causal language**:
   - Not all documents contain causal relationships
   - Try uploading a document with clear cause-effect statements
   - Example: "Smoking causes lung cancer"

## Quick Test

Test with a simple text file:

1. Create a test file `test_causal.txt`:

   ```
   Smoking cigarettes causes lung cancer.
   Heavy rain causes flooding.
   Exercise improves health.
   ```

2. Upload it via the frontend
3. Check Neo4j for relationships
4. You should see 3 causal relationships

## Next Steps

1. Rebuild the service
2. Re-upload documents
3. Check Neo4j for relationships
4. If there are still no results, check the service logs
5. Verify the document contains causal language
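
## Appendix: Creating the Quick Test File from the Shell

The Quick Test file can also be created from the command line rather than by hand. The snippet below is a minimal sketch, assuming a Bash-compatible shell and assuming the service accepts one sentence per line; the filename `test_causal.txt` and the three sentences are taken from the Quick Test section, and nothing here talks to the service itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the three sample sentences from the Quick Test, one per line
# (the one-sentence-per-line layout is an assumption).
cat > test_causal.txt <<'EOF'
Smoking cigarettes causes lung cancer.
Heavy rain causes flooding.
Exercise improves health.
EOF

# Sanity check: count the non-empty lines that were written.
line_count=$(grep -c '.' test_causal.txt)
echo "test_causal.txt written with ${line_count} sentences"
```

After running it, upload `test_causal.txt` through the frontend as described in Step 4, then run the `CAUSES` query from Step 5 to check for the resulting relationships.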