codenuk_backend_mine/services/multi-document-upload-service/REBUILD_INSTRUCTIONS.md
2025-11-17 09:04:49 +05:30

# Rebuild Instructions - Multi-Document Upload Service
## Issue: Empty Graph in Neo4j
**Problem**: Query returns "(no changes, no records)" because the job completed with 0 relations.
**Root Cause**: PDF extraction failed due to missing dependencies (`unstructured[pdf]`).
## Fixes Applied
1. ✅ Added PDF dependencies (`unstructured[pdf]`, `unstructured[docx]`, etc.)
2. ✅ Added fallback extractors (pdfplumber, python-docx, python-pptx)
3. ✅ Improved error handling and logging
4. ✅ Fixed Neo4j query syntax
5. ✅ Better status messages
## Rebuild Steps
### Step 1: Rebuild the Service
```bash
# Adjust this path to your local checkout
cd /home/tech4biz/Desktop/prakash/codenuk/backend_new1/codenuk_backend_mine
# Stop the service
docker-compose stop multi-document-upload-service
# Rebuild with new dependencies
docker-compose build --no-cache multi-document-upload-service
# Start the service
docker-compose up -d multi-document-upload-service
# Check logs to verify it's starting correctly
docker-compose logs -f multi-document-upload-service
```
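The stop, build, and start commands above can be chained so that a failed build does not restart the old image. This is a minimal sketch, assuming it is run from the directory that contains the compose file; the function name `rebuild_service` and the `COMPOSE` override variable are illustrative and not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper. COMPOSE can be overridden, e.g. COMPOSE="docker compose".
COMPOSE="${COMPOSE:-docker-compose}"
SERVICE="multi-document-upload-service"

rebuild_service() {
  # Stop, rebuild without cache, and restart; abort at the first failing step.
  $COMPOSE stop "$SERVICE" || return 1
  $COMPOSE build --no-cache "$SERVICE" || return 1
  $COMPOSE up -d "$SERVICE" || return 1
}
```

After sourcing the snippet, run `rebuild_service` and then follow the logs with `docker-compose logs -f multi-document-upload-service` as shown above.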
### Step 2: Verify Dependencies
```bash
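# Check that unstructured and its PDF/DOCX dependencies are installed
docker-compose exec multi-document-upload-service pip list | grep -i -E "unstructured|pdf|docx"
# You should see `unstructured` itself. Extras such as [pdf] and [docx] are not
# listed as separate packages; they pull in ordinary dependencies instead
# (for example pdfminer.six or python-docx, depending on the unstructured version).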
```
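The fallback extractors listed under Fixes Applied (pdfplumber, python-docx, python-pptx) can be checked in the same pass. The sketch below reads `pip list` output on standard input and reports any package that is absent; `check_packages` is a hypothetical helper name, and the package list is an assumption based on the fixes above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `pip list` output on stdin and takes the
# required package names as arguments. Prints one line per missing package.
check_packages() {
  local listing pkg missing=0
  listing="$(cat)"
  for pkg in "$@"; do
    # Compare the first column of each line case-insensitively.
    if ! printf '%s\n' "$listing" |
         awk -v p="$pkg" 'tolower($1) == tolower(p) { found = 1 } END { exit !found }'; then
      echo "MISSING: $pkg"
      missing=1
    fi
  done
  return "$missing"
}
```

A possible invocation is `docker-compose exec -T multi-document-upload-service pip list | check_packages unstructured pdfplumber python-docx python-pptx`; a non-zero exit status means at least one package was not found.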
### Step 3: Test the Service
```bash
# Check health endpoint
curl http://localhost:8024/health
# Should return:
# {
#   "status": "ok",
#   "claude_model": "claude-3-5-haiku-latest",
#   ...
# }
```
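Right after a rebuild the container may need a few seconds before the health endpoint responds. The following sketch polls the endpoint until the response contains `"status": "ok"`; the function names, the retry count, and the two-second delay are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; HEALTH_URL defaults to the endpoint used above.
HEALTH_URL="${HEALTH_URL:-http://localhost:8024/health}"

# Succeeds if the JSON on stdin contains "status": "ok".
health_is_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Polls the health endpoint up to N times (default 30), two seconds apart.
wait_for_health() {
  local attempts="${1:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS "$HEALTH_URL" 2>/dev/null | health_is_ok; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep 2
  done
  echo "service not healthy after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_health 15` would wait roughly 30 seconds at most before giving up.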
### Step 4: Re-upload Documents
1. Open frontend: `http://localhost:3001/project-builder`
2. Go to Step 1: Project Type
3. Find "Upload Documents for Knowledge Graph" section
4. Upload a PDF or other document
5. Wait for processing to complete
6. Check the status; it should show a relation count greater than 0
### Step 5: Verify in Neo4j
Run these queries in Neo4j Browser (`http://localhost:7474`):
```cypher
// Check if any nodes exist
MATCH (n)
RETURN count(n) AS node_count;

// Check for CAUSES relationships
MATCH (n:Concept)-[r:CAUSES]->(m:Concept)
RETURN n.name AS cause,
       m.name AS effect,
       r.confidence AS confidence,
       r.job_id AS job_id
LIMIT 50;
```
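To see which upload job actually produced relations, the `CAUSES` relationships can be grouped by `r.job_id` from the command line. This is a sketch under several assumptions that the instructions above do not confirm: `cypher-shell` is installed on the host, Neo4j listens on the default Bolt port 7687, and the credentials are supplied through the hypothetical variables `NEO4J_USER` and `NEO4J_PASSWORD`.

```bash
#!/usr/bin/env bash
# Prints a Cypher query that counts CAUSES relationships per job_id.
relations_per_job_query() {
  cat <<'CYPHER'
MATCH (:Concept)-[r:CAUSES]->(:Concept)
RETURN r.job_id AS job_id, count(r) AS relations
ORDER BY relations DESC;
CYPHER
}

# Pipes the query into cypher-shell. Address and user fall back to assumed
# defaults; the password must be set by the caller.
run_relations_per_job() {
  relations_per_job_query |
    cypher-shell -a "${NEO4J_ADDRESS:-bolt://localhost:7687}" \
                 -u "${NEO4J_USER:-neo4j}" \
                 -p "${NEO4J_PASSWORD:?set NEO4J_PASSWORD first}"
}
```

A job that is missing from the output, or an empty result, would suggest that the job wrote no relations.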
## Expected Results
After rebuilding and re-uploading:
1. **PDF extraction succeeds**
2. **Text is extracted**
3. **Relations are extracted**
4. **Relations are written to Neo4j**
5. **Query returns results**
## Troubleshooting
If you still see 0 relations:
1. **Check service logs**:
   ```bash
   docker-compose logs multi-document-upload-service | tail -50
   ```
2. **Check extraction logs**:
   ```bash
   docker-compose logs multi-document-upload-service | grep -i "extract\|pdf"
   ```
3. **Check Claude analysis**:
   ```bash
   docker-compose logs multi-document-upload-service | grep -i "claude\|analyze\|relation"
   ```
4. **Check Neo4j connection**:
   ```bash
   docker-compose logs multi-document-upload-service | grep -i "neo4j\|graph\|write"
   ```
5. **Verify document has causal language**:
   - Not all documents contain causal relationships
   - Try uploading a document with clear cause-effect statements
   - Example: "Smoking causes lung cancer"
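The three log filters above can be combined into one pass that shows how many log lines each pipeline stage produced, which helps narrow down where processing stops. A minimal sketch, reusing the same search patterns; `summarize_logs` and the stage labels are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads service logs on stdin and prints, per stage,
# the number of lines matching that stage's keywords (case-insensitive).
summarize_logs() {
  local logs stage pattern count
  logs="$(cat)"
  while IFS='=' read -r stage pattern; do
    # grep -c prints 0 and exits non-zero when nothing matches, hence || true.
    count="$(printf '%s\n' "$logs" | grep -ciE "$pattern" || true)"
    printf '%s: %s\n' "$stage" "$count"
  done <<'EOF'
extraction=extract|pdf
claude=claude|analyze|relation
neo4j=neo4j|graph|write
EOF
}
```

For example, `docker-compose logs multi-document-upload-service | summarize_logs` prints one count per stage; a stage with a count of 0 is a reasonable place to start looking.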
## Quick Test
Test with a simple text file:
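1. Create a test file `test_causal.txt`:
   ```
   Smoking cigarettes causes lung cancer.
   Heavy rain causes flooding.
   Exercise improves health.
   ```
2. Upload it via the frontend
3. Check Neo4j for relationships
4. You should see up to 3 causal relationships; the third sentence uses "improves" rather than "causes", so whether it is extracted depends on the Claude analysis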
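The test file can also be created from the shell so that its content is identical on every run. A small sketch; `write_test_file` is a hypothetical helper and the default output path is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: writes the three test sentences to the given path
# (default: test_causal.txt in the current directory) and prints the path.
write_test_file() {
  local path="${1:-test_causal.txt}"
  cat > "$path" <<'EOF'
Smoking cigarettes causes lung cancer.
Heavy rain causes flooding.
Exercise improves health.
EOF
  echo "$path"
}
```

Running `write_test_file` with no argument creates `test_causal.txt`, which can then be uploaded through the frontend.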
## Next Steps
1. Rebuild the service
2. Re-upload documents
3. Check Neo4j for relationships
4. If still no results, check service logs
5. Verify the document contains causal language