# Testing Guide - S3 Folder Endpoint

## ✅ Implementation Verification

All components have been implemented and verified:

- ✅ AWS SDK installed
- ✅ S3Service created with edge case handling
- ✅ TagS3FolderUseCase created with deduplication logic
- ✅ Controller method added
- ✅ Route added
- ✅ Dependency container updated
- ✅ Server.js updated
- ✅ Documentation updated
- ✅ No syntax errors
- ✅ No linter errors

---

## 🧪 Testing Checklist

### 1. Unit Testing (Recommended)

#### S3Service Tests

- [ ] Path normalization (trailing slashes, whitespace)
- [ ] Image file filtering (only image files processed)
- [ ] Hidden file filtering (`.DS_Store` ignored)
- [ ] File size validation (50MB limit)
- [ ] File type validation (magic number validation)
- [ ] S3 pagination (>1000 objects)
- [ ] Error handling (AWS errors, network errors)

#### TagS3FolderUseCase Tests

- [ ] Database duplicate detection
- [ ] In-folder duplicate detection
- [ ] Tag deduplication (category + value)
- [ ] Case sensitivity handling
- [ ] Whitespace normalization
- [ ] Confidence handling (keep highest)
- [ ] Partial failure handling
- [ ] Empty folder handling

#### Controller Tests

- [ ] Request validation (Joi schema)
- [ ] Missing AWS credentials handling
- [ ] Response formatting
- [ ] Error handling

---

### 2. Integration Testing

#### S3 Folder Endpoint Tests

- [ ] **Basic functionality**: Tag images from S3 folder
  ```bash
  curl -X POST http://localhost:3000/api/images/tag-s3-folder \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your_key" \
    -d '{
      "parentFolder": "00Da3000003ZFiQ/",
      "subFolder": "a0La30000008vSXEAY/"
    }'
  ```

- [ ] **Default parent folder**: Omit parentFolder (should use default)
  ```bash
  curl -X POST http://localhost:3000/api/images/tag-s3-folder \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your_key" \
    -d '{
      "subFolder": "a0La30000008vSXEAY/"
    }'
  ```

- [ ] **Empty folder**: Test with empty folder (should return error)
- [ ] **Non-existent folder**: Test with non-existent folder (should return 404)
- [ ] **Database duplicates**: Upload same image twice (should use cached tags)
- [ ] **In-folder duplicates**: Folder with duplicate images (should process once)
- [ ] **Large folder**: Test with folder containing 100+ images
- [ ] **Mixed results**: Folder with some valid and some invalid images

---

### 3. Regression Testing

#### Existing Endpoints (Verify they still work)

- [ ] **Tag single image**: `POST /api/images/tag`
- [ ] **Tag base64 image**: `POST /api/images/tag-base64`
- [ ] **Tag batch images**: `POST /api/images/tag/batch`
- [ ] **Tag batch base64 images**: `POST /api/images/tag-base64/batch`
- [ ] **Search by tag**: `GET /api/images/search?tag=kitchen`
- [ ] **Get statistics**: `GET /api/images/stats`
- [ ] **Health check**: `GET /api/images/health`

---

### 4. Edge Case Testing

#### Path Normalization

- [ ] Parent folder without trailing slash: `"00Da3000003ZFiQ"` (should add `/`)
- [ ] Parent folder with trailing slash: `"00Da3000003ZFiQ/"` (should work)
- [ ] Extra trailing slashes: `"00Da3000003ZFiQ//"` (should normalize)
- [ ] Whitespace in folder names: `" 00Da3000003ZFiQ/ "` (should trim)
- [ ] Leading slash in subfolder: `"/a0La30000008vSXEAY/"` (should remove)

#### Tag Deduplication

- [ ] Case sensitivity: `"Kitchen"` vs `"kitchen"` (should deduplicate)
- [ ] Whitespace: `"fully  furnished"` (two spaces) vs `"fully furnished"` (should deduplicate)
- [ ] Different categories, same value: `{category: "Room Type", value: "kitchen"}` vs `{category: "Style", value: "kitchen"}` (should NOT deduplicate)
- [ ] Same category + value, different confidence: Should keep highest confidence
- [ ] Missing confidence: Should use default (0.5)
- [ ] Invalid confidence: Should clamp to 0-1 range

#### Error Scenarios

- [ ] Missing AWS credentials: Should return clear error
- [ ] Invalid AWS credentials: Should return clear error
- [ ] S3 bucket not found: Should return 404
- [ ] S3 access denied: Should return clear error
- [ ] Network timeout: Should handle gracefully
- [ ] All images fail: Should return clear error message

---

### 5. Performance Testing

- [ ] Small folder (1-10 images): Should process quickly
- [ ] Medium folder (10-50 images): Should process within reasonable time
- [ ] Large folder (50-100 images): Should process with concurrency limits
- [ ] Very large folder (100+ images): Should handle pagination
- [ ] Memory usage: Monitor memory during large folder processing

---

## 🐛 Troubleshooting

### Issue: "S3 folder endpoint is not available"

**Solution**: Check AWS credentials in `.env` file:

- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
- `AWS_S3_BUCKET`
- `AWS_REGION` (optional, defaults to us-east-1)

### Issue: "Access denied to S3 bucket"

**Solution**: Verify IAM user has permissions:

- `s3:ListBucket` for the bucket
- `s3:GetObject` for objects in the folder

### Issue: "Folder not found or empty"

**Solution**:

- Verify folder path is correct
- Check folder exists in S3 bucket
- Verify folder contains image files (not just folders)

### Issue: "All images failed to process"

**Solution**:

- Check logs for specific error messages
- Verify images are valid image files
- Check file sizes (must be <50MB)
- Verify Claude API key is set

---

## 📊 Expected Results

### Successful Response

```json
{
  "success": true,
  "message": "S3 folder processed successfully: 31 images tagged",
  "data": {
    "parentFolder": "00Da3000003ZFiQ/",
    "subFolder": "a0La30000008vSXEAY/",
    "totalImages": 31,
    "processedImages": 31,
    "databaseDuplicates": 5,
    "inFolderDuplicates": 2,
    "newImages": 24,
    "failedImages": 0,
    "mergedTags": [...],
    "uniqueTags": 127,
    "totalTagsBeforeDedup": 450,
    "summaries": [...],
    "errors": null
  }
}
```

### Error Response (Missing Credentials)

```json
{
  "success": false,
  "message": "S3 folder endpoint is not available. AWS credentials not configured.",
  "timestamp": "2025-11-03T10:30:00.000Z"
}
```

### Error Response (Empty Folder)

```json
{
  "success": false,
  "message": "No images found in folder: 00Da3000003ZFiQ/a0La30000008vSXEAY/",
  "timestamp": "2025-11-03T10:30:00.000Z"
}
```

---

## ✅ Verification Steps

1. **Start server**: `npm run dev`
2. **Check health**: `curl http://localhost:3000/api/images/health`
3. **Test S3 endpoint**: Use Postman or curl with valid AWS credentials
4. **Verify duplicates**: Upload same images twice, check for cached results
5. **Verify deduplication**: Check merged tags have no duplicates (category + value)
6. **Check logs**: Review logs for any errors or warnings
7. **Test existing endpoints**: Verify all existing endpoints still work

---

## 🎯 Success Criteria

- ✅ S3 folder endpoint responds correctly
- ✅ Database duplicates use cached tags
- ✅ In-folder duplicates are handled correctly
- ✅ Tags are deduplicated (category + value, case-insensitive)
- ✅ All edge cases handled gracefully
- ✅ Existing endpoints still work (regression)
- ✅ Error messages are clear and helpful
- ✅ Logs contain useful information

---

## 📝 Notes

- **Concurrency**: Images are processed 5 at a time to avoid overwhelming the system
- **Memory**: Large folders are processed in batches to avoid memory issues
- **Duplicates**: Both database and in-folder duplicates are detected and handled
- **Deduplication**: Tags are deduplicated based on category + value (case-insensitive, whitespace-normalized)
- **Error Handling**: Partial failures are handled gracefully (some images succeed, some fail)
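
### Sketch: Tag Deduplication Rule

When writing the TagS3FolderUseCase unit tests, it can help to have the deduplication rule from this guide in executable form. The following is a minimal sketch, not the project's actual implementation: the function names `normalizeText`, `normalizeConfidence`, and `dedupeTags` are hypothetical, and the tag shape `{ category, value, confidence }` is assumed from the edge cases listed above.

```javascript
// Hypothetical helpers; the real TagS3FolderUseCase may be structured differently.
const DEFAULT_CONFIDENCE = 0.5; // default stated in the edge case checklist

// Trim, collapse runs of whitespace to one space, and lowercase.
function normalizeText(text) {
  return String(text).trim().replace(/\s+/g, ' ').toLowerCase();
}

// Missing or non-numeric confidence falls back to the default;
// numeric values are clamped to the 0-1 range.
function normalizeConfidence(confidence) {
  if (typeof confidence !== 'number' || Number.isNaN(confidence)) {
    return DEFAULT_CONFIDENCE;
  }
  return Math.min(1, Math.max(0, confidence));
}

// Keep one tag per normalized (category, value) pair, preferring the
// highest confidence. The first tag wins when confidences are equal.
function dedupeTags(tags) {
  const best = new Map();
  for (const tag of tags) {
    const key = JSON.stringify([
      normalizeText(tag.category),
      normalizeText(tag.value),
    ]);
    const confidence = normalizeConfidence(tag.confidence);
    const existing = best.get(key);
    if (!existing || confidence > existing.confidence) {
      best.set(key, { ...tag, confidence });
    }
  }
  return [...best.values()];
}
```

Keying on the category and value together is what keeps `{category: "Room Type", value: "kitchen"}` and `{category: "Style", value: "kitchen"}` as separate tags, while `"Kitchen"` and `" kitchen "` in the same category collapse into one.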
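
### Sketch: Batched Processing with Partial Failures

The concurrency and error-handling behavior described in the notes (5 images at a time, some images may fail while others succeed) can be illustrated as follows. This is a sketch under assumptions, not the service's real code: `processInBatches` and its `worker` callback are hypothetical names, and the actual implementation may use a different concurrency mechanism.

```javascript
// Hypothetical helper: runs `worker` over `items` in fixed-size batches.
// Each batch must finish before the next one starts, so at most
// `batchSize` workers are in flight at any time.
async function processInBatches(items, worker, batchSize = 5) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Promise.allSettled never rejects, so one failed image does not
    // abort the rest of the batch. The async wrapper also turns
    // synchronous throws from `worker` into rejections.
    const settled = await Promise.allSettled(
      batch.map(async (item) => worker(item))
    );
    results.push(...settled);
  }
  return results;
}
```

Each result is either `{ status: 'fulfilled', value }` or `{ status: 'rejected', reason }`, in the same order as the input, which makes it straightforward to count processed and failed images for a response like the one shown under Expected Results.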