Testing Guide - S3 Folder Endpoint
✅ Implementation Verification
All components have been implemented and verified:
- ✅ AWS SDK installed
- ✅ S3Service created with edge case handling
- ✅ TagS3FolderUseCase created with deduplication logic
- ✅ Controller method added
- ✅ Route added
- ✅ Dependency container updated
- ✅ Server.js updated
- ✅ Documentation updated
- ✅ No syntax errors
- ✅ No linter errors
🧪 Testing Checklist
1. Unit Testing (Recommended)
S3Service Tests
- Path normalization (trailing slashes, whitespace)
- Image file filtering (only image files processed)
- Hidden file filtering (.DS_Store ignored)
- File size validation (50MB limit)
- File type validation (magic number validation)
- S3 pagination (>1000 objects)
- Error handling (AWS errors, network errors)
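The path normalization item above, together with the leading-slash case listed under Edge Case Testing, can be pinned down with a small pure function that is easy to unit test. This is a minimal sketch, not the actual S3Service code: `normalizeFolder` and `buildPrefix` are hypothetical names, and returning an empty string for an empty folder is an assumption.

```javascript
// Hypothetical helper; the real S3Service may structure this differently.
function normalizeFolder(input) {
  // Trim surrounding whitespace, then drop leading and trailing slashes.
  const core = String(input ?? "")
    .trim()
    .replace(/^\/+/, "")
    .replace(/\/+$/, "");
  // Assumption: an empty folder stays empty; otherwise end with exactly one slash.
  return core === "" ? "" : `${core}/`;
}

// Joins the two normalized folders into the S3 key prefix to list.
function buildPrefix(parentFolder, subFolder) {
  return normalizeFolder(parentFolder) + normalizeFolder(subFolder);
}
```

Keeping normalization free of any AWS calls means each edge case becomes a one-line assertion.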
TagS3FolderUseCase Tests
- Database duplicate detection
- In-folder duplicate detection
- Tag deduplication (category + value)
- Case sensitivity handling
- Whitespace normalization
- Confidence handling (keep highest)
- Partial failure handling
- Empty folder handling
Controller Tests
- Request validation (Joi schema)
- Missing AWS credentials handling
- Response formatting
- Error handling
2. Integration Testing
S3 Folder Endpoint Tests
- Basic functionality: Tag images from S3 folder
  curl -X POST http://localhost:3000/api/images/tag-s3-folder \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your_key" \
    -d '{ "parentFolder": "00Da3000003ZFiQ/", "subFolder": "a0La30000008vSXEAY/" }'
- Default parent folder: Omit parentFolder (should use default)
  curl -X POST http://localhost:3000/api/images/tag-s3-folder \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your_key" \
    -d '{ "subFolder": "a0La30000008vSXEAY/" }'
- Empty folder: Test with empty folder (should return error)
- Non-existent folder: Test with non-existent folder (should return 404)
- Database duplicates: Upload same image twice (should use cached tags)
- In-folder duplicates: Folder with duplicate images (should process once)
- Large folder: Test with folder containing 100+ images
- Mixed results: Folder with some valid and some invalid images
3. Regression Testing
Existing Endpoints (Verify they still work)
- Tag single image: POST /api/images/tag
- Tag base64 image: POST /api/images/tag-base64
- Tag batch images: POST /api/images/tag/batch
- Tag batch base64 images: POST /api/images/tag-base64/batch
- Search by tag: GET /api/images/search?tag=kitchen
- Get statistics: GET /api/images/stats
- Health check: GET /api/images/health
4. Edge Case Testing
Path Normalization
- Parent folder without trailing slash: "00Da3000003ZFiQ" (should add /)
- Parent folder with trailing slash: "00Da3000003ZFiQ/" (should work)
- Extra trailing slashes: "00Da3000003ZFiQ//" (should normalize)
- Whitespace in folder names: " 00Da3000003ZFiQ/ " (should trim)
- Leading slash in subfolder: "/a0La30000008vSXEAY/" (should remove)
Tag Deduplication
- Case sensitivity: "Kitchen" vs "kitchen" (should deduplicate)
- Whitespace: "fully furnished" vs the same value with extra leading, trailing, or repeated spaces (should deduplicate)
- Different categories, same value: {category: "Room Type", value: "kitchen"} vs {category: "Style", value: "kitchen"} (should NOT deduplicate)
- Same category + value, different confidence: Should keep highest confidence
- Missing confidence: Should use default (0.5)
- Invalid confidence: Should clamp to 0-1 range
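The deduplication rules above can be sketched as a single merge pass keyed on the normalized category + value pair. This is an illustration under assumptions, not the TagS3FolderUseCase implementation: `mergeTags`, `normalizeText`, and `normalizeConfidence` are hypothetical names, and treating a non-numeric confidence as "missing" (default 0.5) is a guess.

```javascript
// Lowercase, trim, and collapse internal whitespace so that
// "Kitchen" and " kitchen " compare equal.
function normalizeText(s) {
  return String(s ?? "").trim().replace(/\s+/g, " ").toLowerCase();
}

// Missing (or, by assumption, non-numeric) confidence becomes 0.5;
// out-of-range values are clamped to [0, 1].
function normalizeConfidence(c) {
  if (typeof c !== "number" || !Number.isFinite(c)) return 0.5;
  return Math.min(1, Math.max(0, c));
}

// Keeps one tag per (category, value) pair: the one with the highest confidence.
function mergeTags(tags) {
  const best = new Map();
  for (const tag of tags) {
    const key = JSON.stringify([normalizeText(tag.category), normalizeText(tag.value)]);
    const confidence = normalizeConfidence(tag.confidence);
    const existing = best.get(key);
    if (!existing || confidence > existing.confidence) {
      best.set(key, { ...tag, confidence });
    }
  }
  return [...best.values()];
}
```

Using the category as part of the key is what keeps "Room Type: kitchen" and "Style: kitchen" as separate tags.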
Error Scenarios
- Missing AWS credentials: Should return clear error
- Invalid AWS credentials: Should return clear error
- S3 bucket not found: Should return 404
- S3 access denied: Should return clear error
- Network timeout: Should handle gracefully
- All images fail: Should return clear error message
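One way to make these scenarios testable is a small lookup that maps an error to a category and HTTP status. The sketch below is hypothetical (`ERROR_RULES` and `categorizeS3Error` are not names from this project). The error names are the ones S3 and the AWS SDK for JavaScript are understood to report on `error.name`, so verify them against your SDK version. Only the 404 for a missing bucket comes from this guide; the 401, 403, 504, and 500 statuses are assumptions.

```javascript
// Assumed mapping; only "bucket not found -> 404" is stated in this guide.
const ERROR_RULES = [
  { names: ["NoSuchBucket", "NoSuchKey", "NotFound"], status: 404, category: "not_found" },
  { names: ["AccessDenied"], status: 403, category: "access_denied" },
  {
    names: ["InvalidAccessKeyId", "SignatureDoesNotMatch", "CredentialsProviderError"],
    status: 401,
    category: "invalid_credentials",
  },
  { names: ["TimeoutError", "ETIMEDOUT", "ECONNRESET"], status: 504, category: "network" },
];

// Looks at both `name` (SDK service errors) and `code` (Node network errors).
function categorizeS3Error(err) {
  const ids = [err?.name, err?.code].filter(Boolean);
  const rule = ERROR_RULES.find((r) => r.names.some((n) => ids.includes(n)));
  return rule
    ? { status: rule.status, category: rule.category }
    : { status: 500, category: "unknown" };
}
```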
5. Performance Testing
- Small folder (1-10 images): Should process quickly
- Medium folder (10-50 images): Should process within reasonable time
- Large folder (50-100 images): Should process with concurrency limits
- Very large folder (100+ images): Should handle pagination
- Memory usage: Monitor memory during large folder processing
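S3 returns at most 1000 keys per list call, so the pagination case needs a loop over continuation tokens. A minimal sketch follows, assuming a hypothetical `listPage` function that performs one ListObjectsV2 request and resolves to its response (`Contents`, `IsTruncated`, `NextContinuationToken`). Injecting it lets the loop be tested without AWS.

```javascript
// `listPage` is a hypothetical stand-in for one ListObjectsV2 call, e.g.
// (params) => s3.send(new ListObjectsV2Command({ Bucket: bucket, ...params }))
async function listAllObjects(listPage, prefix) {
  const objects = [];
  let token; // undefined on the first request
  do {
    const page = await listPage({ Prefix: prefix, ContinuationToken: token });
    objects.push(...(page.Contents ?? []));
    // Keep going only while S3 reports that more results remain.
    token = page.IsTruncated ? page.NextContinuationToken : undefined;
  } while (token);
  return objects;
}
```

In a test, `listPage` can be a fake that returns two or three canned pages, which exercises the loop without creating more than 1000 real objects.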
🐛 Troubleshooting
Issue: "S3 folder endpoint is not available"
Solution: Check AWS credentials in .env file:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_S3_BUCKET
- AWS_REGION (optional, defaults to us-east-1)
Issue: "Access denied to S3 bucket"
Solution: Verify IAM user has permissions:
- s3:ListBucket for the bucket
- s3:GetObject for objects in the folder
Issue: "Folder not found or empty"
Solution:
- Verify folder path is correct
- Check folder exists in S3 bucket
- Verify folder contains image files (not just folders)
Issue: "All images failed to process"
Solution:
- Check logs for specific error messages
- Verify images are valid image files
- Check file sizes (must be <50MB)
- Verify Claude API key is set
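To check by hand whether a given object would pass the file-type and size checks, the rules can be approximated as below. This is a sketch under assumptions, not the S3Service code: the function names and the extension list are hypothetical, 50MB is taken as 50 * 1024 * 1024 bytes, and whether a file of exactly 50MB passes is a guess. The signatures are the standard magic numbers for JPEG, PNG, GIF, and WebP.

```javascript
const MAX_BYTES = 50 * 1024 * 1024; // assumed meaning of the 50MB limit
const IMAGE_EXTENSIONS = new Set(["jpg", "jpeg", "png", "gif", "webp"]); // assumed list

// Filters on listing metadata only: skips folder markers, hidden files
// such as .DS_Store, non-image extensions, and empty or oversized objects.
function isCandidateKey(key, size) {
  const name = key.split("/").pop();
  if (!name || name.startsWith(".")) return false;
  const ext = name.includes(".") ? name.split(".").pop().toLowerCase() : "";
  return IMAGE_EXTENSIONS.has(ext) && size > 0 && size <= MAX_BYTES;
}

// Identifies the image type from the first bytes of the file content.
function sniffImageType(bytes) {
  const at = (sig, offset = 0) => sig.every((b, i) => bytes[offset + i] === b);
  if (at([0xff, 0xd8, 0xff])) return "jpeg";
  if (at([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (at([0x47, 0x49, 0x46, 0x38])) return "gif"; // "GIF8"
  if (at([0x52, 0x49, 0x46, 0x46]) && at([0x57, 0x45, 0x42, 0x50], 8)) return "webp"; // "RIFF"...."WEBP"
  return null;
}
```

A file renamed from `.txt` to `.jpg` passes the first check but fails the second, which is the case magic number validation is meant to catch.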
📊 Expected Results
Successful Response
{
  "success": true,
  "message": "S3 folder processed successfully: 31 images tagged",
  "data": {
    "parentFolder": "00Da3000003ZFiQ/",
    "subFolder": "a0La30000008vSXEAY/",
    "totalImages": 31,
    "processedImages": 31,
    "databaseDuplicates": 5,
    "inFolderDuplicates": 2,
    "newImages": 24,
    "failedImages": 0,
    "mergedTags": [...],
    "uniqueTags": 127,
    "totalTagsBeforeDedup": 450,
    "summaries": [...],
    "errors": null
  }
}
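When scripting integration tests, it can help to assert that the counters in a successful response agree with each other. The relationships below are assumptions inferred only from the sample above (5 + 2 + 24 + 0 = 31, and 31 processed + 0 failed = 31); the endpoint may define these fields differently, so treat `checkFolderResponse` as a hypothetical starting point.

```javascript
// Returns a list of human-readable problems; an empty list means the
// counters are consistent under the assumed relationships.
function checkFolderResponse(body) {
  const d = body?.data;
  if (body?.success !== true || !d) return ["not a successful response"];
  const problems = [];
  const counted = d.databaseDuplicates + d.inFolderDuplicates + d.newImages + d.failedImages;
  if (counted !== d.totalImages) {
    problems.push(`duplicates + new + failed = ${counted}, expected ${d.totalImages}`);
  }
  if (d.processedImages + d.failedImages !== d.totalImages) {
    problems.push("processedImages + failedImages does not equal totalImages");
  }
  if (d.uniqueTags > d.totalTagsBeforeDedup) {
    problems.push("uniqueTags exceeds totalTagsBeforeDedup");
  }
  return problems;
}
```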
Error Response (Missing Credentials)
{
  "success": false,
  "message": "S3 folder endpoint is not available. AWS credentials not configured.",
  "timestamp": "2025-11-03T10:30:00.000Z"
}
Error Response (Empty Folder)
{
  "success": false,
  "message": "No images found in folder: 00Da3000003ZFiQ/a0La30000008vSXEAY/",
  "timestamp": "2025-11-03T10:30:00.000Z"
}
✅ Verification Steps
- Start server: npm run dev
- Check health: curl http://localhost:3000/api/images/health
- Test S3 endpoint: Use Postman or curl with valid AWS credentials
- Verify duplicates: Upload same images twice, check for cached results
- Verify deduplication: Check merged tags have no duplicates (category + value)
- Check logs: Review logs for any errors or warnings
- Test existing endpoints: Verify all existing endpoints still work
🎯 Success Criteria
- ✅ S3 folder endpoint responds correctly
- ✅ Database duplicates use cached tags
- ✅ In-folder duplicates are handled correctly
- ✅ Tags are deduplicated (category + value, case-insensitive)
- ✅ All edge cases handled gracefully
- ✅ Existing endpoints still work (regression)
- ✅ Error messages are clear and helpful
- ✅ Logs contain useful information
📝 Notes
- Concurrency: Images are processed 5 at a time to avoid overwhelming the system
- Memory: Large folders are processed in batches to avoid memory issues
- Duplicates: Both database and in-folder duplicates are detected and handled
- Deduplication: Tags are deduplicated based on category + value (case-insensitive, whitespace-normalized)
- Error Handling: Partial failures are handled gracefully (some images succeed, some fail)
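The concurrency and partial-failure notes can be illustrated with a batch loop built on `Promise.allSettled`. This is a minimal sketch, assuming fixed batches of 5 (the actual implementation may use a sliding pool or a library instead). `processInBatches` and `worker` are hypothetical names.

```javascript
// Runs `worker` over `items` in batches of `batchSize`, collecting
// successes and failures separately so one bad image does not abort the rest.
async function processInBatches(items, worker, batchSize = 5) {
  const succeeded = [];
  const failed = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // The async wrapper turns synchronous throws into rejections as well.
    const results = await Promise.allSettled(batch.map(async (item) => worker(item)));
    results.forEach((result, j) => {
      if (result.status === "fulfilled") {
        succeeded.push(result.value);
      } else {
        failed.push({ item: batch[j], error: String(result.reason?.message ?? result.reason) });
      }
    });
  }
  return { succeeded, failed };
}
```

Fixed batches wait for the slowest item in each group before starting the next; that is simpler to reason about than a sliding pool, at some cost in throughput.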