image_tagger/TESTING_GUIDE.md
laxman 7403bc9044 Add S3 folder endpoint and critical JSON parsing fixes
- Add S3 folder tagging endpoint with AWS S3 integration
- Implement robust JSON parsing with enhanced extraction logic
- Strengthen Claude AI prompt to prevent explanatory text
- Add error categorization and improved error handling
- Add comprehensive documentation and testing guides
2025-11-06 17:21:55 +05:30

Testing Guide - S3 Folder Endpoint

Implementation Verification

All components have been implemented and verified:

  • AWS SDK installed
  • S3Service created with edge case handling
  • TagS3FolderUseCase created with deduplication logic
  • Controller method added
  • Route added
  • Dependency container updated
  • Server.js updated
  • Documentation updated
  • No syntax errors
  • No linter errors

🧪 Testing Checklist

1. Unit Testing

S3Service Tests

  • Path normalization (trailing slashes, whitespace)
  • Image file filtering (only image files processed)
  • Hidden file filtering (.DS_Store ignored)
  • File size validation (50MB limit)
  • File type validation (magic number validation)
  • S3 pagination (>1000 objects)
  • Error handling (AWS errors, network errors)
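Here is a minimal sketch of the filtering and validation checks above. It assumes the S3Service works with the object key, the object size reported by the listing, and the first few bytes of the file. The helper names (`isCandidateImageKey`, `detectImageType`, `validateImageObject`), the extension list, and the set of recognized signatures are illustrative assumptions, not the service's actual code.

```javascript
// Hypothetical S3Service-style checks; the extension list, the signature
// table, and the helper names are illustrative assumptions.
const IMAGE_EXTENSIONS = new Set([".jpg", ".jpeg", ".png", ".gif", ".webp"]);
const MAX_BYTES = 50 * 1024 * 1024; // 50MB limit from the checklist

// True for keys that look like image files; skips "folder" placeholder keys
// (ending in "/") and hidden files such as .DS_Store.
function isCandidateImageKey(key) {
  if (key.endsWith("/")) return false;
  const name = key.split("/").pop();
  if (name.startsWith(".")) return false;
  const dot = name.lastIndexOf(".");
  if (dot === -1) return false;
  return IMAGE_EXTENSIONS.has(name.slice(dot).toLowerCase());
}

// Identifies the format from the leading bytes (magic number), so a
// non-image file renamed to .jpg is still rejected.
function detectImageType(bytes) {
  const startsWith = (sig, offset = 0) =>
    bytes.length >= offset + sig.length &&
    sig.every((b, i) => bytes[offset + i] === b);
  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "image/gif"; // "GIF8"
  // WebP: "RIFF" at offset 0 and "WEBP" at offset 8.
  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) {
    return "image/webp";
  }
  return null;
}

// Applies key filtering, the size limit, and the signature check in order.
function validateImageObject(key, sizeBytes, headBytes) {
  if (!isCandidateImageKey(key)) return { ok: false, reason: "not an image key" };
  if (sizeBytes > MAX_BYTES) return { ok: false, reason: "larger than 50MB" };
  const type = detectImageType(headBytes);
  if (type === null) return { ok: false, reason: "unrecognized file signature" };
  return { ok: true, type };
}
```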

TagS3FolderUseCase Tests

  • Database duplicate detection
  • In-folder duplicate detection
  • Tag deduplication (category + value)
  • Case sensitivity handling
  • Whitespace normalization
  • Confidence handling (keep highest)
  • Partial failure handling
  • Empty folder handling
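One way the two duplicate checks could fit together is sketched below. It assumes duplicates are identified by a SHA-256 hash of the image bytes and that the database lookup has already produced a set of known hashes. `classifyImages` and `knownHashes` are hypothetical names, and the guide does not say how the use case actually fingerprints images.

```javascript
const { createHash } = require("crypto");

// Hypothetical fingerprint: SHA-256 of the raw image bytes.
function sha256(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

// Splits a folder's images into three groups:
//  - dbDuplicates: hash already in the database (reuse cached tags)
//  - inFolderDuplicates: same bytes seen earlier in this folder (process once)
//  - toProcess: new, unique images that still need tagging
function classifyImages(images, knownHashes) {
  const seen = new Map(); // hash -> key of the first occurrence in this folder
  const result = { dbDuplicates: [], inFolderDuplicates: [], toProcess: [] };
  for (const { key, bytes } of images) {
    const hash = sha256(bytes);
    if (seen.has(hash)) {
      result.inFolderDuplicates.push({ key, duplicateOf: seen.get(hash) });
      continue;
    }
    seen.set(hash, key);
    if (knownHashes.has(hash)) result.dbDuplicates.push({ key, hash });
    else result.toProcess.push({ key, hash });
  }
  return result;
}
```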

Controller Tests

  • Request validation (Joi schema)
  • Missing AWS credentials handling
  • Response formatting
  • Error handling

2. Integration Testing

S3 Folder Endpoint Tests

  • Basic functionality: Tag images from S3 folder

    curl -X POST http://localhost:3000/api/images/tag-s3-folder \
      -H "Content-Type: application/json" \
      -H "X-API-Key: your_key" \
      -d '{
        "parentFolder": "00Da3000003ZFiQ/",
        "subFolder": "a0La30000008vSXEAY/"
      }'
    
  • Default parent folder: Omit parentFolder (should use default)

    curl -X POST http://localhost:3000/api/images/tag-s3-folder \
      -H "Content-Type: application/json" \
      -H "X-API-Key: your_key" \
      -d '{
        "subFolder": "a0La30000008vSXEAY/"
      }'
    
  • Empty folder: Test with empty folder (should return error)

  • Non-existent folder: Test with non-existent folder (should return 404)

  • Database duplicates: Process the same images twice, for example by running the same folder twice (the second run should use cached tags)

  • In-folder duplicates: Folder with duplicate images (should process once)

  • Large folder: Test with folder containing 100+ images

  • Mixed results: Folder with some valid and some invalid images


3. Regression Testing

Existing Endpoints (Verify they still work)

  • Tag single image: POST /api/images/tag
  • Tag base64 image: POST /api/images/tag-base64
  • Tag batch images: POST /api/images/tag/batch
  • Tag batch base64 images: POST /api/images/tag-base64/batch
  • Search by tag: GET /api/images/search?tag=kitchen
  • Get statistics: GET /api/images/stats
  • Health check: GET /api/images/health

4. Edge Case Testing

Path Normalization

  • Parent folder without trailing slash: "00Da3000003ZFiQ" (should add /)
  • Parent folder with trailing slash: "00Da3000003ZFiQ/" (should work)
  • Extra trailing slashes: "00Da3000003ZFiQ//" (should normalize)
  • Whitespace in folder names: " 00Da3000003ZFiQ/ " (should trim)
  • Leading slash in subfolder: "/a0La30000008vSXEAY/" (should remove)
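A small normalizer can pin down the path cases above. This is a hypothetical sketch (`normalizeFolder` and `buildPrefix` are illustrative names). It assumes S3 prefixes should never start with a slash and should always end with exactly one.

```javascript
// Hypothetical normalizer for parentFolder / subFolder values.
// Trims whitespace, strips leading/trailing slashes, collapses "//",
// then appends exactly one trailing slash (empty input stays empty).
function normalizeFolder(folder) {
  const core = String(folder ?? "")
    .trim()
    .replace(/^\/+|\/+$/g, "")
    .replace(/\/{2,}/g, "/");
  return core === "" ? "" : `${core}/`;
}

// Joins the two parts into the S3 prefix used for listing.
function buildPrefix(parentFolder, subFolder) {
  return normalizeFolder(parentFolder) + normalizeFolder(subFolder);
}
```

For example, `buildPrefix(" 00Da3000003ZFiQ", "/a0La30000008vSXEAY/")` yields `00Da3000003ZFiQ/a0La30000008vSXEAY/`.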

Tag Deduplication

  • Case sensitivity: "Kitchen" vs "kitchen" (should deduplicate)
  • Whitespace: "fully furnished" vs "fully  furnished" or " fully furnished " (should deduplicate)
  • Different categories, same value: {category: "Room Type", value: "kitchen"} vs {category: "Style", value: "kitchen"} (should NOT deduplicate)
  • Same category + value, different confidence: Should keep highest confidence
  • Missing confidence: Should use default (0.5)
  • Invalid confidence: Should clamp to 0-1 range
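The sketch below shows merge rules consistent with the deduplication cases above. The key is category + value, lowercased with whitespace collapsed. The highest confidence wins, a missing (non-numeric) confidence becomes 0.5, and out-of-range values are clamped. `mergeTags` is a hypothetical name, and keeping the spelling of the highest-confidence occurrence is an assumption.

```javascript
// Hypothetical merge rules for TagS3FolderUseCase-style deduplication.
const DEFAULT_CONFIDENCE = 0.5;

// Trims and collapses runs of whitespace; used for display and for keys.
const collapse = (s) => String(s ?? "").trim().replace(/\s+/g, " ");

// Missing or non-numeric confidence -> default; out-of-range -> clamped to [0, 1].
function normalizeConfidence(c) {
  if (typeof c !== "number" || Number.isNaN(c)) return DEFAULT_CONFIDENCE;
  return Math.min(1, Math.max(0, c));
}

function mergeTags(tags) {
  const byKey = new Map();
  for (const tag of tags) {
    // Same category + value (case-insensitive, whitespace-normalized) => same tag.
    const key = JSON.stringify([
      collapse(tag.category).toLowerCase(),
      collapse(tag.value).toLowerCase(),
    ]);
    const confidence = normalizeConfidence(tag.confidence);
    const existing = byKey.get(key);
    // Keep whichever occurrence has the highest confidence.
    if (!existing || confidence > existing.confidence) {
      byKey.set(key, { category: collapse(tag.category), value: collapse(tag.value), confidence });
    }
  }
  return [...byKey.values()];
}
```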

Error Scenarios

  • Missing AWS credentials: Should return clear error
  • Invalid AWS credentials: Should return clear error
  • S3 bucket not found: Should return 404
  • S3 access denied: Should return clear error
  • Network timeout: Should handle gracefully
  • All images fail: Should return clear error message
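The error scenarios above could be turned into clear responses with a lookup table like the one below. This is a hypothetical sketch. It assumes the AWS SDK for JavaScript reports S3 error codes in `err.name` and that Node network errors carry `err.code`. The exact names and every status code except the 404 for a missing bucket are assumptions to check against the SDK version in use.

```javascript
// Hypothetical mapping from S3/network errors to API responses.
// Error identifiers and status codes (other than 404) are assumptions.
const RULES = [
  { ids: ["CredentialsProviderError"], status: 500, message: "AWS credentials are not configured or could not be loaded." },
  { ids: ["InvalidAccessKeyId", "SignatureDoesNotMatch"], status: 500, message: "AWS credentials were rejected by S3." },
  { ids: ["NoSuchBucket"], status: 404, message: "S3 bucket not found." },
  { ids: ["AccessDenied"], status: 403, message: "Access denied to S3 bucket." },
  { ids: ["TimeoutError", "ETIMEDOUT", "ECONNRESET"], status: 504, message: "Timed out or lost the connection to S3; please retry." },
];

function categorizeS3Error(err) {
  // Check both the error name (service errors) and code (Node network errors).
  const ids = [err && err.name, err && err.code].filter(Boolean);
  const rule = RULES.find((r) => r.ids.some((id) => ids.includes(id)));
  if (rule) return { status: rule.status, message: rule.message };
  return { status: 500, message: `Unexpected S3 error: ${ids[0] || "unknown"}` };
}
```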

5. Performance Testing

  • Small folder (1-10 images): Should process quickly
  • Medium folder (10-50 images): Should process within reasonable time
  • Large folder (50-100 images): Should process with concurrency limits
  • Very large folder (100+ images, including one with more than 1,000 objects): Should stay within concurrency limits and paginate the S3 listing
  • Memory usage: Monitor memory during large folder processing
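The pagination case can be exercised without a real bucket. The sketch below assumes `listPage` wraps a ListObjectsV2 call and resolves to an object with `Contents`, `IsTruncated`, and `NextContinuationToken`, the fields that API uses to page through results (at most 1,000 keys per response). `listAllKeys` is a hypothetical name.

```javascript
// Hypothetical pagination loop over a ListObjectsV2-style page function.
// `listPage(params)` is assumed to resolve to
// { Contents, IsTruncated, NextContinuationToken }.
async function listAllKeys(listPage, prefix) {
  const keys = [];
  let token;
  do {
    const page = await listPage({ Prefix: prefix, ContinuationToken: token });
    for (const obj of page.Contents ?? []) keys.push(obj.Key);
    // Continue only while S3 reports more results and supplies a token.
    token = page.IsTruncated ? page.NextContinuationToken : undefined;
  } while (token);
  return keys;
}
```

With AWS SDK v3, `listPage` could be `(params) => s3.send(new ListObjectsV2Command({ Bucket: bucket, ...params }))`. Passing a fake that returns 1,000 keys per page makes the >1000-object case testable offline.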

🐛 Troubleshooting

Issue: "S3 folder endpoint is not available"

Solution: Check AWS credentials in .env file:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_S3_BUCKET
  • AWS_REGION (optional, defaults to us-east-1)
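Here is a minimal sketch of the configuration check behind this error. It assumes the service reads the variables listed above from `process.env` and disables the endpoint when any required one is missing or blank. `getS3Config` is a hypothetical name.

```javascript
// Hypothetical startup check for the S3 folder endpoint.
const REQUIRED_AWS_VARS = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_S3_BUCKET"];

function getS3Config(env = process.env) {
  // A variable counts as missing if it is unset or only whitespace.
  const missing = REQUIRED_AWS_VARS.filter((name) => !(env[name] || "").trim());
  if (missing.length > 0) {
    // The controller can then answer with
    // "S3 folder endpoint is not available. AWS credentials not configured."
    return { enabled: false, missing };
  }
  return {
    enabled: true,
    bucket: env.AWS_S3_BUCKET.trim(),
    region: (env.AWS_REGION || "").trim() || "us-east-1", // optional, defaults to us-east-1
  };
}
```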

Issue: "Access denied to S3 bucket"

Solution: Verify IAM user has permissions:

  • s3:ListBucket for the bucket
  • s3:GetObject for objects in the folder
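For reference, a least-privilege policy granting those two permissions might look like the object built below. Serialize it with `JSON.stringify` to paste it into IAM. This is a hypothetical helper, and the bucket name and prefix are placeholders. Note that `s3:ListBucket` applies to the bucket ARN while `s3:GetObject` applies to object ARNs. Mixing the two up is a common cause of "Access denied" errors.

```javascript
// Hypothetical builder for a read-only IAM policy scoped to one prefix.
function buildReadPolicy(bucket, prefix = "") {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        // ListBucket targets the bucket itself; the condition limits listing to the prefix.
        Effect: "Allow",
        Action: "s3:ListBucket",
        Resource: `arn:aws:s3:::${bucket}`,
        Condition: { StringLike: { "s3:prefix": [`${prefix}*`] } },
      },
      {
        // GetObject targets object ARNs under the prefix.
        Effect: "Allow",
        Action: "s3:GetObject",
        Resource: `arn:aws:s3:::${bucket}/${prefix}*`,
      },
    ],
  };
}
```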

Issue: "Folder not found or empty"

Solution:

  • Verify folder path is correct
  • Check folder exists in S3 bucket
  • Verify folder contains image files (not just folders)

Issue: "All images failed to process"

Solution:

  • Check logs for specific error messages
  • Verify images are valid image files
  • Check file sizes (must be <50MB)
  • Verify Claude API key is set

📊 Expected Results

Successful Response

{
  "success": true,
  "message": "S3 folder processed successfully: 31 images tagged",
  "data": {
    "parentFolder": "00Da3000003ZFiQ/",
    "subFolder": "a0La30000008vSXEAY/",
    "totalImages": 31,
    "processedImages": 31,
    "databaseDuplicates": 5,
    "inFolderDuplicates": 2,
    "newImages": 24,
    "failedImages": 0,
    "mergedTags": [...],
    "uniqueTags": 127,
    "totalTagsBeforeDedup": 450,
    "summaries": [...],
    "errors": null
  }
}
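The sketch below shows a response check that could back the verification steps. It assumes the counters relate as they do in the sample (5 + 2 + 24 = 31 processed; 31 processed + 0 failed = 31 total) and that `uniqueTags` equals the length of `mergedTags`. Both relationships are inferred from this one example, so adjust them if the service counts failures differently. The check also flags any `mergedTags` entries that collide on category + value after lowercasing and collapsing whitespace. `checkFolderResponse` is a hypothetical name.

```javascript
// Hypothetical sanity checks for a tag-s3-folder response body.
// Assumed invariants (inferred from the sample, not from the service):
//   databaseDuplicates + inFolderDuplicates + newImages === processedImages
//   processedImages + failedImages === totalImages
//   mergedTags.length === uniqueTags, with no category + value collisions
const norm = (s) => String(s ?? "").trim().replace(/\s+/g, " ").toLowerCase();

function checkFolderResponse(body) {
  const d = body && body.data;
  if (!body || body.success !== true || !d) return ["not a success payload"];
  const problems = [];
  const parts = d.databaseDuplicates + d.inFolderDuplicates + d.newImages;
  if (parts !== d.processedImages) {
    problems.push(`duplicates + new images (${parts}) != processedImages (${d.processedImages})`);
  }
  if (d.processedImages + d.failedImages !== d.totalImages) {
    problems.push(`processed + failed != totalImages (${d.totalImages})`);
  }
  if (Array.isArray(d.mergedTags)) {
    if (d.mergedTags.length !== d.uniqueTags) {
      problems.push(`mergedTags.length (${d.mergedTags.length}) != uniqueTags (${d.uniqueTags})`);
    }
    const seen = new Set();
    for (const tag of d.mergedTags) {
      const key = JSON.stringify([norm(tag.category), norm(tag.value)]);
      if (seen.has(key)) problems.push(`duplicate tag: ${tag.category} / ${tag.value}`);
      seen.add(key);
    }
  }
  return problems;
}
```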

Error Response (Missing Credentials)

{
  "success": false,
  "message": "S3 folder endpoint is not available. AWS credentials not configured.",
  "timestamp": "2025-11-03T10:30:00.000Z"
}

Error Response (Empty Folder)

{
  "success": false,
  "message": "No images found in folder: 00Da3000003ZFiQ/a0La30000008vSXEAY/",
  "timestamp": "2025-11-03T10:30:00.000Z"
}

Verification Steps

  1. Start server: npm run dev
  2. Check health: curl http://localhost:3000/api/images/health
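  3. Test S3 endpoint: With valid AWS credentials configured in .env, call the endpoint from Postman or curl (include the X-API-Key header)
  4. Verify duplicates: Process the same images twice and check that the second run returns cached results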
  5. Verify deduplication: Check merged tags have no duplicates (category + value)
  6. Check logs: Review logs for any errors or warnings
  7. Test existing endpoints: Verify all existing endpoints still work

🎯 Success Criteria

  • S3 folder endpoint responds correctly
  • Database duplicates use cached tags
  • In-folder duplicates are handled correctly
  • Tags are deduplicated (category + value, case-insensitive)
  • All edge cases handled gracefully
  • Existing endpoints still work (regression)
  • Error messages are clear and helpful
  • Logs contain useful information

📝 Notes

  • Concurrency: Images are processed 5 at a time to avoid overwhelming the system
  • Memory: Large folders are processed in batches to avoid memory issues
  • Duplicates: Both database and in-folder duplicates are detected and handled
  • Deduplication: Tags are deduplicated based on category + value (case-insensitive, whitespace-normalized)
  • Error Handling: Partial failures are handled gracefully (some images succeed, some fail)
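The concurrency and partial-failure notes above can be sketched as a small worker pool. At most `limit` images are in flight at once, and each failure is recorded rather than aborting the folder, in the spirit of `Promise.allSettled`. `mapWithConcurrency` is a hypothetical name, not necessarily how the use case is written.

```javascript
// Hypothetical worker pool: runs `task` over `items` with at most `limit`
// calls in flight and records each outcome like Promise.allSettled.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    // Each worker claims the next index; JS runs this synchronously between awaits.
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { status: "fulfilled", value: await task(items[i], i) };
      } catch (error) {
        // A failed image does not stop the rest of the folder.
        results[i] = { status: "rejected", reason: error };
      }
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

A caller could then count `results.filter((r) => r.status === "rejected")` to fill `failedImages` and build the `errors` list.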