CP_AUTOMATION/tests/load_tests/COMPLETE_FAILURE_ANALYSIS.md
2025-12-15 17:15:08 +05:30

Complete Load Test Failure Analysis

📊 Summary of Both Test Runs

Test 1: 100 Students

  • Result: 0% success (100% failed)
  • Error: InvalidSessionIdException: invalid session id: session deleted as the browser has closed the connection
  • Root Cause: System Resource Exhaustion - Too many concurrent browsers (100)

Test 2: 1 Student

  • Result: 0% success (100% failed)
  • Error: Password reset API call did not complete within timeout
  • Root Cause: Backend API Performance - Password reset API taking >16 seconds

🔴 Issue #1: 100 Students - System Resource Exhaustion

Problem

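Running 100 concurrent Chrome browsers exceeds the test machine's capacity. Browsers are closed mid-session, which surfaces as the InvalidSessionIdException shown above.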

Solution IMPLEMENTED

Reduce concurrency to 20-30 browsers:

--workers 20  # Instead of 100

Status

RESOLVED - Use --workers 20 for load testing


🔴 Issue #2: 1 Student - Backend API Timeout

Problem

The password reset API takes longer than 16 seconds to respond, so the automation's wait expires even with a single browser.

Solution IMPLEMENTED

Increased timeout from 16 seconds to 60 seconds:

  • Modified pages/mandatory_reset_page.py
  • Changed: max_wait = max(LONG_WAIT, 60) (60 seconds minimum)
  • Improved error messages with more context
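
The change above can be sketched as a polling wait with a 60-second floor. This is not the code in pages/mandatory_reset_page.py; `wait_for`, its parameters, and the value assigned to `LONG_WAIT` are assumptions made for illustration. Only the expression `max(LONG_WAIT, 60)` comes from this report.

```python
import time

LONG_WAIT = 16  # assumed value; the report only states the old timeout was 16 s


def wait_for(condition, max_wait=None, poll=0.5, context=lambda: ""):
    """Poll condition() until it is truthy or max_wait seconds have passed.

    Returns the elapsed time on success and raises TimeoutError otherwise.
    `context` is a hypothetical hook that adds details to the error message.
    """
    if max_wait is None:
        max_wait = max(LONG_WAIT, 60)  # never wait less than 60 seconds
    start = time.monotonic()
    while True:
        if condition():
            return time.monotonic() - start
        elapsed = time.monotonic() - start
        if elapsed >= max_wait:
            message = (
                "Password reset API call did not complete within "
                f"{max_wait}s (elapsed {elapsed:.1f}s) {context()}"
            )
            raise TimeoutError(message.rstrip())
        time.sleep(poll)
```

Using `max(LONG_WAIT, 60)` rather than a hard-coded 60 means a later increase of `LONG_WAIT` beyond 60 still takes effect.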

Status

FIXED - Timeout increased to 60 seconds


🎯 What Each Issue Means

Issue #1 (100 Students)

  • Automation: Working correctly
  • Backend: Working correctly
  • System: Cannot handle 100 browsers
  • Fix: Reduce to 20-30 concurrent browsers

Issue #2 (1 Student)

  • Automation: Working correctly
  • Backend: ⚠️ Slow API response (>16 seconds)
  • System: Can handle 1 browser
  • Fix: Timeout increased to 60 seconds

Recommended Test Sequence

Step 1: Test with 1 Student (Verify Fix)

python3 tests/load_tests/test_generic_load_assessments.py \
    --csv students_with_passwords_2025-12-15T10-49-08_01.csv \
    --start 0 --end 1 \
    --workers 1 \
    --headless \
    --metrics-interval 1

Expected: The run should now pass with the 60-second timeout.

Step 2: Test with 10 Students

Same command as Step 1, with these flags changed:

--start 0 --end 10 --workers 10

Step 3: Test with 20 Students

Same command as Step 1, with these flags changed:

--start 0 --end 20 --workers 20

Step 4: Scale Up Gradually

  • 20 → 30 → 50 → 100 (if system can handle it)
  • Or split the run across multiple devices for 100+ students
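
The staged runs above can be generated rather than typed by hand. The sketch below builds the command line for each stage from the flags shown in Step 1; `build_command` and the `max_workers` cap are hypothetical helpers, and the script path and CSV name are copied from Step 1.

```python
import shlex

SCRIPT = "tests/load_tests/test_generic_load_assessments.py"
CSV = "students_with_passwords_2025-12-15T10-49-08_01.csv"


def build_command(students, max_workers=20):
    """Return the argv for a run over the first `students` CSV rows.

    Workers match the student count until the cap, then stay at the cap.
    """
    workers = min(students, max_workers)
    return [
        "python3", SCRIPT,
        "--csv", CSV,
        "--start", "0", "--end", str(students),
        "--workers", str(workers),
        "--headless",
        "--metrics-interval", "1",
    ]


# Print one shell-ready command per stage.
for stage in (1, 10, 20):
    print(shlex.join(build_command(stage)))
```

Raise `max_workers` only after the previous stage passes cleanly; with the default cap, a 100-student run stays at 20 workers.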

🔍 Backend Performance Investigation

If Timeout Still Occurs (Even with 60s)

Check backend:

  1. Backend Logs: Look for password reset API calls
  2. Database Performance: Check query times
  3. API Response Times: Monitor endpoint performance
  4. Network: Check for latency issues

Possible Backend Issues:

  • Slow database queries
  • Heavy server load
  • Network latency
  • Backend service issues
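
When monitoring response times, a small timing helper makes it easy to compare runs. The sketch below times any callable several times and reports the minimum, median, and maximum. `time_calls` is a hypothetical helper, and the callable that actually issues the password reset request is left to the reader, since this report does not document the endpoint.

```python
import statistics
import time


def time_calls(call, repeats=5):
    """Invoke call() `repeats` times and summarize the durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Stand-in workload; replace with a function that calls the real endpoint.
stats = time_calls(lambda: time.sleep(0.01), repeats=3)
print({name: round(value, 3) for name, value in stats.items()})
```

A median well above 16 seconds with no browser involved would point at the backend rather than the automation.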

📋 Changes Made

1. Increased Password Reset Timeout

  • File: pages/mandatory_reset_page.py
  • Change: Timeout increased from 16s to 60s
  • Line: ~392

2. Improved Error Messages

  • File: pages/mandatory_reset_page.py
  • Change: Better error context (modal status, errors, elapsed time)
  • Line: ~447

3. Enhanced Toast Detection

  • File: pages/mandatory_reset_page.py
  • Change: Added data-testid detection + improved XPath fallbacks
  • Line: ~406
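
The detection order described above (data-testid first, then XPath fallbacks) can be sketched as an ordered locator list. The selector values and the `find_first` helper are hypothetical, since the real locators live in pages/mandatory_reset_page.py. The only Selenium call assumed is `driver.find_elements(by, value)`, where the strings "css selector" and "xpath" are the values of `By.CSS_SELECTOR` and `By.XPATH`.

```python
# Ordered from most to least specific; all selector values are hypothetical.
TOAST_LOCATORS = [
    ("css selector", "[data-testid='toast']"),
    ("xpath", "//*[@role='alert']"),
    ("xpath", "//*[contains(@class, 'toast')]"),
]


def find_first(driver, locators):
    """Return (element, locator) for the first locator that matches.

    Returns (None, None) when no locator matches anything.
    """
    for by, value in locators:
        elements = driver.find_elements(by, value)
        if elements:
            return elements[0], (by, value)
    return None, None
```

Returning the locator that matched makes it possible to log when a fallback was needed, which signals that the data-testid attribute is missing from the page.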

🎯 Next Steps

  1. Test with 1 student - Verify timeout fix works
  2. If successful - Scale up to 10, then 20 students
  3. If timeout still occurs - Investigate backend performance
  4. For 100 students - Use --workers 20 or multi-device

Summary:

  • Issue #1 fixed: Use --workers 20 instead of 100
  • Issue #2 fixed: Timeout increased to 60 seconds
  • ⚠️ If Issue #2 persists: Backend performance needs investigation