CP_AUTOMATION/tests/load_tests/COMPLETE_FAILURE_ANALYSIS.md
2025-12-15 17:15:08 +05:30

# Complete Load Test Failure Analysis
## 📊 Summary of Both Test Runs
### Test 1: 100 Students
- **Result**: 0% success (100% failed)
- **Error**: `InvalidSessionIdException: invalid session id: session deleted as the browser has closed the connection`
- **Root Cause**: **System Resource Exhaustion** - Too many concurrent browsers (100)
### Test 2: 1 Student
- **Result**: 0% success (100% failed)
- **Error**: `Password reset API call did not complete within timeout`
- **Root Cause**: **Backend API Performance** - Password reset API taking >16 seconds
---
## 🔴 Issue #1: 100 Students - System Resource Exhaustion
### Problem
Running 100 concurrent Chrome browsers exceeds system capacity.
### Solution ✅ IMPLEMENTED
**Reduce concurrency to 20-30 browsers:**
```bash
--workers 20 # Instead of 100
```
### Status
**RESOLVED** - Use `--workers 20` for load testing
---
## 🔴 Issue #2: 1 Student - Backend API Timeout
### Problem
Password reset API is taking longer than 16 seconds to respond.
### Solution ✅ IMPLEMENTED
**Increased timeout from 16 seconds to 60 seconds:**
- Modified `pages/mandatory_reset_page.py`
- Changed the wait to `max_wait = max(LONG_WAIT, 60)`, so the timeout is never shorter than 60 seconds
- Improved error messages with more context
### Status
**FIXED** - Timeout increased to 60 seconds. This addresses the automation side only; the slow API response itself remains a backend concern.
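The timeout logic can be sketched as a polling loop. This is not the code in `pages/mandatory_reset_page.py`; `LONG_WAIT = 16` is an assumed value (this analysis only states that the old timeout was 16 seconds), and `wait_for_reset`, `is_done`, `clock`, and `sleep` are hypothetical names.

```python
import time

LONG_WAIT = 16  # assumed value; only the old 16-second timeout is documented


def wait_for_reset(is_done, poll_interval=0.5, min_wait=60,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll is_done() until it returns True or the timeout expires.

    Returns the elapsed time in seconds on success.
    """
    max_wait = max(LONG_WAIT, min_wait)  # never less than min_wait seconds
    start = clock()
    while True:
        if is_done():
            return clock() - start
        elapsed = clock() - start
        if elapsed >= max_wait:
            # Include the elapsed time so a failure is easier to diagnose.
            raise TimeoutError(
                f"Password reset API call did not complete within {max_wait}s "
                f"(elapsed: {elapsed:.1f}s)"
            )
        sleep(poll_interval)
```

Injecting `clock` and `sleep` keeps the loop testable without waiting a real minute.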
---
## 🎯 What Each Issue Means
### Issue #1 (100 Students)
- **Automation**: ✅ Working correctly
- **Backend**: ✅ Working correctly
- **System**: ❌ Cannot handle 100 browsers
- **Fix**: Reduce to 20-30 concurrent browsers
### Issue #2 (1 Student)
- **Automation**: ✅ Working correctly
- **Backend**: ⚠️ **Slow API response** (>16 seconds)
- **System**: ✅ Can handle 1 browser
- **Fix**: ✅ Timeout increased to 60 seconds
---
## ✅ Recommended Test Strategy
### Step 1: Test with 1 Student (Verify Fix)
```bash
python3 tests/load_tests/test_generic_load_assessments.py \
--csv students_with_passwords_2025-12-15T10-49-08_01.csv \
--start 0 --end 1 \
--workers 1 \
--headless \
--metrics-interval 1
```
**Expected**: The run should now pass, because the password reset wait allows up to 60 seconds
### Step 2: Test with 10 Students
```bash
# Same command as Step 1, changing only these flags:
--start 0 --end 10 --workers 10
```
### Step 3: Test with 20 Students
```bash
# Same command as Step 1, changing only these flags:
--start 0 --end 20 --workers 20
```
### Step 4: Scale Up Gradually
- 20 → 30 → 50 → 100 (if system can handle it)
- Alternatively, split the load across multiple devices for 100+ students
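The commands for Steps 1 to 3 differ only in `--end` and `--workers`, so they can be generated rather than retyped. A small sketch, reusing only the flags shown in Step 1; `build_command` and `STAGES` are hypothetical names.

```python
import shlex

SCRIPT = "tests/load_tests/test_generic_load_assessments.py"
CSV = "students_with_passwords_2025-12-15T10-49-08_01.csv"


def build_command(students, workers):
    """Return the argv list for one stage, reusing the flags from Step 1."""
    return [
        "python3", SCRIPT,
        "--csv", CSV,
        "--start", "0", "--end", str(students),
        "--workers", str(workers),
        "--headless",
        "--metrics-interval", "1",
    ]


# (students, workers) pairs taken from Steps 1 to 3.
STAGES = [(1, 1), (10, 10), (20, 20)]

for students, workers in STAGES:
    # Print each command; pass the list to subprocess.run() to execute it.
    print(shlex.join(build_command(students, workers)))
```

Run each stage only after the previous one passes, so a failure points at the load level that triggered it.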
---
## 🔍 Backend Performance Investigation
### If Timeout Still Occurs (Even with 60s)
**Check backend:**
1. **Backend Logs**: Look for password reset API calls
2. **Database Performance**: Check query times
3. **API Response Times**: Monitor endpoint performance
4. **Network**: Check for latency issues
**Possible Backend Issues:**
- Slow database queries
- Heavy server load
- Network latency
- Backend service issues
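To separate backend slowness from browser overhead, it can help to time the API call on its own. A minimal sketch: `measure_latency` times any callable, and `fake_password_reset` is a hypothetical placeholder to be replaced with a real request to the password reset endpoint, which this analysis does not specify.

```python
import statistics
import time


def measure_latency(call, samples=5, clock=time.perf_counter):
    """Time call() several times and summarize the durations in seconds."""
    durations = []
    for _ in range(samples):
        start = clock()
        call()
        durations.append(clock() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def fake_password_reset():
    # Hypothetical placeholder: swap in a real request to the endpoint.
    time.sleep(0.01)


if __name__ == "__main__":
    print(measure_latency(fake_password_reset))
```

If the median stays well above 16 seconds with a single caller, the delay is in the backend, not in the load generator.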
---
## 📋 Changes Made
### 1. Increased Password Reset Timeout
- **File**: `pages/mandatory_reset_page.py`
- **Change**: Timeout increased from 16s to 60s
- **Line**: ~392
### 2. Improved Error Messages
- **File**: `pages/mandatory_reset_page.py`
- **Change**: Better error context (modal status, errors, elapsed time)
- **Line**: ~447
### 3. Enhanced Toast Detection
- **File**: `pages/mandatory_reset_page.py`
- **Change**: Added data-testid detection + improved XPath fallbacks
- **Line**: ~406
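The detection order from change 3 (`data-testid` first, XPath fallbacks after) can be sketched as below. Every selector is hypothetical; the real ones are in `pages/mandatory_reset_page.py` and are not reproduced here. The strategy strings match the values of Selenium's `By.CSS_SELECTOR` and `By.XPATH`, so a real WebDriver can be passed as `driver`.

```python
# Strategy strings equal to Selenium's By.CSS_SELECTOR and By.XPATH values.
CSS_SELECTOR = "css selector"
XPATH = "xpath"

# Hypothetical selectors, ordered from most to least specific.
TOAST_LOCATORS = [
    (CSS_SELECTOR, "[data-testid='toast-success']"),
    (XPATH, "//div[@role='alert']"),
    (XPATH, "//*[contains(@class, 'toast')]"),
]


def find_toast(driver, locators=TOAST_LOCATORS):
    """Return the first element matched by any locator, or None."""
    for by, value in locators:
        # find_elements returns an empty list instead of raising on no match.
        matches = driver.find_elements(by, value)
        if matches:
            return matches[0]
    return None
```

Using `find_elements` avoids exception handling for each fallback and keeps the lookup order explicit.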
---
## 🎯 Next Steps
1. **Test with 1 student** - Verify timeout fix works
2. **If successful** - Scale up to 10, then 20 students
3. **If timeout still occurs** - Investigate backend performance
4. **For 100 students** - Use `--workers 20` or multi-device
---
**Summary**:
- ✅ Issue #1 fixed: Use `--workers 20` instead of 100
- ✅ Issue #2 fixed: Timeout increased to 60 seconds
- ⚠️ If Issue #2 persists: Backend performance needs investigation