
Load Test Issue Analysis - 100 Students Failed

🔴 Issue 1: Chrome User Data Directory Conflict (FIXED)

Error

SessionNotCreatedException: session not created: probably user data directory is already in use, 
please specify a unique value for --user-data-dir argument

Root Cause

  • Problem: All 30 Chrome browsers were trying to use the same user data directory
  • Result: Chrome instances conflicted with each other and failed to start
  • Impact: 100% failure rate (all browsers crashed at startup)

Solution Applied

  • Fix: Each browser now gets a unique temporary user data directory
  • Implementation: tempfile.mkdtemp(prefix=f'chrome_user_data_{user_id}_')
  • Cleanup: Temp directories are automatically removed after the driver quits

Status

FIXED - This issue is now resolved in the code


🔴 Issue 2: Metrics Interval Confusion (CLARIFIED)

Your Question

"how can we get metrics interval at 10,10, because it's continuous process, all the Students should work simultaneously"

Clarification

--metrics-interval 10 does NOT mean:

  • Students run in batches of 10
  • Only 10 students run at a time
  • Students wait for each other

--metrics-interval 10 ACTUALLY means:

  • Print a progress report each time another 10 students complete
  • All students run simultaneously (30 at a time with 30 workers)
  • Metrics are printed for visibility only; they never control execution

How It Works

Timeline:
0s:  Start 30 browsers (workers=30)
5s:  Student 1 completes → metrics NOT printed (1 % 10 != 0)
8s:  Student 2 completes → metrics NOT printed (2 % 10 != 0)
...
15s: Student 10 completes → ✅ METRICS PRINTED (10 % 10 == 0)
20s: Student 11 completes → metrics NOT printed (11 % 10 != 0)
...
30s: Student 20 completes → ✅ METRICS PRINTED (20 % 10 == 0)

All 30 browsers are running simultaneously! The metrics interval only controls when you see the progress report (see the sketch after the diagram below).

Visual Explanation

With --workers 30:
┌─────────────┬────────────┬────────────┐
│  Browser 1  │ Browser 2  │ Browser 3  │  ← all start at the same time
│  Browser 4  │ Browser 5  │ Browser 6  │
│  ...        │ ...        │ ...        │
│  Browser 28 │ Browser 29 │ Browser 30 │
└─────────────┴────────────┴────────────┘
       ↓            ↓            ↓
    Running      Running      Running
    (simultaneously, not in batches!)
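
Here is a minimal sketch of that pattern in Python (the names run_student and metrics_interval are illustrative assumptions, not the script's actual identifiers):

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def run_student(user_id):
        # Placeholder for the real per-student browser flow.
        return user_id

    students = list(range(1, 101))  # 100 students
    metrics_interval = 10           # --metrics-interval 10
    completed = 0

    with ThreadPoolExecutor(max_workers=30) as pool:   # --workers 30
        futures = [pool.submit(run_student, s) for s in students]
        for future in as_completed(futures):           # all 30 run at once
            future.result()
            completed += 1
            if completed % metrics_interval == 0:      # printing only
                print(f"Progress: {completed}/{len(students)} done")

The modulo check only decides when to print; the pool keeps all 30 workers busy the entire time.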

📊 What Actually Happened

Execution Flow

  1. Script started successfully
  2. Loaded 100 students from CSV
  3. Started 30 concurrent browsers (workers=30)
  4. Chrome user data directory conflict → All browsers failed to start
  5. All 100 students failed within 60 seconds

Error Breakdown

  • Users 1, 3, 4, etc.: SessionNotCreatedException (user data dir conflict)
  • User 2 and others: InvalidSessionIdException (browser crashed after the conflict)

Why So Fast?

  • All failures happened at browser startup (within seconds)
  • No students even reached the backend
  • This is a local Chrome configuration issue, NOT a backend issue

Solution Applied

Code Changes

  1. Added a unique user data directory for each browser:

    user_data_dir = tempfile.mkdtemp(prefix=f'chrome_user_data_{user_id}_')
    options.add_argument(f'--user-data-dir={user_data_dir}')
    
  2. Added cleanup so each temp directory is removed after its driver quits (combined sketch below)
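
Putting both changes together, the driver lifecycle might look like this minimal sketch (create_driver and cleanup_driver are illustrative names, not the script's actual functions):

    import shutil
    import tempfile

    from selenium import webdriver

    def create_driver(user_id):
        # Each browser gets its own throwaway profile directory.
        user_data_dir = tempfile.mkdtemp(prefix=f'chrome_user_data_{user_id}_')
        options = webdriver.ChromeOptions()
        options.add_argument(f'--user-data-dir={user_data_dir}')
        return webdriver.Chrome(options=options), user_data_dir

    def cleanup_driver(driver, user_data_dir):
        # Quit the browser first, then remove its temp profile.
        try:
            driver.quit()
        finally:
            shutil.rmtree(user_data_dir, ignore_errors=True)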

Test Again

./scripts/PC1_100_students.sh

Expected Result: Browsers should start successfully now!


🎯 Is This a Backend Issue?

Answer: NO - This is a local Chrome configuration issue

Evidence:

  • No students reached the backend (all failed at browser startup)
  • Error is SessionNotCreatedException (Chrome issue, not API issue)
  • All failures happened within seconds (before any API calls)

Backend was never tested because browsers couldn't even start!


📈 Next Steps

  1. Test again with the fix:

    ./scripts/PC1_100_students.sh
    
  2. If it works, then proceed with 3 PCs (300 students)

  3. Monitor the backend during the test (a small monitoring sketch follows this list):

    • Check backend logs
    • Monitor API response times
    • Check database performance
    • Monitor server resources (CPU, RAM)
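
For the resource check, here is a minimal monitoring sketch to run on the backend host (it assumes psutil is installed; the ~5-second sampling rate is an arbitrary choice):

    import time

    import psutil  # assumption: pip install psutil

    while True:
        cpu = psutil.cpu_percent(interval=1)    # CPU averaged over 1 second
        ram = psutil.virtual_memory().percent   # RAM usage in percent
        print(f"CPU: {cpu:5.1f}%  RAM: {ram:5.1f}%")
        time.sleep(4)                           # ~5 seconds between samples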

💡 Understanding Metrics Interval

Current Behavior (Correct)

  • 30 workers = 30 browsers running simultaneously
  • Metrics interval 10 = Print progress every 10 completions
  • All students process in parallel, not sequentially

If You Want Different Behavior

Option 1: Print metrics more frequently

--metrics-interval 5  # Print every 5 completions

Option 2: Print metrics less frequently

--metrics-interval 20  # Print every 20 completions

Option 3: Print only at the end

--metrics-interval 1000  # With fewer than 1,000 students, the interval never fires mid-run; only the final summary prints

Note: This does NOT change how students run - they still run simultaneously!


Summary

  1. Issue 1 (FIXED): Chrome user data directory conflict → Each browser now has unique directory
  2. Issue 2 (CLARIFIED): Metrics interval is just for printing, not batching
  3. Backend: Was never tested (browsers failed before reaching backend)
  4. Next: Test again with the fix to actually test backend capacity

The fix is ready - test again! 🚀