Dynamic TAT Thresholds Implementation

Problem Statement

Original Issue

The TAT system had hardcoded threshold percentages (50%, 75%, 100%) which created several problems:

  1. Job Naming Conflict: Jobs were named using threshold percentages (tat50-{requestId}-{levelId})
  2. Configuration Changes Didn't Apply: Changing threshold in settings didn't affect scheduled jobs
  3. Message Mismatch: Messages always said "50% elapsed" even if admin configured 55%
  4. Cancellation Issues: Uncertainty about whether jobs could be properly cancelled after config changes

Critical Edge Case Identified by User

Scenario:

1. Request created → TAT jobs scheduled:
   - tat50-REQ123-LEVEL456   (fires at 8 hours, says "50% elapsed")
   - tat75-REQ123-LEVEL456   (fires at 12 hours)
   - tatBreach-REQ123-LEVEL456 (fires at 16 hours)

2. Admin changes threshold from 50% → 55%

3. User approves at 9 hours (after old 50% fired)
   → Job already fired with "50% elapsed" message ❌
   → But admin configured 55% ❌
   → Inconsistent!

4. Even if approval happens before old 50%:
   → System cancels `tat50-REQ123-LEVEL456` ✅
   → But message would still say "50%" (hardcoded) ❌

Solution: Generic Job Names + Dynamic Thresholds

1. Generic Job Naming

Changed from percentage-based to generic names:

Before:

tat50-{requestId}-{levelId}
tat75-{requestId}-{levelId}
tatBreach-{requestId}-{levelId}

After:

tat-threshold1-{requestId}-{levelId}  // First threshold (configurable: 50%, 55%, 60%, etc.)
tat-threshold2-{requestId}-{levelId}  // Second threshold (configurable: 75%, 80%, etc.)
tat-breach-{requestId}-{levelId}      // Always 100% (deadline)
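
A minimal sketch of how the generic names could be built. The helper names `buildTatJobId` and `allTatJobIds` and the `TatJobType` alias are hypothetical and not taken from the codebase. The point is that a name depends only on the job slot, the request, and the level, never on the configured percentage:

```typescript
// Hypothetical helpers; names are illustrative, not from the codebase.
type TatJobType = 'threshold1' | 'threshold2' | 'breach';

// The job name encodes the slot, request, and level, but not the percentage.
function buildTatJobId(type: TatJobType, requestId: string, levelId: string): string {
  return `tat-${type}-${requestId}-${levelId}`;
}

// All three job IDs for one approval level, e.g. for bulk cancellation.
function allTatJobIds(requestId: string, levelId: string): string[] {
  const types: TatJobType[] = ['threshold1', 'threshold2', 'breach'];
  return types.map((type) => buildTatJobId(type, requestId, levelId));
}
```

Because the percentage is absent from the name, the same three IDs can be recomputed at cancellation time even if the configuration changed in between.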

2. Store Threshold in Job Data

Instead of relying on job name, we store the actual percentage in job payload:

interface TatJobData {
  type: 'threshold1' | 'threshold2' | 'breach';
  threshold: number;  // Actual % (e.g., 55, 80, 100)
  requestId: string;
  levelId: string;
  approverId: string;
}

3. Dynamic Message Generation

Messages use the threshold from job data:

case 'threshold1':
  message = `⏳ ${threshold}% of TAT elapsed for Request ${requestNumber}`;
  // If threshold = 55, message says "55% of TAT elapsed" ✅
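
For context, a fuller version of the message builder might look like the sketch below. Only the `threshold1` wording comes from the snippet above. The function name `buildTatMessage`, its parameters, and the `threshold2` and `breach` wording are assumptions for illustration:

```typescript
type TatJobType = 'threshold1' | 'threshold2' | 'breach';

// Hypothetical function; only the threshold1 wording is taken from the document.
function buildTatMessage(type: TatJobType, threshold: number, requestNumber: string): string {
  switch (type) {
    case 'threshold1':
      return `⏳ ${threshold}% of TAT elapsed for Request ${requestNumber}`;
    case 'threshold2':
      // Assumed wording for the second warning.
      return `⚠️ ${threshold}% of TAT elapsed for Request ${requestNumber}`;
    case 'breach':
      // Assumed wording; the breach job always represents 100%.
      return `⏰ TAT breached for Request ${requestNumber}`;
  }
}
```

The percentage is passed in from the job payload, so the text matches whatever threshold was in effect when the job was scheduled.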

4. Configuration Cache Management

  • Configurations are cached for 5 minutes (performance)
  • Cache is automatically cleared when admin updates settings
  • Jobs scheduled after the cache is cleared use the new thresholds; jobs already in the queue keep the thresholds they were scheduled with
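
The caching behaviour can be sketched as a small time-to-live cache. This is an illustrative stand-in, not the actual contents of configReader.service.ts: the `ConfigCache` class and the `thresholdCache` variable are hypothetical, and only the 5-minute lifetime and the `clearConfigCache` name come from this document.

```typescript
// Illustrative TTL cache; the real configReader.service.ts may be structured differently.
class ConfigCache<T> {
  private entry: { value: T; loadedAt: number } | null = null;
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns the cached value, or undefined when empty or older than the TTL.
  get(now: number = Date.now()): T | undefined {
    if (this.entry === null || now - this.entry.loadedAt >= this.ttlMs) {
      return undefined;
    }
    return this.entry.value;
  }

  set(value: T, now: number = Date.now()): void {
    this.entry = { value, loadedAt: now };
  }

  clear(): void {
    this.entry = null;
  }
}

// 5-minute lifetime, as described above.
const thresholdCache = new ConfigCache<{ threshold1: number; threshold2: number }>(5 * 60 * 1000);

// Would be called by the admin controller after a successful update,
// so the next read goes back to the database.
function clearConfigCache(): void {
  thresholdCache.clear();
}
```

A caller would treat an `undefined` result from `get()` as a cache miss, read the values from the database, and store them with `set()`.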

How It Solves the Edge Cases

Case 1: Config Changed After Job Creation

Scenario:

1. Request created with TAT = 16 hours (thresholds: 50%, 75%)
   Jobs scheduled:
   - tat-threshold1-REQ123 → fires at 8h, threshold=50
   - tat-threshold2-REQ123 → fires at 12h, threshold=75

2. Admin changes threshold from 50% → 55%

3. Old request jobs STILL fire at 8h (50%)
   ✅ BUT message correctly shows "50% elapsed" (from job data)
   ✅ No confusion because that request WAS scheduled at 50%

4. NEW requests created after config change:
   Jobs scheduled:
   - tat-threshold1-REQ456 → fires at 8.8h, threshold=55  ✅
   - tat-threshold2-REQ456 → fires at 12h, threshold=75

5. Message says "55% of TAT elapsed" ✅ CORRECT!

Result:

  • Existing jobs maintain their original thresholds (consistent)
  • New jobs use updated thresholds (respects config changes)
  • Messages always match actual threshold used
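
The arithmetic behind this case can be sketched as follows. The `planTatJobs` function and the `PlannedTatJob` type are hypothetical, and the sketch assumes plain elapsed time (no working-hours calendar), which may not match the real scheduler. It shows the thresholds being copied into the job data at scheduling time, which is why later config changes do not affect existing jobs:

```typescript
interface TatJobData {
  type: 'threshold1' | 'threshold2' | 'breach';
  threshold: number;
  requestId: string;
  levelId: string;
  approverId: string;
}

interface PlannedTatJob {
  jobId: string;
  delayMs: number;
  data: TatJobData;
}

// Hypothetical planner: snapshots the thresholds in effect at scheduling time.
// Assumes plain elapsed time; the real scheduler may account for working hours.
function planTatJobs(
  tatHours: number,
  thresholds: { threshold1: number; threshold2: number },
  ids: { requestId: string; levelId: string; approverId: string },
): PlannedTatJob[] {
  const tatMs = tatHours * 60 * 60 * 1000;
  const slots: Array<[TatJobData['type'], number]> = [
    ['threshold1', thresholds.threshold1],
    ['threshold2', thresholds.threshold2],
    ['breach', 100], // the deadline is always 100%
  ];
  return slots.map(([type, threshold]) => ({
    jobId: `tat-${type}-${ids.requestId}-${ids.levelId}`,
    delayMs: Math.round((tatMs * threshold) / 100),
    data: { type, threshold, ...ids },
  }));
}
```

For a 16-hour TAT with thresholds of 55% and 75%, the first job gets a delay of 8.8 hours (31,680,000 ms) and carries `threshold: 55` in its payload.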

Case 2: User Approves Before Threshold

Scenario:

1. Job scheduled: tat-threshold1-REQ123-LEVEL456 (fires at 55%)

2. User approves at 40% elapsed

3. cancelTatJobs('REQ123', 'LEVEL456') is called:
   → Looks for: tat-threshold1-REQ123-LEVEL456  ✅ FOUND
   → Removes job  ✅ SUCCESS

4. No notification sent  ✅ CORRECT!

Result:

  • Generic names allow consistent cancellation
  • Works regardless of threshold percentage
  • No ambiguity in job identification

Case 3: User Approves After Threshold Fired

Scenario:

1. Job scheduled: tat-threshold1-REQ123-LEVEL456 (fires at 55%)

2. Job fires at 55% → notification sent

3. User approves at 60%

4. cancelTatJobs called:
   → Tries to cancel tat-threshold1-REQ123-LEVEL456
   → Job already processed and removed (removeOnComplete: true)
   → No error (gracefully handled)  ✅

5. Later jobs (threshold2, breach) are still cancelled  ✅

Result:

  • Already-fired jobs don't cause errors
  • Remaining jobs are still cancelled
  • System behaves correctly in all scenarios
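
A sketch of cancellation that tolerates already-fired jobs. The `JobStore` interface and `createMemoryStore` are simplified, synchronous stand-ins invented for this example; the real queue API is likely asynchronous, and the real `cancelTatJobs` takes only the request and level IDs:

```typescript
// Minimal stand-in for a job queue; the real queue API may differ.
interface JobStore {
  // Returns true if a pending job was removed, false if none existed.
  remove(jobId: string): boolean;
}

// Hypothetical sketch of cancelTatJobs: attempts all three names and
// treats "job not found" as a normal outcome rather than an error.
function cancelTatJobs(store: JobStore, requestId: string, levelId: string): string[] {
  const types = ['threshold1', 'threshold2', 'breach'];
  const removed: string[] = [];
  for (const type of types) {
    const jobId = `tat-${type}-${requestId}-${levelId}`;
    if (store.remove(jobId)) {
      removed.push(jobId);
    }
  }
  return removed;
}

// In-memory store used only for demonstration.
function createMemoryStore(jobIds: string[]): JobStore {
  const pending = new Set(jobIds);
  return { remove: (jobId) => pending.delete(jobId) };
}
```

If threshold1 has already fired and been removed, the store holds only the threshold2 and breach jobs; the call removes those two and skips the missing one.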

Configuration Flow

Admin Updates Threshold

1. Admin changes "First TAT Threshold" from 50% → 55%
   ↓
2. Frontend sends: PUT /api/v1/admin/configurations/TAT_REMINDER_THRESHOLD_1
   Body: { configValue: '55' }
   ↓
3. Backend updates database:
   UPDATE admin_configurations 
   SET config_value = '55' 
   WHERE config_key = 'TAT_REMINDER_THRESHOLD_1'
   ↓
4. Backend clears config cache:
   clearConfigCache()  ✅
   ↓
5. Next request created:
   - getTatThresholds() → reads '55' from DB
   - Schedules job at 55% (8.8 hours for 16h TAT)
   - Job data: { threshold: 55 }
   ↓
6. Job fires at 55%:
   - Message: "55% of TAT elapsed"  ✅ CORRECT!

Database Impact

No Database Changes Required!

The admin_configurations table already contains the required configuration keys:

  • TAT_REMINDER_THRESHOLD_1 → First threshold (50% default)
  • TAT_REMINDER_THRESHOLD_2 → Second threshold (75% default)
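
Since config values are stored as strings (for example '55'), the reader has to convert them and fall back to the defaults above when a value is missing or unusable. The sketch below is one way to do that. The names `parseThreshold` and `resolveThresholds` are hypothetical, and the validation rules (values strictly between 0 and 100, first threshold below the second) are assumptions rather than documented behaviour:

```typescript
const DEFAULT_THRESHOLDS = { threshold1: 50, threshold2: 75 };

// Hypothetical parser: config values are stored as strings (e.g. '55').
function parseThreshold(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === '') {
    return fallback;
  }
  const value = Number(raw);
  // Assumed rule: reject non-numeric or out-of-range values.
  if (!Number.isFinite(value) || value <= 0 || value >= 100) {
    return fallback;
  }
  return value;
}

function resolveThresholds(config: Record<string, string | undefined>) {
  const threshold1 = parseThreshold(config['TAT_REMINDER_THRESHOLD_1'], DEFAULT_THRESHOLDS.threshold1);
  const threshold2 = parseThreshold(config['TAT_REMINDER_THRESHOLD_2'], DEFAULT_THRESHOLDS.threshold2);
  // Assumed rule: the second threshold must come after the first.
  if (threshold1 >= threshold2) {
    return { ...DEFAULT_THRESHOLDS };
  }
  return { threshold1, threshold2 };
}
```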

Job Queue Data Structure

Old Job Data:

{
  "type": "tat50",
  "requestId": "...",
  "levelId": "...",
  "approverId": "..."
}

New Job Data:

{
  "type": "threshold1",
  "threshold": 55,
  "requestId": "...",
  "levelId": "...",
  "approverId": "..."
}

Testing Scenarios

Test 1: Change Threshold, Create New Request

# 1. Change threshold from 50% to 55%
curl -X PUT http://localhost:5000/api/v1/admin/configurations/TAT_REMINDER_THRESHOLD_1 \
  -H "Authorization: Bearer TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"configValue": "55"}'

# 2. Create new workflow request
# → Jobs scheduled at 55%, 75%, 100%

# 3. Wait for 55% elapsed
# → Notification says "55% of TAT elapsed" ✅

Test 2: Approve Before Threshold

# 1. Request created (TAT = 16 hours)
# → threshold1 scheduled at 8.8 hours (55%)

# 2. Approve at 6 hours (before 55%)
curl -X POST http://localhost:5000/api/v1/workflows/REQ123/approve/LEVEL456

# 3. cancelTatJobs is called internally
# → tat-threshold1-REQ123-LEVEL456 removed ✅
# → tat-threshold2-REQ123-LEVEL456 removed ✅
# → tat-breach-REQ123-LEVEL456 removed ✅

# 4. No notifications sent ✅

Test 3: Mixed Old and New Jobs

# 1. Create Request A with old threshold (50%)
# → Jobs use threshold=50

# 2. Admin changes to 55%

# 3. Create Request B with new threshold (55%)
# → Jobs use threshold=55

# 4. Both requests work correctly:
# → Request A fires at 50%, message says "50%" ✅
# → Request B fires at 55%, message says "55%" ✅

Summary

What Changed:

  1. Job names: tat50 → tat-threshold1 (generic)
  2. Job data: Now includes actual threshold percentage
  3. Messages: Dynamic based on threshold from job data
  4. Scheduling: Reads thresholds from database configuration
  5. Cache: Automatically cleared on config update

What Didn't Change:

  1. Database schema (admin_configurations already has all needed fields)
  2. API endpoints (no breaking changes)
  3. Frontend UI (works exactly the same)
  4. Cancellation logic (still works, just uses new names)

Benefits:

  1. No Job Name Conflicts: Generic names work for any percentage
  2. Accurate Messages: Always show actual threshold used
  3. Config Flexibility: Admin can change thresholds anytime
  4. Backward Compatible: Existing jobs complete normally
  5. Reliable Cancellation: Works regardless of threshold value
  6. Immediate Effect: Requests created after a config change use the updated thresholds

Files Modified

  1. Re_Backend/src/services/configReader.service.ts - NEW (configuration reader)
  2. Re_Backend/src/services/tatScheduler.service.ts - Updated job scheduling
  3. Re_Backend/src/queues/tatProcessor.ts - Updated job processing
  4. Re_Backend/src/controllers/admin.controller.ts - Added cache clearing

Configuration Keys

| Key | Description | Default | Example |
|---|---|---|---|
| TAT_REMINDER_THRESHOLD_1 | First warning threshold | 50 | 55 (sends alert at 55%) |
| TAT_REMINDER_THRESHOLD_2 | Critical warning threshold | 75 | 80 (sends alert at 80%) |
| Breach | Deadline reached (always 100%) | 100 | 100 (non-configurable) |

Example Timeline

TAT = 16 hours, Thresholds: 55%, 80%

Hour 0 ───────────────────────────────────────────────────────► Hour 16
│                             │                │              │
START                    55% (8.8h)       80% (12.8h)    100% (16h)
                              │                │              │
                         threshold1       threshold2       breach
                        "55% elapsed"    "80% elapsed"   "BREACHED"
                              ⏳                ⚠️              ⏰

Result:

  • Job names don't hardcode percentages
  • Messages show actual configured thresholds
  • Cancellation works consistently
  • The edge cases described above (config change after scheduling, approval before or after a threshold fires) are handled