# Dynamic TAT Thresholds Implementation
## Problem Statement
### Original Issue
The TAT system had **hardcoded threshold percentages** (50%, 75%, 100%) which created several problems:
1. **Job Naming Conflict**: Jobs were named using threshold percentages (`tat50-{reqId}-{levelId}`)
2. **Configuration Changes Didn't Apply**: Changing threshold in settings didn't affect scheduled jobs
3. **Message Mismatch**: Messages always said "50% elapsed" even if admin configured 55%
4. **Cancellation Issues**: Uncertainty about whether jobs could be properly cancelled after config changes
### Critical Edge Case Identified by User
**Scenario:**
```
1. Request created → TAT jobs scheduled:
   - tat50-REQ123-LEVEL456 (fires at 8 hours, says "50% elapsed")
   - tat75-REQ123-LEVEL456 (fires at 12 hours)
   - tatBreach-REQ123-LEVEL456 (fires at 16 hours)
2. Admin changes threshold from 50% → 55%
3. User approves at 9 hours (after old 50% fired)
   → Job already fired with "50% elapsed" message ❌
   → But admin configured 55% ❌
   → Inconsistent!
4. Even if approval happens before old 50%:
   → System cancels `tat50-REQ123-LEVEL456` ✅
   → But message would still say "50%" (hardcoded) ❌
```
---
## Solution: Generic Job Names + Dynamic Thresholds
### 1. **Generic Job Naming**
Changed from percentage-based to generic names:
**Before:**
```typescript
tat50-{requestId}-{levelId}
tat75-{requestId}-{levelId}
tatBreach-{requestId}-{levelId}
```
**After:**
```typescript
tat-threshold1-{requestId}-{levelId} // First threshold (configurable: 50%, 55%, 60%, etc.)
tat-threshold2-{requestId}-{levelId} // Second threshold (configurable: 75%, 80%, etc.)
tat-breach-{requestId}-{levelId} // Always 100% (deadline)
```
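As a minimal sketch of the naming scheme above, a single helper can build all three job names. The function name `buildTatJobId` and its signature are hypothetical and not taken from the codebase.

```typescript
// Hypothetical helper: name and signature are illustrative only.
type TatJobType = 'threshold1' | 'threshold2' | 'breach';

function buildTatJobId(type: TatJobType, requestId: string, levelId: string): string {
  // The percentage never appears in the ID, so a config change cannot alter it.
  return `tat-${type}-${requestId}-${levelId}`;
}

console.log(buildTatJobId('threshold1', 'REQ123', 'LEVEL456'));
// tat-threshold1-REQ123-LEVEL456
```

Because scheduling and cancellation would both derive the name from the same inputs, the configured percentage never has to be known at cancellation time.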
### 2. **Store Threshold in Job Data**
Instead of relying on job name, we store the actual percentage in job payload:
```typescript
interface TatJobData {
  type: 'threshold1' | 'threshold2' | 'breach';
  threshold: number; // Actual % (e.g., 55, 80, 100)
  requestId: string;
  levelId: string;
  approverId: string;
}
```
### 3. **Dynamic Message Generation**
Messages use the threshold from job data:
```typescript
case 'threshold1':
  message = `⏳ ${threshold}% of TAT elapsed for Request ${requestNumber}`;
  // If threshold = 55, the message says "55% of TAT elapsed" ✅
  break;
```
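A fuller sketch of the message builder might look like the following. Only the `threshold1` wording comes from the snippet above; the `threshold2` and `breach` wording and the function name `buildTatMessage` are assumptions for illustration.

```typescript
type TatJobType = 'threshold1' | 'threshold2' | 'breach';

// Hypothetical helper: builds the notification text from the job payload.
function buildTatMessage(type: TatJobType, threshold: number, requestNumber: string): string {
  switch (type) {
    case 'threshold1':
      // Wording taken from the snippet above.
      return `⏳ ${threshold}% of TAT elapsed for Request ${requestNumber}`;
    case 'threshold2':
      // Assumed wording: the actual message may differ.
      return `⚠️ ${threshold}% of TAT elapsed for Request ${requestNumber}`;
    case 'breach':
      // Assumed wording: the actual message may differ.
      return `⏰ TAT breached for Request ${requestNumber}`;
  }
}

console.log(buildTatMessage('threshold1', 55, 'REQ123'));
// ⏳ 55% of TAT elapsed for Request REQ123
```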
### 4. **Configuration Cache Management**
- Configurations are cached for 5 minutes (performance)
- Cache is **automatically cleared** when admin updates settings
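- Requests created after the cache is cleared are scheduled with the new thresholds; jobs that are already scheduled keep the thresholds they were created with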
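
The caching behavior described above can be sketched as a small in-memory cache with a 5-minute TTL and an explicit clear. The class name `ConfigCache` and the injectable clock are assumptions made so the sketch is testable; the real `configReader.service.ts` may be structured differently.

```typescript
const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes, as stated above

// Hypothetical cache: entries expire after the TTL or when clear() is called.
class ConfigCache {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(private readonly now: () => number = () => Date.now()) {}

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // stale entry: force a fresh read from the database
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: string): void {
    this.entries.set(key, { value, expiresAt: this.now() + CACHE_TTL_MS });
  }

  // Intended to be called right after an admin updates a configuration value.
  clear(): void {
    this.entries.clear();
  }
}
```

A cache miss (`undefined`) signals the caller to read the value from the database and call `set` again.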
---
## How It Solves the Edge Cases
### ✅ **Case 1: Config Changed After Job Creation**
**Scenario:**
```
1. Request created with TAT = 16 hours (thresholds: 50%, 75%)
   Jobs scheduled:
   - tat-threshold1-REQ123-LEVEL456 → fires at 8h, threshold=50
   - tat-threshold2-REQ123-LEVEL456 → fires at 12h, threshold=75
2. Admin changes threshold from 50% → 55%
3. Old request jobs STILL fire at 8h (50%)
   ✅ BUT message correctly shows "50% elapsed" (from job data)
   ✅ No confusion because that request WAS scheduled at 50%
4. NEW requests created after config change:
   Jobs scheduled:
   - tat-threshold1-REQ456 → fires at 8.8h, threshold=55 ✅
   - tat-threshold2-REQ456 → fires at 12h, threshold=75
5. Message says "55% of TAT elapsed" ✅ CORRECT!
```
**Result:**
- ✅ Existing jobs maintain their original thresholds (consistent)
- ✅ New jobs use updated thresholds (respects config changes)
- ✅ Messages always match actual threshold used
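
The snapshot behavior in this case can be illustrated with a short sketch that computes each job's delay and stores the threshold in the planned job. The names `planTatJobs` and `PlannedTatJob` are hypothetical, and the delay is assumed to be plain elapsed time; working-hours calendars, if the real scheduler uses them, are ignored here.

```typescript
interface PlannedTatJob {
  jobId: string;
  type: 'threshold1' | 'threshold2' | 'breach';
  threshold: number; // percentage captured at scheduling time
  delayMs: number;   // delay measured from the start of the TAT window
}

const MS_PER_HOUR = 60 * 60 * 1000;

// Hypothetical planner: thresholds are passed in, so each plan keeps its own snapshot.
function planTatJobs(
  requestId: string,
  levelId: string,
  tatHours: number,
  thresholds: { first: number; second: number },
): PlannedTatJob[] {
  const plan = (type: PlannedTatJob['type'], threshold: number): PlannedTatJob => ({
    jobId: `tat-${type}-${requestId}-${levelId}`,
    type,
    threshold,
    delayMs: Math.round((tatHours * MS_PER_HOUR * threshold) / 100),
  });
  return [
    plan('threshold1', thresholds.first),
    plan('threshold2', thresholds.second),
    plan('breach', 100), // the deadline is always 100%
  ];
}

// Request A is planned before the config change, request B after it.
// The level IDs below are placeholders.
const requestA = planTatJobs('REQ123', 'LEVEL456', 16, { first: 50, second: 75 });
const requestB = planTatJobs('REQ456', 'LEVEL789', 16, { first: 55, second: 75 });
console.log(requestA[0].threshold, requestA[0].delayMs / MS_PER_HOUR); // 50 8
console.log(requestB[0].threshold, requestB[0].delayMs / MS_PER_HOUR); // 55 8.8
```

Request A's plan is unaffected by the later change because the threshold travels with the job rather than being looked up when the job fires.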
---
### ✅ **Case 2: User Approves Before Threshold**
**Scenario:**
```
1. Job scheduled: tat-threshold1-REQ123-LEVEL456 (fires at 55%)
2. User approves at 40% elapsed
3. cancelTatJobs('REQ123', 'LEVEL456') is called:
   → Looks for: tat-threshold1-REQ123-LEVEL456 ✅ FOUND
   → Removes job ✅ SUCCESS
4. No notification sent ✅ CORRECT!
```
**Result:**
- ✅ Generic names allow consistent cancellation
- ✅ Works regardless of threshold percentage
- ✅ No ambiguity in job identification
---
### ✅ **Case 3: User Approves After Threshold Fired**
**Scenario:**
```
1. Job scheduled: tat-threshold1-REQ123-LEVEL456 (fires at 55%)
2. Job fires at 55% → notification sent
3. User approves at 60%
4. cancelTatJobs called:
   → Tries to cancel tat-threshold1-REQ123-LEVEL456
   → Job already processed and removed (removeOnComplete: true)
   → No error (gracefully handled) ✅
5. Later jobs (threshold2, breach) are still cancelled ✅
```
**Result:**
- ✅ Already-fired jobs don't cause errors
- ✅ Remaining jobs are still cancelled
- ✅ System behaves correctly in all scenarios
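
The cancellation behavior in Cases 2 and 3 can be sketched against an in-memory stand-in for the queue. `InMemoryJobStore` and `cancelTatJobsSketch` are hypothetical names; the real `cancelTatJobs` talks to the job queue (most likely asynchronously) and takes only the request and level IDs.

```typescript
// Stand-in for the job queue: remove() reports whether the job still existed.
class InMemoryJobStore {
  private jobs = new Set<string>();
  add(jobId: string): void { this.jobs.add(jobId); }
  remove(jobId: string): boolean { return this.jobs.delete(jobId); }
  get size(): number { return this.jobs.size; }
}

// Hypothetical sketch: removes all three generic jobs for one request level.
function cancelTatJobsSketch(store: InMemoryJobStore, requestId: string, levelId: string): string[] {
  const types = ['threshold1', 'threshold2', 'breach'] as const;
  const removed: string[] = [];
  for (const type of types) {
    const jobId = `tat-${type}-${requestId}-${levelId}`;
    // A missing job is not an error: it has already fired or was never scheduled.
    if (store.remove(jobId)) removed.push(jobId);
  }
  return removed;
}

// Case 3: threshold1 has already fired, so only two jobs remain in the store.
const store = new InMemoryJobStore();
store.add('tat-threshold2-REQ123-LEVEL456');
store.add('tat-breach-REQ123-LEVEL456');
console.log(cancelTatJobsSketch(store, 'REQ123', 'LEVEL456'));
// [ 'tat-threshold2-REQ123-LEVEL456', 'tat-breach-REQ123-LEVEL456' ]
```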
---
## Configuration Flow
### **Admin Updates Threshold**
```
1. Admin changes "First TAT Threshold" from 50% → 55%
2. Frontend sends: PUT /api/v1/admin/configurations/TAT_REMINDER_THRESHOLD_1
   Body: { configValue: '55' }
3. Backend updates database:
   UPDATE admin_configurations
   SET config_value = '55'
   WHERE config_key = 'TAT_REMINDER_THRESHOLD_1'
4. Backend clears config cache:
   clearConfigCache() ✅
5. Next request created:
   - getTatThresholds() → reads '55' from DB
   - Schedules job at 55% (8.8 hours for 16h TAT)
   - Job data: { threshold: 55 }
6. Job fires at 55%:
   - Message: "55% of TAT elapsed" ✅ CORRECT!
```
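Since `config_value` is stored as a string, step 5 involves parsing it into a number. The sketch below shows one way to do that with the documented defaults (50 and 75) as fallbacks. The names `parseThreshold` and `readTatThresholds`, and the rule that values must lie strictly between 0 and 100, are assumptions and not taken from `getTatThresholds()`.

```typescript
// Hypothetical parser: falls back when the value is missing, non-numeric, or out of range.
function parseThreshold(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === '') return fallback;
  const value = Number(raw);
  // Assumed validation rule: a reminder threshold must be strictly between 0 and 100.
  if (!Number.isFinite(value) || value <= 0 || value >= 100) return fallback;
  return value;
}

// Takes config rows as a key → value map, e.g. loaded from admin_configurations.
function readTatThresholds(config: Record<string, string | undefined>): { first: number; second: number } {
  return {
    first: parseThreshold(config['TAT_REMINDER_THRESHOLD_1'], 50),
    second: parseThreshold(config['TAT_REMINDER_THRESHOLD_2'], 75),
  };
}

console.log(readTatThresholds({ TAT_REMINDER_THRESHOLD_1: '55' }));
// { first: 55, second: 75 }
```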
---
## Database Impact
### **No Database Changes Required!**
The `admin_configurations` table already holds the required configuration keys:
- `TAT_REMINDER_THRESHOLD_1` → First threshold (50% default)
- `TAT_REMINDER_THRESHOLD_2` → Second threshold (75% default)
### **Job Queue Data Structure**
**Old Job Data:**
```json
{
  "type": "tat50",
  "requestId": "...",
  "levelId": "...",
  "approverId": "..."
}
```
**New Job Data:**
```json
{
  "type": "threshold1",
  "threshold": 55,
  "requestId": "...",
  "levelId": "...",
  "approverId": "..."
}
```
---
## Testing Scenarios
### **Test 1: Change Threshold, Create New Request**
```bash
# 1. Change threshold from 50% to 55%
curl -X PUT http://localhost:5000/api/v1/admin/configurations/TAT_REMINDER_THRESHOLD_1 \
  -H "Authorization: Bearer TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"configValue": "55"}'
# 2. Create new workflow request
# → Jobs scheduled at 55%, 75%, 100%
# 3. Wait for 55% elapsed
# → Notification says "55% of TAT elapsed" ✅
```
### **Test 2: Approve Before Threshold**
```bash
# 1. Request created (TAT = 16 hours)
# → threshold1 scheduled at 8.8 hours (55%)
# 2. Approve at 6 hours (before 55%)
curl -X POST http://localhost:5000/api/v1/workflows/REQ123/approve/LEVEL456
# 3. cancelTatJobs is called internally
# → tat-threshold1-REQ123-LEVEL456 removed ✅
# → tat-threshold2-REQ123-LEVEL456 removed ✅
# → tat-breach-REQ123-LEVEL456 removed ✅
# 4. No notifications sent ✅
```
### **Test 3: Mixed Old and New Jobs**
```bash
# 1. Create Request A with old threshold (50%)
# → Jobs use threshold=50
# 2. Admin changes to 55%
# 3. Create Request B with new threshold (55%)
# → Jobs use threshold=55
# 4. Both requests work correctly:
# → Request A fires at 50%, message says "50%" ✅
# → Request B fires at 55%, message says "55%" ✅
```
---
## Summary
### **What Changed:**
1. ✅ Job names: `tat50` → `tat-threshold1` (generic)
2. ✅ Job data: Now includes actual threshold percentage
3. ✅ Messages: Dynamic based on threshold from job data
4. ✅ Scheduling: Reads thresholds from database configuration
5. ✅ Cache: Automatically cleared on config update
### **What Didn't Change:**
1. ✅ Database schema (admin_configurations already has all needed fields)
2. ✅ API endpoints (no breaking changes)
3. ✅ Frontend UI (works exactly the same)
4. ✅ Cancellation logic (still works, just uses new names)
### **Benefits:**
1. **No Job Name Conflicts**: Generic names work for any percentage
2. **Accurate Messages**: Always show actual threshold used
3. **Config Flexibility**: Admin can change thresholds anytime
4. **Backward Compatible**: Existing jobs complete normally
5. **Reliable Cancellation**: Works regardless of threshold value
6. **Immediate Effect**: New requests use updated thresholds immediately
---
## Files Modified
1. `Re_Backend/src/services/configReader.service.ts` - **NEW** (configuration reader)
2. `Re_Backend/src/services/tatScheduler.service.ts` - Updated job scheduling
3. `Re_Backend/src/queues/tatProcessor.ts` - Updated job processing
4. `Re_Backend/src/controllers/admin.controller.ts` - Added cache clearing
---
## Configuration Keys
| Key | Description | Default | Example |
|-----|-------------|---------|---------|
| `TAT_REMINDER_THRESHOLD_1` | First warning threshold | 50 | 55 (sends alert at 55%) |
| `TAT_REMINDER_THRESHOLD_2` | Critical warning threshold | 75 | 80 (sends alert at 80%) |
| Breach | Deadline reached (always 100%) | 100 | 100 (non-configurable) |
---
## Example Timeline
**TAT = 16 hours, Thresholds: 55%, 80%**
```
Hour 0 ────────────────────────────────────────────► Hour 16
                       │              │              │
START             55% (8.8h)     80% (12.8h)       100%
                       │              │              │
                  threshold1     threshold2        breach
                 "55% elapsed"  "80% elapsed"    "BREACHED"
                      ⏳             ⚠️             ⏰
```
**Result:**
- ✅ Job names don't hardcode percentages
- ✅ Messages show actual configured thresholds
- ✅ Cancellation works consistently
- ✅ The edge cases described above are handled