# Dynamic TAT Thresholds Implementation

## Problem Statement

### Original Issue

The TAT system had **hardcoded threshold percentages** (50%, 75%, 100%), which created several problems:

1. **Job Naming Conflict**: Jobs were named using threshold percentages (`tat50-{reqId}-{levelId}`)
2. **Configuration Changes Didn't Apply**: Changing a threshold in settings didn't affect scheduled jobs
3. **Message Mismatch**: Messages always said "50% elapsed" even if the admin had configured 55%
4. **Cancellation Issues**: It was unclear whether jobs could be properly cancelled after config changes

### Critical Edge Case Identified by User

**Scenario:**
```
1. Request created → TAT jobs scheduled:
   - tat50-REQ123-LEVEL456 (fires at 8 hours, says "50% elapsed")
   - tat75-REQ123-LEVEL456 (fires at 12 hours)
   - tatBreach-REQ123-LEVEL456 (fires at 16 hours)

2. Admin changes threshold from 50% → 55%

3. User approves at 9 hours (after old 50% fired)
   → Job already fired with "50% elapsed" message ❌
   → But admin configured 55% ❌
   → Inconsistent!

4. Even if approval happens before old 50%:
   → System cancels `tat50-REQ123-LEVEL456` ✅
   → But message would still say "50%" (hardcoded) ❌
```

---

## Solution: Generic Job Names + Dynamic Thresholds

### 1. **Generic Job Naming**
Changed from percentage-based to generic names:

**Before:**
```typescript
tat50-{requestId}-{levelId}
tat75-{requestId}-{levelId}
tatBreach-{requestId}-{levelId}
```

**After:**
```typescript
tat-threshold1-{requestId}-{levelId}  // First threshold (configurable: 50%, 55%, 60%, etc.)
tat-threshold2-{requestId}-{levelId}  // Second threshold (configurable: 75%, 80%, etc.)
tat-breach-{requestId}-{levelId}      // Always 100% (deadline)
```
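
As a minimal sketch of how these names could be produced in one place, the helper below builds all three IDs from the job type, request ID, and level ID. The function names `buildTatJobId` and `allTatJobIds` are hypothetical and are not taken from the codebase; only the naming pattern comes from the section above.

```typescript
// Hypothetical helper names; only the "tat-<type>-<requestId>-<levelId>" pattern is from this document.
type TatJobType = 'threshold1' | 'threshold2' | 'breach';

function buildTatJobId(type: TatJobType, requestId: string, levelId: string): string {
  // The job type, not the percentage, is what appears in the ID.
  return `tat-${type}-${requestId}-${levelId}`;
}

function allTatJobIds(requestId: string, levelId: string): string[] {
  const types: TatJobType[] = ['threshold1', 'threshold2', 'breach'];
  return types.map((type) => buildTatJobId(type, requestId, levelId));
}

console.log(allTatJobIds('REQ123', 'LEVEL456'));
// → ['tat-threshold1-REQ123-LEVEL456', 'tat-threshold2-REQ123-LEVEL456', 'tat-breach-REQ123-LEVEL456']
```

Because the percentage never appears in the ID, the same three names can be rebuilt at cancellation time without knowing which thresholds were configured when the jobs were scheduled.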

### 2. **Store Threshold in Job Data**
Instead of relying on the job name, we store the actual percentage in the job payload:

```typescript
interface TatJobData {
  type: 'threshold1' | 'threshold2' | 'breach';
  threshold: number; // Actual % (e.g., 55, 80, 100)
  requestId: string;
  levelId: string;
  approverId: string;
}
```
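
The following sketch shows one way the three payloads could be assembled from the configured percentages. The `TatThresholds` shape and the `buildTatJobPayloads` function are assumptions made for illustration; only the `TatJobData` interface comes from this document.

```typescript
interface TatJobData {
  type: 'threshold1' | 'threshold2' | 'breach';
  threshold: number; // Actual % (e.g., 55, 80, 100)
  requestId: string;
  levelId: string;
  approverId: string;
}

// Hypothetical shape for the two configurable percentages.
interface TatThresholds {
  first: number;
  second: number;
}

// Hypothetical builder: copies the percentages into each payload at scheduling time.
function buildTatJobPayloads(
  thresholds: TatThresholds,
  requestId: string,
  levelId: string,
  approverId: string,
): TatJobData[] {
  const base = { requestId, levelId, approverId };
  return [
    { type: 'threshold1', threshold: thresholds.first, ...base },
    { type: 'threshold2', threshold: thresholds.second, ...base },
    { type: 'breach', threshold: 100, ...base }, // the breach job is always 100%
  ];
}

const payloads = buildTatJobPayloads({ first: 55, second: 80 }, 'REQ123', 'LEVEL456', 'USER1');
console.log(payloads.map((p) => `${p.type}=${p.threshold}`).join(', '));
// → threshold1=55, threshold2=80, breach=100
```

Copying the number into the payload means the processor never has to look up the current configuration when the job fires.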

### 3. **Dynamic Message Generation**
Messages use the threshold from job data:

```typescript
case 'threshold1':
  message = `⏳ ${threshold}% of TAT elapsed for Request ${requestNumber}`;
  // If threshold = 55, message says "55% of TAT elapsed" ✅
```
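
A fuller version of that switch might look like the sketch below. Only the `threshold1` wording is taken from the snippet above; the `threshold2` and `breach` texts and the function name `buildTatMessage` are assumptions for illustration.

```typescript
type TatJobType = 'threshold1' | 'threshold2' | 'breach';

// Hypothetical function; the threshold2 and breach wording is assumed, not taken from the codebase.
function buildTatMessage(type: TatJobType, threshold: number, requestNumber: string): string {
  switch (type) {
    case 'threshold1':
      return `⏳ ${threshold}% of TAT elapsed for Request ${requestNumber}`;
    case 'threshold2':
      return `⚠️ ${threshold}% of TAT elapsed for Request ${requestNumber}`;
    default:
      // 'breach': the deadline has passed, so no percentage is interpolated.
      return `⏰ TAT breached for Request ${requestNumber}`;
  }
}

console.log(buildTatMessage('threshold1', 55, 'REQ-001'));
// → ⏳ 55% of TAT elapsed for Request REQ-001
```

The percentage is always read from the job payload, so a job scheduled at 50% reports 50% even if the configured value has since changed.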

### 4. **Configuration Cache Management**
- Configurations are cached for 5 minutes (for performance)
- The cache is **automatically cleared** when an admin updates settings
- Jobs scheduled after the cache is cleared use the new thresholds
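
The caching behavior above can be sketched as a small time-to-live cache. This is an illustration under assumptions, not the contents of `configReader.service.ts`: the `ConfigLoader` type, the `getConfigValue` function, and the injected `now` parameter are hypothetical, while `clearConfigCache` and the 5-minute lifetime come from this document. In the demo, a stale value is served until the cache is cleared, which is why the admin update path calls `clearConfigCache()`.

```typescript
// Hypothetical loader standing in for a database read.
type ConfigLoader = (key: string) => string | undefined;

const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes
const cache = new Map<string, { value: string; expiresAt: number }>();

// Hypothetical reader: returns the cached value while it is fresh, otherwise reloads it.
function getConfigValue(
  key: string,
  defaultValue: string,
  load: ConfigLoader,
  now: number = Date.now(),
): string {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > now) return hit.value;
  const value = load(key) ?? defaultValue;
  cache.set(key, { value, expiresAt: now + CACHE_TTL_MS });
  return value;
}

// Called after an admin update so the next read goes back to the source.
function clearConfigCache(): void {
  cache.clear();
}

// Demo with an in-memory map in place of the admin_configurations table.
const db = new Map<string, string>([['TAT_REMINDER_THRESHOLD_1', '50']]);
const load: ConfigLoader = (key) => db.get(key);

console.log(getConfigValue('TAT_REMINDER_THRESHOLD_1', '50', load)); // → 50
db.set('TAT_REMINDER_THRESHOLD_1', '55');
console.log(getConfigValue('TAT_REMINDER_THRESHOLD_1', '50', load)); // → 50 (still cached)
clearConfigCache();
console.log(getConfigValue('TAT_REMINDER_THRESHOLD_1', '50', load)); // → 55
```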

---

## How It Solves the Edge Cases

### ✅ **Case 1: Config Changed After Job Creation**

**Scenario:**
```
1. Request created with TAT = 16 hours (thresholds: 50%, 75%)
   Jobs scheduled:
   - tat-threshold1-REQ123 → fires at 8h, threshold=50
   - tat-threshold2-REQ123 → fires at 12h, threshold=75

2. Admin changes threshold from 50% → 55%

3. Old request jobs STILL fire at 8h (50%)
   ✅ BUT message correctly shows "50% elapsed" (from job data)
   ✅ No confusion because that request WAS scheduled at 50%

4. NEW requests created after config change:
   Jobs scheduled:
   - tat-threshold1-REQ456 → fires at 8.8h, threshold=55 ✅
   - tat-threshold2-REQ456 → fires at 12h, threshold=75

5. Message says "55% of TAT elapsed" ✅ CORRECT!
```

**Result:**
- ✅ Existing jobs maintain their original thresholds (consistent)
- ✅ New jobs use updated thresholds (respects config changes)
- ✅ Messages always match actual threshold used
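
The scenario above comes down to copying the threshold into the job at scheduling time. The sketch below reproduces it with a mutable in-memory `config` object standing in for the stored configuration; `ScheduledTatJob` and `scheduleThreshold1` are hypothetical names, and the job IDs are abbreviated the same way as in the scenario.

```typescript
// Hypothetical record of a scheduled job, reduced to the fields the scenario discusses.
interface ScheduledTatJob {
  jobId: string;
  threshold: number;
  delayHours: number;
}

// Mutable stand-in for the stored configuration.
const config = { threshold1: 50 };

function scheduleThreshold1(requestId: string, tatHours: number): ScheduledTatJob {
  const threshold = config.threshold1; // snapshot taken at scheduling time
  return {
    jobId: `tat-threshold1-${requestId}`,
    threshold,
    delayHours: (tatHours * threshold) / 100,
  };
}

const jobA = scheduleThreshold1('REQ123', 16); // scheduled while the threshold is 50%
config.threshold1 = 55;                        // admin changes the setting
const jobB = scheduleThreshold1('REQ456', 16); // scheduled after the change

console.log(jobA.threshold, jobA.delayHours); // → 50 8
console.log(jobB.threshold, jobB.delayHours); // → 55 8.8
```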

---

### ✅ **Case 2: User Approves Before Threshold**

**Scenario:**
```
1. Job scheduled: tat-threshold1-REQ123-LEVEL456 (fires at 55%)

2. User approves at 40% elapsed

3. cancelTatJobs('REQ123', 'LEVEL456') is called:
   → Looks for: tat-threshold1-REQ123-LEVEL456 ✅ FOUND
   → Removes job ✅ SUCCESS

4. No notification sent ✅ CORRECT!
```

**Result:**
- ✅ Generic names allow consistent cancellation
- ✅ Works regardless of threshold percentage
- ✅ No ambiguity in job identification

---

### ✅ **Case 3: User Approves After Threshold Fired**

**Scenario:**
```
1. Job scheduled: tat-threshold1-REQ123-LEVEL456 (fires at 55%)

2. Job fires at 55% → notification sent

3. User approves at 60%

4. cancelTatJobs('REQ123', 'LEVEL456') is called:
   → Tries to cancel tat-threshold1-REQ123-LEVEL456
   → Job already processed and removed (removeOnComplete: true)
   → No error (gracefully handled) ✅

5. Later jobs (threshold2, breach) are still cancelled ✅
```

**Result:**
- ✅ Already-fired jobs don't cause errors
- ✅ Remaining jobs are still cancelled
- ✅ Approval is handled correctly whether it happens before or after a threshold fires
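
Cases 2 and 3 can be illustrated with a sketch of the cancellation step. The `JobStore` interface below is a hypothetical, synchronous stand-in for the queue (a real queue client would typically be asynchronous), and this `cancelTatJobs` takes the store as an extra argument for the sake of a self-contained example. The demo covers Case 3: `threshold1` has already fired and is gone, yet the remaining two jobs are still removed without error.

```typescript
// Hypothetical minimal queue interface, synchronous for illustration only.
interface JobStore {
  remove(jobId: string): boolean; // true if a pending job was found and removed
}

// Sketch of cancellation: rebuilds the three generic IDs and removes whichever still exist.
function cancelTatJobs(store: JobStore, requestId: string, levelId: string): string[] {
  const jobIds = [
    `tat-threshold1-${requestId}-${levelId}`,
    `tat-threshold2-${requestId}-${levelId}`,
    `tat-breach-${requestId}-${levelId}`,
  ];
  // A job that already fired is simply absent, so a failed removal is not treated as an error.
  return jobIds.filter((jobId) => store.remove(jobId));
}

// In-memory stand-in for the queue: threshold1 has already been processed and removed.
const pending = new Set<string>([
  'tat-threshold2-REQ123-LEVEL456',
  'tat-breach-REQ123-LEVEL456',
]);
const store: JobStore = { remove: (jobId) => pending.delete(jobId) };

console.log(cancelTatJobs(store, 'REQ123', 'LEVEL456'));
// → ['tat-threshold2-REQ123-LEVEL456', 'tat-breach-REQ123-LEVEL456']
console.log(pending.size); // → 0
```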

---

## Configuration Flow

### **Admin Updates Threshold**

```
1. Admin changes "First TAT Threshold" from 50% → 55%
   ↓
2. Frontend sends: PUT /api/v1/admin/configurations/TAT_REMINDER_THRESHOLD_1
   Body: { configValue: '55' }
   ↓
3. Backend updates database:
   UPDATE admin_configurations
   SET config_value = '55'
   WHERE config_key = 'TAT_REMINDER_THRESHOLD_1'
   ↓
4. Backend clears config cache:
   clearConfigCache() ✅
   ↓
5. Next request created:
   - getTatThresholds() → reads '55' from DB
   - Schedules job at 55% (8.8 hours for 16h TAT)
   - Job data: { threshold: 55 }
   ↓
6. Job fires at 55%:
   - Message: "55% of TAT elapsed" ✅ CORRECT!
```

---

## Database Impact

### **No Database Changes Required!**

The `admin_configurations` table already has all required fields:
- ✅ `TAT_REMINDER_THRESHOLD_1` → First threshold (50% default)
- ✅ `TAT_REMINDER_THRESHOLD_2` → Second threshold (75% default)

### **Job Queue Data Structure**

**Old Job Data:**
```json
{
  "type": "tat50",
  "requestId": "...",
  "levelId": "...",
  "approverId": "..."
}
```

**New Job Data:**
```json
{
  "type": "threshold1",
  "threshold": 55,
  "requestId": "...",
  "levelId": "...",
  "approverId": "..."
}
```

---

## Testing Scenarios

### **Test 1: Change Threshold, Create New Request**

```bash
# 1. Change threshold from 50% to 55%
curl -X PUT http://localhost:5000/api/v1/admin/configurations/TAT_REMINDER_THRESHOLD_1 \
  -H "Authorization: Bearer TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"configValue": "55"}'

# 2. Create new workflow request
# → Jobs scheduled at 55%, 75%, 100%

# 3. Wait for 55% elapsed
# → Notification says "55% of TAT elapsed" ✅
```

### **Test 2: Approve Before Threshold**

```bash
# 1. Request created (TAT = 16 hours)
# → threshold1 scheduled at 8.8 hours (55%)

# 2. Approve at 6 hours (before 55%)
curl -X POST http://localhost:5000/api/v1/workflows/REQ123/approve/LEVEL456

# 3. cancelTatJobs is called internally
# → tat-threshold1-REQ123-LEVEL456 removed ✅
# → tat-threshold2-REQ123-LEVEL456 removed ✅
# → tat-breach-REQ123-LEVEL456 removed ✅

# 4. No notifications sent ✅
```

### **Test 3: Mixed Old and New Jobs**

```bash
# 1. Create Request A with old threshold (50%)
# → Jobs use threshold=50

# 2. Admin changes to 55%

# 3. Create Request B with new threshold (55%)
# → Jobs use threshold=55

# 4. Both requests work correctly:
# → Request A fires at 50%, message says "50%" ✅
# → Request B fires at 55%, message says "55%" ✅
```

---

## Summary

### **What Changed:**
1. ✅ Job names: `tat50` → `tat-threshold1` (generic)
2. ✅ Job data: Now includes actual threshold percentage
3. ✅ Messages: Dynamic based on threshold from job data
4. ✅ Scheduling: Reads thresholds from database configuration
5. ✅ Cache: Automatically cleared on config update

### **What Didn't Change:**
1. ✅ Database schema (`admin_configurations` already has all needed fields)
2. ✅ API endpoints (no breaking changes)
3. ✅ Frontend UI (works exactly the same)
4. ✅ Cancellation logic (still works, just uses new names)

### **Benefits:**
1. ✅ **No Job Name Conflicts**: Generic names work for any percentage
2. ✅ **Accurate Messages**: Always show actual threshold used
3. ✅ **Config Flexibility**: Admin can change thresholds anytime
4. ✅ **Backward Compatible**: Existing jobs complete normally
5. ✅ **Reliable Cancellation**: Works regardless of threshold value
6. ✅ **Immediate Effect**: New requests use updated thresholds immediately

---

## Files Modified

1. `Re_Backend/src/services/configReader.service.ts` - **NEW** (configuration reader)
2. `Re_Backend/src/services/tatScheduler.service.ts` - Updated job scheduling
3. `Re_Backend/src/queues/tatProcessor.ts` - Updated job processing
4. `Re_Backend/src/controllers/admin.controller.ts` - Added cache clearing

---

## Configuration Keys

| Key | Description | Default | Example |
|-----|-------------|---------|---------|
| `TAT_REMINDER_THRESHOLD_1` | First warning threshold | 50 | 55 (sends alert at 55%) |
| `TAT_REMINDER_THRESHOLD_2` | Critical warning threshold | 75 | 80 (sends alert at 80%) |
| Breach | Deadline reached (always 100%) | 100 | 100 (non-configurable) |
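
Since configuration values travel as strings (for example `'55'`), a reader has to convert them to numbers and apply the defaults from the table above. The sketch below shows one way to do that. The names `parsePercent` and `readTatThresholds` are hypothetical, and falling back to the default for missing, non-numeric, or out-of-range values is an assumption, not documented behavior of `getTatThresholds()`.

```typescript
// Hypothetical shape for the two configurable percentages.
interface TatThresholds {
  first: number;
  second: number;
}

// Defaults taken from the Configuration Keys table.
const DEFAULTS: TatThresholds = { first: 50, second: 75 };

// Assumed validation: anything missing, non-numeric, or outside (0, 100) falls back to the default.
function parsePercent(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === '') return fallback;
  const value = Number(raw);
  return Number.isFinite(value) && value > 0 && value < 100 ? value : fallback;
}

function readTatThresholds(rows: Record<string, string | undefined>): TatThresholds {
  return {
    first: parsePercent(rows['TAT_REMINDER_THRESHOLD_1'], DEFAULTS.first),
    second: parsePercent(rows['TAT_REMINDER_THRESHOLD_2'], DEFAULTS.second),
  };
}

console.log(readTatThresholds({ TAT_REMINDER_THRESHOLD_1: '55' }));
// → { first: 55, second: 75 }
```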

---

## Example Timeline

**TAT = 16 hours, Thresholds: 55%, 80%**

```
Hour 0 ──────────────────────────────────────────────► Hour 16
                 │                │                │
START        55% (8.8h)      80% (12.8h)         100%
                 │                │                │
             threshold1       threshold2         breach
            "55% elapsed"    "80% elapsed"     "BREACHED"
                 ⏳               ⚠️                ⏰
```

**Result:**
- ✅ Job names don't hardcode percentages
- ✅ Messages show actual configured thresholds
- ✅ Cancellation works consistently
- ✅ The edge cases described above are handled
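
The offsets in the timeline follow directly from the TAT length and the two percentages. As a minimal sketch, assuming TAT is measured in plain elapsed hours (any working-hours or holiday handling in the real scheduler is out of scope here), the hypothetical `tatDelaysMs` function below computes the three job delays in milliseconds.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: converts a TAT length and two percentages into job delays.
function tatDelaysMs(tatHours: number, first: number, second: number) {
  const totalMs = tatHours * HOUR_MS;
  const at = (percent: number) => Math.round((totalMs * percent) / 100);
  return { threshold1: at(first), threshold2: at(second), breach: totalMs };
}

const delays = tatDelaysMs(16, 55, 80);
console.log(delays.threshold1 / HOUR_MS); // → 8.8
console.log(delays.threshold2 / HOUR_MS); // → 12.8
console.log(delays.breach / HOUR_MS);     // → 16
```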