Dynamic TAT Thresholds Implementation
Problem Statement
Original Issue
The TAT system had hardcoded threshold percentages (50%, 75%, 100%) which created several problems:
- Job Naming Conflict: Jobs were named using threshold percentages (tat50-{reqId}-{levelId})
- Configuration Changes Didn't Apply: Changing threshold in settings didn't affect scheduled jobs
- Message Mismatch: Messages always said "50% elapsed" even if admin configured 55%
- Cancellation Issues: Uncertainty about whether jobs could be properly cancelled after config changes
Critical Edge Case Identified by User
Scenario:
1. Request created → TAT jobs scheduled:
- tat50-REQ123-LEVEL456 (fires at 8 hours, says "50% elapsed")
- tat75-REQ123-LEVEL456 (fires at 12 hours)
- tatBreach-REQ123-LEVEL456 (fires at 16 hours)
2. Admin changes threshold from 50% → 55%
3. User approves at 9 hours (after old 50% fired)
→ Job already fired with "50% elapsed" message ❌
→ But admin configured 55% ❌
→ Inconsistent!
4. Even if approval happens before old 50%:
→ System cancels `tat50-REQ123-LEVEL456` ✅
→ But message would still say "50%" (hardcoded) ❌
Solution: Generic Job Names + Dynamic Thresholds
1. Generic Job Naming
Changed from percentage-based to generic names:
Before:
tat50-{requestId}-{levelId}
tat75-{requestId}-{levelId}
tatBreach-{requestId}-{levelId}
After:
tat-threshold1-{requestId}-{levelId} // First threshold (configurable: 50%, 55%, 60%, etc.)
tat-threshold2-{requestId}-{levelId} // Second threshold (configurable: 75%, 80%, etc.)
tat-breach-{requestId}-{levelId} // Always 100% (deadline)
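The naming scheme above can be sketched as a small helper. This is a minimal illustration, not the project's actual code: the names buildTatJobId and allTatJobIds are hypothetical.

```typescript
// Hypothetical helpers: these names are illustrative and do not come from the codebase.
type TatJobType = 'threshold1' | 'threshold2' | 'breach';

const TAT_JOB_TYPES: readonly TatJobType[] = ['threshold1', 'threshold2', 'breach'];

/** Builds a generic job ID such as tat-threshold1-REQ123-LEVEL456. */
function buildTatJobId(type: TatJobType, requestId: string, levelId: string): string {
  return `tat-${type}-${requestId}-${levelId}`;
}

/** Lists all three job IDs for one approval level. No percentage appears in any of them. */
function allTatJobIds(requestId: string, levelId: string): string[] {
  return TAT_JOB_TYPES.map((type) => buildTatJobId(type, requestId, levelId));
}

console.log(allTatJobIds('REQ123', 'LEVEL456'));
```

Because the ID depends only on the job type, request, and level, the same three IDs can be recomputed at cancellation time without knowing which percentages were configured when the jobs were scheduled.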
2. Store Threshold in Job Data
Instead of relying on job name, we store the actual percentage in job payload:
interface TatJobData {
type: 'threshold1' | 'threshold2' | 'breach';
threshold: number; // Actual % (e.g., 55, 80, 100)
requestId: string;
levelId: string;
approverId: string;
}
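As a rough sketch of how this payload might be filled in at scheduling time, the function below turns a TAT duration and two thresholds into three planned jobs. It assumes the delay is a simple proportion of the total TAT (matching the 8.8 hours quoted for 55% of 16 hours); planTatJobs and PlannedTatJob are hypothetical names, and the real scheduler may compute delays differently.

```typescript
interface TatJobData {
  type: 'threshold1' | 'threshold2' | 'breach';
  threshold: number; // Actual % (e.g., 55, 80, 100)
  requestId: string;
  levelId: string;
  approverId: string;
}

// Hypothetical shape describing one job before it is handed to the queue.
interface PlannedTatJob {
  jobId: string;
  delayMs: number;
  data: TatJobData;
}

/** Assumption: each job fires after (threshold / 100) of the total TAT has elapsed. */
function planTatJobs(
  tatHours: number,
  threshold1: number,
  threshold2: number,
  requestId: string,
  levelId: string,
  approverId: string,
): PlannedTatJob[] {
  const totalMs = tatHours * 60 * 60 * 1000;
  const steps: Array<[TatJobData['type'], number]> = [
    ['threshold1', threshold1],
    ['threshold2', threshold2],
    ['breach', 100], // the deadline is always 100%
  ];
  return steps.map(([type, threshold]) => ({
    jobId: `tat-${type}-${requestId}-${levelId}`,
    delayMs: Math.round((totalMs * threshold) / 100),
    data: { type, threshold, requestId, levelId, approverId },
  }));
}

// 16h * 55% = 8.8h = 31,680,000 ms
const planned = planTatJobs(16, 55, 80, 'REQ123', 'LEVEL456', 'USER1');
console.log(planned.map((job) => `${job.jobId}: ${job.delayMs} ms`));
```

Storing the percentage in the payload means the processor never has to re-read configuration to know which threshold a job represents.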
3. Dynamic Message Generation
Messages use the threshold from job data:
case 'threshold1':
message = `⏳ ${threshold}% of TAT elapsed for Request ${requestNumber}`;
// If threshold = 55, message says "55% of TAT elapsed" ✅
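A fuller version of that switch might look like the sketch below. Only the threshold1 wording comes from this document; the threshold2 and breach wording, and the name buildTatMessage, are assumptions made for illustration.

```typescript
type TatAlertType = 'threshold1' | 'threshold2' | 'breach';

/** Builds the notification text from the threshold stored in the job data. */
function buildTatMessage(type: TatAlertType, threshold: number, requestNumber: string): string {
  switch (type) {
    case 'threshold1':
      return `⏳ ${threshold}% of TAT elapsed for Request ${requestNumber}`;
    case 'threshold2':
      // Assumed wording: same pattern as threshold1 with a warning icon.
      return `⚠️ ${threshold}% of TAT elapsed for Request ${requestNumber}`;
    case 'breach':
      // Assumed wording for the 100% deadline.
      return `⏰ TAT breached for Request ${requestNumber}`;
  }
}

console.log(buildTatMessage('threshold1', 55, 'REQ123'));
```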
4. Configuration Cache Management
- Configurations are cached for 5 minutes (performance)
- Cache is automatically cleared when admin updates settings
- Jobs scheduled after the cache is cleared use the new thresholds; jobs already in the queue keep the threshold stored in their job data
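The caching behaviour can be illustrated with a small in-memory cache. This is a sketch under assumptions, not the contents of configReader.service.ts: ConfigCacheSketch and its methods are hypothetical names, and the loader callback stands in for the database read.

```typescript
const CONFIG_CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes, as described above

interface CacheEntry {
  value: string;
  expiresAt: number;
}

// Hypothetical class: a stand-in for whatever the real config reader does.
class ConfigCacheSketch {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly now: () => number;

  // The clock is injectable so that expiry can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  /** Returns the cached value, or calls load() when the entry is missing or expired. */
  get(key: string, load: () => string): string {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) {
      return hit.value;
    }
    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + CONFIG_CACHE_TTL_MS });
    return value;
  }

  /** Called after an admin update so the next read goes back to the source. */
  clear(): void {
    this.entries.clear();
  }
}
```

Without the explicit clear, an updated threshold could take up to five minutes to reach newly created requests.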
How It Solves the Edge Cases
✅ Case 1: Config Changed After Job Creation
Scenario:
1. Request created with TAT = 16 hours (thresholds: 50%, 75%)
Jobs scheduled:
- tat-threshold1-REQ123 → fires at 8h, threshold=50
- tat-threshold2-REQ123 → fires at 12h, threshold=75
2. Admin changes threshold from 50% → 55%
3. Old request jobs STILL fire at 8h (50%)
✅ BUT message correctly shows "50% elapsed" (from job data)
✅ No confusion because that request WAS scheduled at 50%
4. NEW requests created after config change:
Jobs scheduled:
- tat-threshold1-REQ456 → fires at 8.8h, threshold=55 ✅
- tat-threshold2-REQ456 → fires at 12h, threshold=75
5. Message says "55% of TAT elapsed" ✅ CORRECT!
Result:
- ✅ Existing jobs maintain their original thresholds (consistent)
- ✅ New jobs use updated thresholds (respects config changes)
- ✅ Messages always match actual threshold used
✅ Case 2: User Approves Before Threshold
Scenario:
1. Job scheduled: tat-threshold1-REQ123-LEVEL456 (fires at 55%)
2. User approves at 40% elapsed
3. cancelTatJobs('REQ123', 'LEVEL456') is called:
→ Looks for: tat-threshold1-REQ123-LEVEL456 ✅ FOUND
→ Removes job ✅ SUCCESS
4. No notification sent ✅ CORRECT!
Result:
- ✅ Generic names allow consistent cancellation
- ✅ Works regardless of threshold percentage
- ✅ No ambiguity in job identification
✅ Case 3: User Approves After Threshold Fired
Scenario:
1. Job scheduled: tat-threshold1-REQ123 (fires at 55%)
2. Job fires at 55% → notification sent
3. User approves at 60%
4. cancelTatJobs called:
→ Tries to cancel tat-threshold1-REQ123
→ Job already processed and removed (removeOnComplete: true)
→ No error (gracefully handled) ✅
5. Later jobs (threshold2, breach) are still cancelled ✅
Result:
- ✅ Already-fired jobs don't cause errors
- ✅ Remaining jobs are still cancelled
- ✅ System behaves correctly in all scenarios
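Cases 2 and 3 both rely on cancellation tolerating jobs that no longer exist. The sketch below models the queue as an in-memory Map so the behaviour can be run in isolation; cancelTatJobsSketch is a hypothetical name, and the real cancelTatJobs talks to the job queue rather than a Map.

```typescript
const CANCELLABLE_TYPES = ['threshold1', 'threshold2', 'breach'] as const;

/**
 * Removes every pending TAT job for one request and level.
 * Jobs that already fired are simply absent from the map and are skipped without error.
 * Returns the IDs that were actually removed.
 */
function cancelTatJobsSketch(
  pending: Map<string, unknown>,
  requestId: string,
  levelId: string,
): string[] {
  const removed: string[] = [];
  for (const type of CANCELLABLE_TYPES) {
    const jobId = `tat-${type}-${requestId}-${levelId}`;
    // Map.delete returns false when the key is missing, which covers already-fired jobs.
    if (pending.delete(jobId)) {
      removed.push(jobId);
    }
  }
  return removed;
}

// Case 3: threshold1 has already fired, so only two jobs remain in the queue.
const pendingJobs = new Map<string, unknown>([
  ['tat-threshold2-REQ123-LEVEL456', {}],
  ['tat-breach-REQ123-LEVEL456', {}],
]);
console.log(cancelTatJobsSketch(pendingJobs, 'REQ123', 'LEVEL456'));
```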
Configuration Flow
Admin Updates Threshold
1. Admin changes "First TAT Threshold" from 50% → 55%
↓
2. Frontend sends: PUT /api/v1/admin/configurations/TAT_REMINDER_THRESHOLD_1
Body: { configValue: '55' }
↓
3. Backend updates database:
UPDATE admin_configurations
SET config_value = '55'
WHERE config_key = 'TAT_REMINDER_THRESHOLD_1'
↓
4. Backend clears config cache:
clearConfigCache() ✅
↓
5. Next request created:
- getTatThresholds() → reads '55' from DB
- Schedules job at 55% (8.8 hours for 16h TAT)
- Job data: { threshold: 55 }
↓
6. Job fires at 55%:
- Message: "55% of TAT elapsed" ✅ CORRECT!
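Step 5 reads the value back as a string ('55'), so it has to be converted before scheduling. One way to do that is sketched below, assuming invalid or missing values fall back to the documented defaults of 50 and 75; parseThreshold and resolveTatThresholds are hypothetical names, and the actual getTatThresholds may validate differently.

```typescript
/** Converts a stored config string to a percentage, falling back when it is missing or invalid. */
function parseThreshold(raw: string | undefined, fallback: number): number {
  const value = Number(raw);
  const valid =
    raw !== undefined &&
    raw.trim() !== '' &&
    Number.isFinite(value) &&
    value > 0 &&
    value < 100; // assumption: 100% is reserved for the breach job
  return valid ? value : fallback;
}

/** Resolves both reminder thresholds from a key/value view of admin_configurations. */
function resolveTatThresholds(config: Record<string, string | undefined>): {
  threshold1: number;
  threshold2: number;
} {
  return {
    threshold1: parseThreshold(config['TAT_REMINDER_THRESHOLD_1'], 50),
    threshold2: parseThreshold(config['TAT_REMINDER_THRESHOLD_2'], 75),
  };
}

console.log(resolveTatThresholds({ TAT_REMINDER_THRESHOLD_1: '55' }));
```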
Database Impact
No Database Changes Required!
The admin_configurations table already has all required fields:
- ✅ TAT_REMINDER_THRESHOLD_1 → First threshold (50% default)
- ✅ TAT_REMINDER_THRESHOLD_2 → Second threshold (75% default)
Job Queue Data Structure
Old Job Data:
{
"type": "tat50",
"requestId": "...",
"levelId": "...",
"approverId": "..."
}
New Job Data:
{
"type": "threshold1",
"threshold": 55,
"requestId": "...",
"levelId": "...",
"approverId": "..."
}
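If jobs with the old payload are still queued when the new processor is deployed, one option is to normalise them on read. The sketch below is an assumption about how that could be done, not a description of tatProcessor.ts: the legacy type values other than "tat50" are inferred from the old job names, and normalizeTatJobData is a hypothetical name.

```typescript
interface LegacyTatJobData {
  type: 'tat50' | 'tat75' | 'tatBreach'; // 'tat75' and 'tatBreach' inferred from the old job names
  requestId: string;
  levelId: string;
  approverId: string;
}

interface CurrentTatJobData {
  type: 'threshold1' | 'threshold2' | 'breach';
  threshold: number;
  requestId: string;
  levelId: string;
  approverId: string;
}

// Old jobs were always scheduled at the hardcoded percentages, so the mapping is fixed.
const LEGACY_TYPE_MAP: Record<LegacyTatJobData['type'], Pick<CurrentTatJobData, 'type' | 'threshold'>> = {
  tat50: { type: 'threshold1', threshold: 50 },
  tat75: { type: 'threshold2', threshold: 75 },
  tatBreach: { type: 'breach', threshold: 100 },
};

/** Returns new-style job data, converting an old payload when necessary. */
function normalizeTatJobData(data: LegacyTatJobData | CurrentTatJobData): CurrentTatJobData {
  if ('threshold' in data) {
    return data; // already in the new shape
  }
  const { requestId, levelId, approverId } = data;
  return { ...LEGACY_TYPE_MAP[data.type], requestId, levelId, approverId };
}

console.log(normalizeTatJobData({ type: 'tat50', requestId: 'R1', levelId: 'L1', approverId: 'U1' }));
```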
Testing Scenarios
Test 1: Change Threshold, Create New Request
# 1. Change threshold from 50% to 55%
curl -X PUT http://localhost:5000/api/v1/admin/configurations/TAT_REMINDER_THRESHOLD_1 \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{"configValue": "55"}'
# 2. Create new workflow request
# → Jobs scheduled at 55%, 75%, 100%
# 3. Wait for 55% elapsed
# → Notification says "55% of TAT elapsed" ✅
Test 2: Approve Before Threshold
# 1. Request created (TAT = 16 hours)
# → threshold1 scheduled at 8.8 hours (55%)
# 2. Approve at 6 hours (before 55%)
curl -X POST http://localhost:5000/api/v1/workflows/REQ123/approve/LEVEL456
# 3. cancelTatJobs is called internally
# → tat-threshold1-REQ123-LEVEL456 removed ✅
# → tat-threshold2-REQ123-LEVEL456 removed ✅
# → tat-breach-REQ123-LEVEL456 removed ✅
# 4. No notifications sent ✅
Test 3: Mixed Old and New Jobs
# 1. Create Request A with old threshold (50%)
# → Jobs use threshold=50
# 2. Admin changes to 55%
# 3. Create Request B with new threshold (55%)
# → Jobs use threshold=55
# 4. Both requests work correctly:
# → Request A fires at 50%, message says "50%" ✅
# → Request B fires at 55%, message says "55%" ✅
Summary
What Changed:
- ✅ Job names: tat50 → tat-threshold1 (generic)
- ✅ Job data: Now includes actual threshold percentage
- ✅ Messages: Dynamic based on threshold from job data
- ✅ Scheduling: Reads thresholds from database configuration
- ✅ Cache: Automatically cleared on config update
What Didn't Change:
- ✅ Database schema (admin_configurations already has all needed fields)
- ✅ API endpoints (no breaking changes)
- ✅ Frontend UI (works exactly the same)
- ✅ Cancellation logic (still works, just uses new names)
Benefits:
- ✅ No Job Name Conflicts: Generic names work for any percentage
- ✅ Accurate Messages: Always show actual threshold used
- ✅ Config Flexibility: Admin can change thresholds anytime
- ✅ Backward Compatible: Existing jobs complete normally
- ✅ Reliable Cancellation: Works regardless of threshold value
- ✅ Immediate Effect: New requests use updated thresholds immediately
Files Modified
- Re_Backend/src/services/configReader.service.ts - NEW (configuration reader)
- Re_Backend/src/services/tatScheduler.service.ts - Updated job scheduling
- Re_Backend/src/queues/tatProcessor.ts - Updated job processing
- Re_Backend/src/controllers/admin.controller.ts - Added cache clearing
Configuration Keys
| Key | Description | Default | Example |
|---|---|---|---|
| TAT_REMINDER_THRESHOLD_1 | First warning threshold | 50 | 55 (sends alert at 55%) |
| TAT_REMINDER_THRESHOLD_2 | Critical warning threshold | 75 | 80 (sends alert at 80%) |
| Breach | Deadline reached (always 100%) | 100 | 100 (non-configurable) |
Example Timeline
TAT = 16 hours, Thresholds: 55%, 80%
Hour 0 ──────────────────────────────────────────────► Hour 16
                  │                │                │
START         55% (8.8h)      80% (12.8h)         100%
                  │                │                │
              threshold1       threshold2         breach
             "55% elapsed"   "80% elapsed"      "BREACHED"
                  ⏳               ⚠️               ⏰
Result:
- ✅ Job names don't hardcode percentages
- ✅ Messages show actual configured thresholds
- ✅ Cancellation works consistently
- ✅ The edge cases described above are handled without errors