Re_Backend/docs/SYSTEM_ARCHITECTURE.md

# Royal Enfield Workflow Management System - Technical Architecture Definition
## 1. Platform Overview
The Royal Enfield (RE) Workflow Management System is a resilient, horizontally scalable infrastructure designed to orchestrate complex internal business processes. It utilizes a decoupled, service-oriented architecture leveraging **Node.js (TypeScript)**, **MongoDB Atlas (v8)**, and **Google Cloud Storage (GCS)** to ensure high availability and performance across enterprise workflows.
This document focuses exclusively on the core platform infrastructure and the custom workflow engine; legacy dealer claim modules are out of scope.
---
## 2. Global Architecture & Ingress
### A. High-Level System Architecture
```mermaid
graph TD
User((User / Client))
subgraph "Public Interface"
Nginx[Nginx Reverse Proxy]
end
subgraph "Application Layer (Node.js)"
Auth[Auth Middleware]
Core[Workflow Service]
Dynamic[Ad-hoc Logic]
AI[Vertex AI Service]
TAT[TAT Worker / BullMQ]
end
subgraph "Persistence & Infrastructure"
Atlas[(MongoDB Atlas v8)]
GCS_Bucket[GCS Bucket - Artifacts]
GSM[Google Secret Manager]
Redis[(Redis Cache)]
end
User --> Nginx
Nginx --> Auth
Auth --> Core
Core --> Dynamic
Core --> Atlas
Core --> GCS_Bucket
Core --> AI
TAT --> Redis
TAT --> Atlas
Core --> GSM
```
### B. Ingress: Nginx Reverse Proxy
All incoming traffic passes through **Nginx**, which fronts the deployed server.
- **SSL Termination**: Terminates TLS at the edge, so client traffic is encrypted up to the proxy.
- **Micro-caching**: Caches static metadata to reduce load on Node.js.
- **Proxying**: Routes `/api` requests to the backend and serves the production React bundle for root requests.
### C. Stateless Authentication (JWT + RBAC)
The platform follows a stateless security model:
1. **JWT Validation**: `auth.middleware.ts` verifies signatures using secrets managed by **Google Secret Manager (GSM)**.
2. **Context Enrichment**: User identity is synchronized from the `users` collection in MongoDB Atlas.
3. **Granular RBAC**: Access is governed by roles (`ADMIN`, `MANAGEMENT`, `USER`) and dynamic participant checks.
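
The role and participant checks in step 3 can be sketched as plain functions. This is a minimal illustration, not the contents of `auth.middleware.ts`: the `AuthUser` and `WorkflowRequest` shapes, the function names, and the policy that `ADMIN` and `MANAGEMENT` may view any request are all assumptions made for the example.

```typescript
// Roles named in this document.
type Role = "ADMIN" | "MANAGEMENT" | "USER";

// Hypothetical shapes for illustration; the real models may differ.
interface AuthUser {
  userId: string;
  role: Role;
}

interface WorkflowRequest {
  requestId: string;
  initiatorId: string;
  participantIds: string[];
}

// Static role check: does the user hold one of the allowed roles?
function hasRole(user: AuthUser, allowed: Role[]): boolean {
  return allowed.includes(user.role);
}

// Dynamic participant check: is the user tied to this specific request?
function isParticipant(user: AuthUser, request: WorkflowRequest): boolean {
  return (
    request.initiatorId === user.userId ||
    request.participantIds.includes(user.userId)
  );
}

// Assumed policy: ADMIN and MANAGEMENT may view any request,
// while a plain USER must be a participant of the request.
function canViewRequest(user: AuthUser, request: WorkflowRequest): boolean {
  return hasRole(user, ["ADMIN", "MANAGEMENT"]) || isParticipant(user, request);
}
```

Keeping the two checks separate lets a route combine them as needed, for example requiring a role for administrative endpoints and a participant check for request-scoped ones.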
---
## 3. Background Processing & SLA Management (BullMQ)
At the heart of the platform's performance is the **Asynchronous Task Engine** powered by **BullMQ** and **Redis**.
### A. TAT (Turnaround Time) Tracking Logic
Turnaround time is monitored per level by a calculation engine that accounts for:
- **Business Days/Hours**: Weekend and holiday filtering via `tatTimeUtils.ts`.
- **Priority Multipliers**: Scaling TAT for `STANDARD` vs `EXPRESS` requests.
- **Pause Impact**: Snapshot-based SLA halting during business-case pauses.
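
The three factors above can be combined into a single "percent of TAT used" figure. The sketch below is not the code in `tatTimeUtils.ts`; it assumes a Monday to Friday week, working hours expressed in UTC, paused time supplied as a precomputed duration, and made-up priority multipliers (`STANDARD` = 1, `EXPRESS` = 0.5). All names are hypothetical.

```typescript
const HOUR_MS = 60 * 60 * 1000;

type Priority = "STANDARD" | "EXPRESS";

// Hypothetical multipliers: the document only states that TAT is scaled by priority.
const PRIORITY_MULTIPLIER: Record<Priority, number> = { STANDARD: 1, EXPRESS: 0.5 };

interface BusinessCalendar {
  startHour: number; // first working hour of the day, UTC, inclusive
  endHour: number; // closing hour, UTC, exclusive
  holidays: Set<string>; // dates formatted as "YYYY-MM-DD"
}

function isWorkingDay(day: Date, cal: BusinessCalendar): boolean {
  const dow = day.getUTCDay(); // 0 = Sunday, 6 = Saturday
  if (dow === 0 || dow === 6) return false;
  return !cal.holidays.has(day.toISOString().slice(0, 10));
}

// Sum the working-hours overlap of [start, end] one UTC calendar day at a time.
function businessMsBetween(start: Date, end: Date, cal: BusinessCalendar): number {
  let total = 0;
  const cursor = new Date(
    Date.UTC(start.getUTCFullYear(), start.getUTCMonth(), start.getUTCDate()),
  );
  while (cursor.getTime() < end.getTime()) {
    if (isWorkingDay(cursor, cal)) {
      const open = cursor.getTime() + cal.startHour * HOUR_MS;
      const close = cursor.getTime() + cal.endHour * HOUR_MS;
      const from = Math.max(open, start.getTime());
      const to = Math.min(close, end.getTime());
      if (to > from) total += to - from;
    }
    cursor.setUTCDate(cursor.getUTCDate() + 1);
  }
  return total;
}

// Percentage of the priority-scaled TAT budget consumed, excluding paused time.
function tatPercentUsed(
  assignedAt: Date,
  now: Date,
  allottedHours: number,
  priority: Priority,
  cal: BusinessCalendar,
  pausedMs = 0,
): number {
  const budgetMs = allottedHours * PRIORITY_MULTIPLIER[priority] * HOUR_MS;
  const usedMs = Math.max(0, businessMsBetween(assignedAt, now, cal) - pausedMs);
  return (usedMs / budgetMs) * 100;
}
```

With a 09:00 to 18:00 calendar, a level assigned on Friday 2024-01-05 at 16:00 UTC and checked on Monday 2024-01-08 at 11:00 UTC has used 4 business hours, which is 50% of an 8-hour `STANDARD` budget under these assumptions.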
### B. TAT Worker Flow (Redis Backed)
```mermaid
graph TD
Trigger[Request Assignment] --> Queue[tatQueue - BullMQ]
Queue --> Redis[(Redis Cache)]
Redis --> Worker[tatWorker.ts]
Worker --> Processor[tatProcessor.mongo.ts]
Processor --> Check{Threshold Reached?}
Check -->|50/75%| Notify[Reminder Notification]
Check -->|100%| Breach[Breach Alert + Escalation]
```
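
The threshold check at the end of the flow can be sketched as a pure function that the processor calls with the current percentage. This is an illustrative sketch only: the names `TAT_THRESHOLDS` and `dueThresholds` are hypothetical, and tracking already-fired thresholds as a list of numbers is an assumption about how duplicate reminders might be avoided.

```typescript
type TatAction = "REMINDER" | "BREACH";

interface TatThreshold {
  percent: number;
  action: TatAction;
}

// Thresholds taken from the flow diagram: reminders at 50% and 75%, breach at 100%.
const TAT_THRESHOLDS: TatThreshold[] = [
  { percent: 50, action: "REMINDER" },
  { percent: 75, action: "REMINDER" },
  { percent: 100, action: "BREACH" },
];

// Return every threshold that has been reached but not yet handled, lowest first.
function dueThresholds(percentUsed: number, alreadyFired: number[]): TatThreshold[] {
  return TAT_THRESHOLDS.filter(
    (t) => percentUsed >= t.percent && !alreadyFired.includes(t.percent),
  );
}
```

Returning all due thresholds, rather than only the highest, means a worker that runs late still emits the reminders it missed; whether that is desirable depends on the notification policy.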
---
## 4. Multi-Channel Notification Dispatch Engine
The system delivers critical workflow events (Approvals, Breaches, Comments) to users through three distinct channels, combining synchronous and asynchronous delivery.
### A. Channel Orchestration
Managed by `notification.service.ts`, the engine handles:
1. **Real-time (Socket.io)**: Immediate UI updates via room-based events.
2. **Web Push (VAPID)**: Browser-level push notifications for offline users.
3. **Enterprise Email**: Specialized services like `emailNotification.service.ts` dispatch templated HTML emails.
### B. Notification Lifecycle
```mermaid
sequenceDiagram
participant S as Service Layer
participant N as Notification Service
participant DB as MongoDB (NotificationModel)
participant SK as Socket.io
participant E as Email Service
S->>N: Trigger Event (e.g. "Assignment")
N->>DB: Persist Notification Record (Audit)
N->>SK: broadcast(user:id, "notification:new")
N->>E: dispatchAsync(EmailTemplate)
N-->>S: Success
```
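
The lifecycle above (persist first, then push in real time, then hand off email without blocking) can be sketched with injected channel interfaces. This is not the code in `notification.service.ts`: the interfaces, the `dispatchNotification` function, and the `onEmailError` hook are hypothetical names chosen so the example runs without Socket.io, MongoDB, or a mail provider.

```typescript
// Hypothetical payload and channel contracts, for illustration only.
interface NotificationPayload {
  userId: string;
  type: string;
  message: string;
}

interface NotificationStore {
  save(payload: NotificationPayload): Promise<string>; // resolves to the record id
}

interface RealtimeChannel {
  emitToUser(userId: string, event: string, data: unknown): void;
}

interface EmailChannel {
  send(payload: NotificationPayload): Promise<void>;
}

interface NotificationDeps {
  store: NotificationStore;
  realtime: RealtimeChannel;
  email: EmailChannel;
  onEmailError: (err: unknown) => void;
}

async function dispatchNotification(
  payload: NotificationPayload,
  deps: NotificationDeps,
): Promise<string> {
  // 1. Persist the audit record before anything is delivered.
  const id = await deps.store.save(payload);

  // 2. Push to the user's room so an open UI updates immediately.
  deps.realtime.emitToUser(payload.userId, "notification:new", { id, ...payload });

  // 3. Start the email without awaiting it; failures are reported, not thrown.
  void deps.email.send(payload).catch(deps.onEmailError);

  return id;
}
```

Because the email promise is not awaited, a slow or failing mail provider does not delay the caller; the trade-off is that delivery failures must be surfaced through the error hook or a retry queue.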
---
## 5. Cloud-Native Storage & Assets (GCS)
The architecture treats **Google Cloud Storage (GCS)** as a first-class citizen for both operational and deployment data.
### A. Deployment Artifact Architecture
- **Static Site Hosting**: GCS stores the compiled frontend artifacts.
- **Production Secrets**: `Google Secret Manager` ensures that no production passwords or API keys reside in the codebase.
### B. Scalable Document Storage
- **Decoupling**: Binaries are never stored in the database; MongoDB stores only the URI.
- **Privacy Mode**: Documents are retrieved via **Signed URLs** with a configurable TTL.
- **Structure**: `requests/{requestNumber}/documents/`
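
The path convention and the "URI only" rule can be sketched as two small helpers. The function names, the `StoredDocumentRef` shape, the bucket name, and the request number format used below are assumptions for illustration; signed URL generation is left to the GCS client library and is not shown.

```typescript
// Hypothetical record shape: only a reference to the object is kept, never the bytes.
interface StoredDocumentRef {
  requestNumber: string;
  fileName: string;
  gcsUri: string;
}

// Build the object path following the requests/{requestNumber}/documents/ structure.
function buildDocumentObjectPath(requestNumber: string, fileName: string): string {
  // Replace path separators so a file name cannot escape the documents/ prefix.
  const safeName = fileName.replace(/[\\/]/g, "_");
  return `requests/${requestNumber}/documents/${safeName}`;
}

// Produce the reference that would be persisted in the database.
function toDocumentRef(
  bucket: string,
  requestNumber: string,
  fileName: string,
): StoredDocumentRef {
  const objectPath = buildDocumentObjectPath(requestNumber, fileName);
  return { requestNumber, fileName, gcsUri: `gs://${bucket}/${objectPath}` };
}
```

For example, `buildDocumentObjectPath("REQ-2024-0001", "quote.pdf")` yields `requests/REQ-2024-0001/documents/quote.pdf`, where the request number format is invented for the example.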
---
## 6. Real-time Collaboration (Socket.io)
Collaborative features like "Who else is viewing this request?" and "Instant Alerts" are powered by a persistent WebSocket layer.
- **Presence Tracking**: A `Map<requestId, Set<userId>>` tracks online users per workflow request.
- **Room Logic**: Users join specific "Rooms" based on their current active request view.
- **Bi-directional Sync**: Frontend emits `presence:join` when entering a request page.
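
The presence structure described above can be sketched as a small in-memory class. The `Map<requestId, Set<userId>>` shape comes from this document; the class name, method names, and the cleanup of empty sets are assumptions. In a real deployment the `join` and `leave` calls would presumably be driven by the `presence:join` event and by socket disconnects.

```typescript
class PresenceTracker {
  // requestId -> set of userIds currently viewing that request
  private readonly viewers = new Map<string, Set<string>>();

  // Register a viewer and return the updated viewer list for the request.
  join(requestId: string, userId: string): string[] {
    const users = this.viewers.get(requestId) ?? new Set<string>();
    users.add(userId); // a Set ignores duplicate joins from multiple tabs
    this.viewers.set(requestId, users);
    return Array.from(users);
  }

  // Remove a viewer and drop the entry once nobody is left, to avoid leaks.
  leave(requestId: string, userId: string): string[] {
    const users = this.viewers.get(requestId);
    if (!users) return [];
    users.delete(userId);
    if (users.size === 0) this.viewers.delete(requestId);
    return Array.from(users);
  }

  viewersOf(requestId: string): string[] {
    const users = this.viewers.get(requestId);
    return users ? Array.from(users) : [];
  }
}
```

A single-process map like this only reflects sockets connected to that Node.js instance; sharing presence across instances would need a common store, which is outside this sketch.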
---
## 7. Intelligent Monitoring & Observability
The platform includes a dedicated monitoring stack for "Day 2" operations.
- **Metrics (Prometheus)**: Scrapes the `/metrics` endpoint provided by our Prometheus middleware.
- **Log Aggregation (Grafana Loki)**: `promtail` ships container logs to Loki for centralized debugging.
- **Alerting**: **Alertmanager** triggers PagerDuty/Email alerts for critical system failures.
```mermaid
graph LR
App[RE Backend] -->|Metrics| P[Prometheus DB]
App -->|Logs| L[Loki]
P --> G[Grafana Dashboards]
L --> G
```
---
## 8. Dynamic Workflow Flexibility
The "Custom Workflow" module provides logic for ad-hoc adjustments:
1. **Skip Approver**: Bypasses a level while maintaining a forced audit reason.
2. **Ad-hoc Insertion**: Inserts an approver level mid-flight, dynamically recalculating the downstream chain.
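
Both adjustments can be sketched as pure functions over an ordered approval chain. The `ApprovalLevel` shape, the status values, and the function names are hypothetical; the sketch only illustrates the two behaviours named above, a mandatory skip reason and renumbering of downstream levels after an insertion.

```typescript
type LevelStatus = "PENDING" | "APPROVED" | "SKIPPED";

// Hypothetical level shape for illustration.
interface ApprovalLevel {
  levelNumber: number;
  approverId: string;
  status: LevelStatus;
  skipReason?: string;
}

// Skip a level, refusing to proceed without a reason for the audit trail.
function skipApprover(
  levels: ApprovalLevel[],
  levelNumber: number,
  reason: string,
): ApprovalLevel[] {
  if (reason.trim() === "") {
    throw new Error("A skip reason is required for the audit trail");
  }
  if (!levels.some((l) => l.levelNumber === levelNumber)) {
    throw new Error(`Level ${levelNumber} not found`);
  }
  return levels.map((l): ApprovalLevel =>
    l.levelNumber === levelNumber
      ? { ...l, status: "SKIPPED", skipReason: reason }
      : l,
  );
}

// Insert a new approver after an existing level and renumber the whole chain.
function insertApproverAfter(
  levels: ApprovalLevel[],
  afterLevel: number,
  approverId: string,
): ApprovalLevel[] {
  const sorted = [...levels].sort((a, b) => a.levelNumber - b.levelNumber);
  const idx = sorted.findIndex((l) => l.levelNumber === afterLevel);
  if (idx === -1) throw new Error(`Level ${afterLevel} not found`);
  const inserted: ApprovalLevel = { levelNumber: 0, approverId, status: "PENDING" };
  const chain = [...sorted.slice(0, idx + 1), inserted, ...sorted.slice(idx + 1)];
  // Renumber from 1 so downstream levels shift by one position.
  return chain.map((l, i): ApprovalLevel => ({ ...l, levelNumber: i + 1 }));
}
```

Both functions return new arrays instead of mutating their input, which keeps the previous chain available for audit logging or rollback.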