Royal Enfield Workflow Management System - Technical Architecture Definition

1. Platform Overview

The Royal Enfield (RE) Workflow Management System is a resilient, horizontally scalable platform for orchestrating complex internal business processes. It uses a decoupled, service-oriented architecture built on Node.js (TypeScript), MongoDB Atlas (v8), and Google Cloud Storage (GCS) to keep enterprise workflows highly available and performant.

This document focuses exclusively on the core platform infrastructure and custom workflow engine, excluding legacy dealer claim modules.


2. Global Architecture & Ingress

A. High-Level System Architecture

graph TD
    User((User / Client))
    subgraph "Public Interface"
        Nginx[Nginx Reverse Proxy]
    end

    subgraph "Application Layer (Node.js)"
        Auth[Auth Middleware]
        Core[Workflow Service]
        Dynamic[Ad-hoc Logic]
        AI[Vertex AI Service]
        TAT[TAT Worker / BullMQ]
    end

    subgraph "Persistence & Infrastructure"
        Atlas[(MongoDB Atlas v8)]
        GCS_Bucket[GCS Bucket - Artifacts]
        GSM[Google Secret Manager]
        Redis[(Redis Cache)]
    end

    User --> Nginx
    Nginx --> Auth
    Auth --> Core
    Core --> Dynamic
    Core --> Atlas
    Core --> GCS_Bucket
    Core --> AI
    TAT --> Redis
    TAT --> Atlas
    Core --> GSM

B. Ingress: Nginx Reverse Proxy

All incoming traffic is managed by Nginx, acting as the "Deployed Server" facade.

  • SSL Termination: Terminates TLS at the edge, so client traffic is encrypted in transit up to Nginx.
  • Micro-caching: Caches static metadata to reduce load on Node.js.
  • Proxying: Routes /api to the backend and serves the production React bundle for root requests.

C. Stateless Authentication (JWT + RBAC)

The platform follows a stateless security model:

  1. JWT Validation: auth.middleware.ts verifies signatures using secrets managed by Google Secret Manager (GSM).
  2. Context Enrichment: User identity is synchronized from the users collection in MongoDB Atlas.
  3. Granular RBAC: Access is governed by roles (ADMIN, MANAGEMENT, USER) and dynamic participant checks.
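
The role and participant checks in step 3 can be sketched as a pure function. This is a minimal illustration, not the code in auth.middleware.ts: the `AuthUser` and `WorkflowRequestAccess` shapes, the `canViewRequest` name, and the rule that ADMIN and MANAGEMENT may view every request are all assumptions made for the example.

```typescript
// Roles come from the document; every other name below is hypothetical.
type Role = "ADMIN" | "MANAGEMENT" | "USER";

interface AuthUser {
  userId: string;
  role: Role;
}

interface WorkflowRequestAccess {
  initiatorId: string;
  participantIds: string[]; // approvers, spectators, etc. (assumed shape)
}

// Assumption: ADMIN and MANAGEMENT may view any request, while a USER
// must be the initiator or a participant of the request being accessed.
function canViewRequest(user: AuthUser, request: WorkflowRequestAccess): boolean {
  if (user.role === "ADMIN" || user.role === "MANAGEMENT") {
    return true;
  }
  if (request.initiatorId === user.userId) {
    return true;
  }
  return request.participantIds.some((id) => id === user.userId);
}
```

Keeping the decision in a function that takes plain data makes the "dynamic participant check" easy to unit test without a database or a signed token.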

3. Background Processing & SLA Management (BullMQ)

At the heart of the platform's performance is the Asynchronous Task Engine powered by BullMQ and Redis.

A. TAT (Turnaround Time) Tracking Logic

Turnaround time is monitored per level by a calculation engine that accounts for:

  • Business Days/Hours: Weekend and holiday filtering via tatTimeUtils.ts.
  • Priority Multipliers: Scaling TAT for STANDARD vs EXPRESS requests.
  • Pause Impact: Snapshot-based SLA halting during business-case pauses.
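
The three factors above can be illustrated with a small sketch. It is not the contents of tatTimeUtils.ts: the `BusinessCalendar` shape, the UTC working window, the function names, and the multiplier values (EXPRESS halving the allotted time) are assumptions chosen only to make the arithmetic concrete.

```typescript
const MS_PER_HOUR = 60 * 60 * 1000;

// Hypothetical calendar shape; hours are expressed in UTC for simplicity.
interface BusinessCalendar {
  workStartHour: number;
  workEndHour: number;
  holidays: ReadonlySet<string>; // "YYYY-MM-DD"
}

function isBusinessDay(dayUtcMidnight: Date, cal: BusinessCalendar): boolean {
  const dow = dayUtcMidnight.getUTCDay(); // 0 = Sunday, 6 = Saturday
  if (dow === 0 || dow === 6) return false;
  return !cal.holidays.has(dayUtcMidnight.toISOString().slice(0, 10));
}

// Sums the overlap of [start, end] with each business day's working window.
function businessHoursBetween(start: Date, end: Date, cal: BusinessCalendar): number {
  if (end.getTime() <= start.getTime()) return 0;
  let totalMs = 0;
  const cursor = new Date(
    Date.UTC(start.getUTCFullYear(), start.getUTCMonth(), start.getUTCDate())
  );
  while (cursor.getTime() <= end.getTime()) {
    if (isBusinessDay(cursor, cal)) {
      const open = cursor.getTime() + cal.workStartHour * MS_PER_HOUR;
      const close = cursor.getTime() + cal.workEndHour * MS_PER_HOUR;
      const from = Math.max(open, start.getTime());
      const to = Math.min(close, end.getTime());
      if (to > from) totalMs += to - from;
    }
    cursor.setUTCDate(cursor.getUTCDate() + 1);
  }
  return totalMs / MS_PER_HOUR;
}

type Priority = "STANDARD" | "EXPRESS";

// Assumed values: the document names the priorities but not the multipliers.
const PRIORITY_MULTIPLIER: Record<Priority, number> = { STANDARD: 1, EXPRESS: 0.5 };

// Percentage of the TAT budget consumed, with paused time excluded.
function tatPercentUsed(
  allottedHours: number,
  priority: Priority,
  elapsedBusinessHours: number,
  pausedBusinessHours: number
): number {
  const budget = allottedHours * PRIORITY_MULTIPLIER[priority];
  if (budget <= 0) throw new Error("TAT budget must be positive");
  const effective = Math.max(0, elapsedBusinessHours - pausedBusinessHours);
  return (effective / budget) * 100;
}
```

With a 09:00 to 18:00 window, a level assigned on Friday 2024-01-05 at 16:00 UTC and checked on Monday 2024-01-08 at 11:00 UTC has consumed 4 business hours: two on Friday and two on Monday, with the weekend skipped.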

B. TAT Worker Flow (Redis Backed)

graph TD
    Trigger[Request Assignment] --> Queue[tatQueue - BullMQ]
    Queue --> Redis[(Redis Cache)]
    Redis --> Worker[tatWorker.ts]
    Worker --> Processor[tatProcessor.mongo.ts]
    Processor --> Check{Threshold Reached?}
    Check -->|50/75%| Notify[Reminder Notification]
    Check -->|100%| Breach[Breach Alert + Escalation]
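
The threshold decision at the end of this flow can be sketched as a pure function. This is an illustration under assumptions, not the logic in tatProcessor.mongo.ts: the `TatAction` type, the `pendingTatActions` name, and the `alreadyFired` list used to avoid duplicate alerts are hypothetical.

```typescript
type TatThreshold = 50 | 75 | 100;

interface TatAction {
  threshold: TatThreshold;
  kind: "REMINDER" | "BREACH";
}

// Returns the actions still owed for a level, given how much of its TAT
// has been used and which thresholds were already handled (assumed input).
function pendingTatActions(
  percentUsed: number,
  alreadyFired: ReadonlyArray<TatThreshold>
): TatAction[] {
  const thresholds: TatThreshold[] = [50, 75, 100];
  return thresholds
    .filter((t) => percentUsed >= t && alreadyFired.indexOf(t) === -1)
    .map((t): TatAction => ({
      threshold: t,
      // 50% and 75% produce reminders; 100% is a breach with escalation.
      kind: t === 100 ? "BREACH" : "REMINDER",
    }));
}
```

Tracking which thresholds have fired keeps the processor idempotent if a job is retried, which matters for queue workers that may run the same job more than once.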

4. Multi-Channel Notification Dispatch Engine

The system ensures critical workflow events (Approvals, Breaches, Comments) reach users through three distinct synchronous and asynchronous channels.

A. Channel Orchestration

Managed by notification.service.ts, the engine handles:

  1. Real-time (Socket.io): Immediate UI updates via room-based events.
  2. Web Push (VAPID): Browser-level push notifications for offline users.
  3. Enterprise Email: Specialized services like emailNotification.service.ts dispatch templated HTML emails.

B. Notification Lifecycle

sequenceDiagram
    participant S as Service Layer
    participant N as Notification Service
    participant DB as MongoDB (NotificationModel)
    participant SK as Socket.io
    participant E as Email Service

    S->>N: Trigger Event (e.g. "Assignment")
    N->>DB: Persist Notification Record (Audit)
    N->>SK: broadcast(user:id, "notification:new")
    N->>E: dispatchAsync(EmailTemplate)
    N-->>S: Success
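
The ordering in this lifecycle (persist first, then push in real time, then email without blocking the caller) can be sketched with injected channel interfaces. This is a hedged illustration rather than the body of notification.service.ts: `NotificationChannels`, `dispatchNotification`, and `logError` are hypothetical names, while the `user:<id>` room and the `notification:new` event follow the diagram.

```typescript
interface NotificationEvent {
  userId: string;
  type: string; // e.g. "Assignment"
  message: string;
}

// Hypothetical ports; real implementations would wrap the Mongo model,
// the Socket.io server, and the email service.
interface NotificationChannels {
  persist(event: NotificationEvent): Promise<string>; // resolves to the record id
  emitSocket(room: string, eventName: string, payload: unknown): void;
  sendEmail(event: NotificationEvent): Promise<void>;
  logError(err: unknown): void;
}

async function dispatchNotification(
  event: NotificationEvent,
  channels: NotificationChannels
): Promise<string> {
  // 1. Persist first so an audit record exists even if delivery fails.
  const id = await channels.persist(event);

  // 2. Real-time push to the user's room.
  channels.emitSocket(`user:${event.userId}`, "notification:new", { id, ...event });

  // 3. Email is dispatched without awaiting, so a slow mail provider
  //    does not delay the caller; failures are logged, not rethrown.
  channels.sendEmail(event).catch((err) => channels.logError(err));

  return id;
}
```

Passing the channels in as an interface lets the ordering be tested with in-memory fakes, and keeps the service layer unaware of which transport delivered the message.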

5. Cloud-Native Storage & Assets (GCS)

The architecture treats Google Cloud Storage (GCS) as a first-class citizen for both operational and deployment data.

A. Deployment Artifact Architecture

  • Static Site Hosting: GCS stores the compiled frontend artifacts.
  • Production Secrets: Google Secret Manager ensures that no production passwords or API keys reside in the codebase.

B. Scalable Document Storage

  • Decoupling: Binaries are never stored in the database. MongoDB only stores the URI.
  • Privacy Mode: Documents are retrieved via Signed URLs with a configurable TTL.
  • Structure: requests/{requestNumber}/documents/
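
The path structure and the "URI only" rule can be sketched as follows. The helper names, the `StoredDocumentRef` shape, the `gs://` URI form, and the file-name sanitising are assumptions for illustration; URL signing is left out because it needs the GCS client and credentials.

```typescript
// Hypothetical shape of what MongoDB would hold: a reference, never the bytes.
interface StoredDocumentRef {
  requestNumber: string;
  fileName: string;
  storageUri: string;
}

// Builds the object path following requests/{requestNumber}/documents/.
function buildDocumentObjectPath(requestNumber: string, fileName: string): string {
  // Keep only the last path segment so a crafted name cannot leave the prefix.
  const safeName = fileName.split(/[\\/]/).pop() || "unnamed";
  return `requests/${requestNumber}/documents/${safeName}`;
}

function toDocumentRef(
  bucket: string,
  requestNumber: string,
  fileName: string
): StoredDocumentRef {
  const objectPath = buildDocumentObjectPath(requestNumber, fileName);
  return { requestNumber, fileName, storageUri: `gs://${bucket}/${objectPath}` };
}
```

Because the database record carries only the URI, a short-lived signed URL can be generated at read time from `storageUri` without ever exposing the bucket publicly.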

6. Real-time Collaboration (Socket.io)

Collaborative features like "Who else is viewing this request?" and "Instant Alerts" are powered by a persistent WebSocket layer.

  • Presence Tracking: A Map<requestId, Set<userId>> tracks online users per workflow request.
  • Room Logic: Users join specific "Rooms" based on their current active request view.
  • Bi-directional Sync: Frontend emits presence:join when entering a request page.
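
A minimal sketch of the presence structure described above, assuming an in-memory tracker per Node.js process. The `PresenceTracker` class and its method names are hypothetical, and the `leave` path is an assumption, since the document only names the presence:join event.

```typescript
class PresenceTracker {
  // requestId -> set of userIds currently viewing that request
  private readonly viewers = new Map<string, Set<string>>();

  // Called when a client emits presence:join; returns the updated viewer list.
  join(requestId: string, userId: string): string[] {
    let set = this.viewers.get(requestId);
    if (!set) {
      set = new Set<string>();
      this.viewers.set(requestId, set);
    }
    set.add(userId); // a Set ignores duplicate joins from the same user
    return Array.from(set);
  }

  // Assumed counterpart for page exit or socket disconnect.
  leave(requestId: string, userId: string): string[] {
    const set = this.viewers.get(requestId);
    if (!set) return [];
    set.delete(userId);
    if (set.size === 0) {
      this.viewers.delete(requestId); // drop empty entries to avoid leaks
      return [];
    }
    return Array.from(set);
  }

  viewersOf(requestId: string): string[] {
    const set = this.viewers.get(requestId);
    return set ? Array.from(set) : [];
  }
}
```

An in-process Map only sees the sockets connected to that process, so a horizontally scaled deployment would need shared state (for example in Redis) to report viewers across instances.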

7. Intelligent Monitoring & Observability

The platform includes a dedicated monitoring stack for "Day 2" operations.

  • Metrics (Prometheus): Scrapes the /metrics endpoint provided by our Prometheus middleware.
  • Log Aggregation (Grafana Loki): promtail ships container logs to Loki for centralized debugging.
  • Alerting: Alertmanager triggers PagerDuty/Email alerts for critical system failures.

graph LR
    App[RE Backend] -->|Prometheus| P[Prometheus DB]
    App -->|Logs| L[Loki]
    P --> G[Grafana Dashboards]
    L --> G

8. Dynamic Workflow Flexibility

The "Custom Workflow" module provides logic for ad-hoc adjustments:

  1. Skip Approver: Bypasses a level while maintaining a forced audit reason.
  2. Ad-hoc Insertion: Inserts an approver level mid-flight, dynamically recalculating the downstream chain.
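
Both adjustments can be sketched as pure functions over an approval chain. The `ApprovalLevel` shape, the status values, and the rule that finalized levels cannot be displaced are assumptions for illustration, not the module's actual schema.

```typescript
type LevelStatus = "PENDING" | "IN_PROGRESS" | "APPROVED" | "SKIPPED";

interface ApprovalLevel {
  levelNumber: number; // 1-based position in the chain
  approverId: string;
  status: LevelStatus;
  skipReason?: string;
}

// Skip Approver: marks a level as skipped and refuses to do so without a reason.
function skipLevel(levels: ApprovalLevel[], levelNumber: number, reason: string): ApprovalLevel[] {
  const trimmed = reason.trim();
  if (trimmed === "") {
    throw new Error("A skip reason is required for the audit trail");
  }
  return levels.map((l): ApprovalLevel =>
    l.levelNumber === levelNumber ? { ...l, status: "SKIPPED", skipReason: trimmed } : l
  );
}

// Ad-hoc Insertion: places a new approver at `atLevel` and renumbers
// every level from that position onward.
function insertLevelAt(levels: ApprovalLevel[], atLevel: number, approverId: string): ApprovalLevel[] {
  if (atLevel < 1 || atLevel > levels.length + 1) {
    throw new Error("Insertion position is outside the chain");
  }
  // Assumption: levels that are already finalized must not be displaced.
  const displacesFinalized = levels.some(
    (l) => l.levelNumber >= atLevel && (l.status === "APPROVED" || l.status === "SKIPPED")
  );
  if (displacesFinalized) {
    throw new Error("Cannot insert before a level that is already finalized");
  }
  const shifted = levels.map((l): ApprovalLevel =>
    l.levelNumber >= atLevel ? { ...l, levelNumber: l.levelNumber + 1 } : l
  );
  const inserted: ApprovalLevel = { levelNumber: atLevel, approverId, status: "PENDING" };
  return shifted.concat(inserted).sort((a, b) => a.levelNumber - b.levelNumber);
}
```

Both functions return a new array instead of mutating the input, so the previous chain remains available for the audit log. In a full implementation, the TAT schedule for each shifted level would presumably also be recalculated.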