Re_Backend/docs/POSTGRES_JUSTIFICATION.md

114 lines
4.7 KiB
Markdown

# Why PostgreSQL Wins for "Royal Enfield Workflow"
## Executive Summary
For "Royal Enfield Workflow", **PostgreSQL is superior to MongoDB**.
The decision rests on **Reporting Speed** and **Deep Filtering capabilities**. Your workflow requires filtering by *Relationships* (Approvers, Departments), not just static data.
---
## 1. Complex Workflow Filters (The "My Tasks" Problem)
Users need specific views like "Requests waiting for me" or "Paused requests".
### A. "Requests Open For Me" (The Join Filter)
*Scenario: Show all requests where **I am the current approver**.*
#### PostgreSQL (Simple SQL `JOIN`)
Index usage is perfect. The DB jumps mainly to the few rows in `approval_levels` assigned to you.
```sql
SELECT r.id, r.status, r.created_at
FROM workflow_requests r
JOIN approval_levels al ON r.id = al.request_id
WHERE al.approver_id = 'USER_UUID_123'
AND al.status = 'PENDING'
ORDER BY r.created_at DESC;
```
#### MongoDB (Array Query + Sort Issue)
You must index inside an array. If you sort by "Date", Mongo often cannot use the index effectively for both the *array match* and the *sort*, leading to slow scans.
```javascript
db.requests.find({
"approvers": {
$elemMatch: {
userId: "USER_UUID_123",
status: "PENDING"
}
}
}).sort({ createdAt: -1 });
// WARNING: Performance degrades heavily if user has many historical requests
```
### B. "Paused & Resumed" History
*Scenario: Show requests that were previously Paused but are now Active (requires checking history).*
#### PostgreSQL (Audit Log Join)
You query the history table directly without loading the main request data until the match is found.
```sql
SELECT DISTINCT r.*
FROM workflow_requests r
JOIN audit_logs log ON r.id = log.request_id
WHERE log.action = 'PAUSED'
AND r.status = 'IN_PROGRESS';
```
#### MongoDB (The "Lookup" or "Bloat" Trade-off)
**Option 1: Lookups (Slow)**
You have to join the separate `audit_logs` collection for every request.
```javascript
db.requests.aggregate([
{ $match: { status: "IN_PROGRESS" } },
{
$lookup: {
from: "audit_logs",
localField: "_id",
foreignField: "requestId",
as: "history"
}
},
{ $match: { "history.action": "PAUSED" } }
]);
```
**Option 2: Embedding (Bloated)**
You store every log inside the Request document.
* *Result*: Your generic `db.requests.find({})` becomes 10x slower because it's dragging megabytes of history logs across the network for every result.
## 2. The Filter Nightmare: "Deep Filtering"
Users expect to slice-and-dice data freely. *Example: "Show requests initiated by users in the 'Sales' Department".*
* **Postgres (Cross-Table Filter)**:
```sql
SELECT * FROM workflow_requests r
JOIN users u ON r.initiator_id = u.id
WHERE u.department = 'Sales'
```
* **Result**: Instant. SQL simply filters the `users` table first (using an index on `department`) and then grabs the matching requests.
* **MongoDB (The "Lookup" Trap)**:
* `Department` is stored on the **User** document, not the Request.
* To filter Requests by "Department", you must `$lookup` (join) the User collection for *every single request* before you can filter them.
* *Alternative*: Copy `department` into every Request document.
* *Maintenance Cost*: If a user transfers from 'Sales' to 'Marketing', you must run a script to update all their historical requests, or your reports will be wrong.
## 3. Dashboard: The "Aggregation" Bottleneck
Your dashboard provides real-time insights (e.g., "Approver Efficiency," "TAT per Region").
* **Window Functions (SQL Superpower)**:
* *Requirement*: Rank dealers by "Average Approval Time" compared to their peers.
* *Postgres*: `RANK() OVER (PARTITION BY region ORDER BY avg_tat)` runs natively and instanly.
* *MongoDB*: Requires complex Aggregation Pipelines (`$setWindowFields`) that are memory-intensive and harder to optimize.
## 4. Audit & Compliance
* **Postgres**: Foreign Key constraints prevent "Orphaned Logs." You cannot delete a User if they are referenced in an Audit Log. This guarantees **legal traceability**.
* **MongoDB**: No constraints. Deleting a user can leave "Ghost Logs" (Referencing a null ID), breaking compliance reports.
## Summary Verdict
| Feature | PostgreSQL | MongoDB |
| :--- | :--- | :--- |
| **"Open For Me"** | **Simple Join** | **Complex Array Indexing** |
| **Dept/Region Filters** | **Simple Join** | **Slow Lookup** or **Duplicated Data** |
| **Ad-Hoc Reports** | **Flexible** | **Rigid** (Needs Indexes) |
| **Audit Compliance** | **Guaranteed** | **Risk of Orphaned Data** |
**Recommendation**: Stick with PostgreSQL.
The "Relational" nature of your reporting (Connecting Requests -> Users -> Departments -> Regions) is exactly what SQL was built to solve efficiently.