# RE Workflow Monitoring Stack

Complete monitoring solution with **Grafana**, **Prometheus**, **Loki**, and **Promtail** for the RE Workflow Management System.

## 🏗️ Architecture

```
┌────────────────────────────────────────────────────────────────────────┐
│                           RE Workflow System                           │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐     │
│  │   Node.js API   │────│   PostgreSQL    │────│      Redis      │     │
│  │   (Port 5000)   │    │   (Port 5432)   │    │   (Port 6379)   │     │
│  └────────┬────────┘    └─────────────────┘    └─────────────────┘     │
│           │                                                            │
│           │ /metrics endpoint                                          │
│           │ Log files (./logs/)                                        │
│           ▼                                                            │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                        Monitoring Stack                         │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │   │
│  │  │ Prometheus  │──│    Loki     │──│        Promtail         │  │   │
│  │  │ (Port 9090) │  │ (Port 3100) │  │  (Collects log files)   │  │   │
│  │  └──────┬──────┘  └──────┬──────┘  └─────────────────────────┘  │   │
│  │         │                │                                      │   │
│  │         └────────┬───────┘                                      │   │
│  │                  ▼                                              │   │
│  │         ┌─────────────────┐                                     │   │
│  │         │     Grafana     │                                     │   │
│  │         │   (Port 3001)   │◄── Pre-configured Dashboards        │   │
│  │         └─────────────────┘                                     │   │
│  └─────────────────────────────────────────────────────────────────┘   │
└────────────────────────────────────────────────────────────────────────┘
```

## 📦 What's Included

The monitoring stack includes:

- **Redis** - In-memory data store for BullMQ job queues
- **Prometheus** - Metrics collection and storage
- **Grafana** - Visualization and dashboards
- **Loki** - Log aggregation
- **Promtail** - Log shipping agent
- **Node Exporter** - Host system metrics
- **Redis Exporter** - Redis server metrics
- **Alertmanager** - Alert routing and notifications

## 🚀 Quick Start

### Prerequisites

- **Docker Desktop** installed and running
- **WSL2** enabled (recommended for Windows)
- Backend API running on port 5000

### Step 1: Start Monitoring Stack

```powershell
# Navigate to monitoring folder
cd C:\Laxman\Royal_Enfield\Re_Backend\monitoring

# Start all monitoring services
docker-compose -f docker-compose.monitoring.yml up -d

# Check status
docker ps
```

### Step 2: Configure Backend Environment

Add these to your backend `.env` file:

```env
# Loki configuration (for direct log shipping from Winston)
LOKI_HOST=http://localhost:3100

# Optional: Basic auth if enabled
# LOKI_USER=your_username
# LOKI_PASSWORD=your_password
```

### Step 3: Access Dashboards

| Service | URL | Credentials |
|---------|-----|-------------|
| **Grafana** | http://localhost:3001 | admin / REWorkflow@2024 |
| **Prometheus** | http://localhost:9090 | - |
| **Loki** | http://localhost:3100 | - |
| **Alertmanager** | http://localhost:9093 | - |

## 📊 Available Dashboards

### **RE Workflow Overview** (Enhanced!)

**URL**: http://localhost:3001/d/re-workflow-overview

**Sections:**

1. **📊 API Overview**
   - Request rate, error rate, response times
   - HTTP status codes distribution

2. **🔴 Redis & Queue Status** (NEW!)
   - Redis connection status (Up/Down)
   - Redis active connections
   - Redis memory usage
   - TAT Queue waiting/failed jobs
   - Pause/Resume Queue waiting/failed jobs
   - All queues job status timeline
   - Redis commands rate

3. **💻 System Resources** (NEW!)
   - System CPU Usage (gauge)
   - System Memory Usage (gauge)
   - System Disk Usage (gauge)
   - Disk Space Left (GB available)

4. **🔄 Business Metrics**
   - Workflow operations
   - TAT breaches
   - Node.js process metrics

**Refresh Rate**: Auto-refresh every 30 seconds

### 1. RE Workflow Overview

Pre-configured dashboard with:

- **API Metrics**: Request rate, error rate, latency percentiles
- **Logs Overview**: Error count, warnings, TAT breaches
- **Node.js Runtime**: Memory usage, event loop lag, CPU

### 2. Custom LogQL Queries

| Purpose | Query |
|---------|-------|
| All errors | `{app="re-workflow"} \| json \| level="error"` |
| TAT breaches | `{app="re-workflow"} \| json \| tatEvent="breached"` |
| Auth failures | `{app="re-workflow"} \| json \| authEvent="auth_failure"` |
| Slow requests (>3s) | `{app="re-workflow"} \| json \| duration>3000` |
| By user | `{app="re-workflow"} \| json \| userId="USER-ID"` |
| By request | `{app="re-workflow"} \| json \| requestId="REQ-XXX"` |

### 3. PromQL Queries (Prometheus)

| Purpose | Query |
|---------|-------|
| Request rate | `rate(http_requests_total{job="re-workflow-backend"}[5m])` |
| Error rate | `rate(http_request_errors_total[5m]) / rate(http_requests_total[5m])` |
| P95 latency | `histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))` |
| Memory usage | `process_resident_memory_bytes{job="re-workflow-backend"}` |
| Event loop lag | `nodejs_eventloop_lag_seconds{job="re-workflow-backend"}` |

## 📁 File Structure

```
monitoring/
├── docker-compose.monitoring.yml     # Main compose file
├── prometheus/
│   ├── prometheus.yml                # Prometheus configuration
│   └── alert.rules.yml               # Alert rules
├── loki/
│   └── loki-config.yml               # Loki configuration
├── promtail/
│   └── promtail-config.yml           # Promtail log shipper config
├── alertmanager/
│   └── alertmanager.yml              # Alert notification config
└── grafana/
    ├── provisioning/
    │   ├── datasources/
    │   │   └── datasources.yml       # Auto-configure data sources
    │   └── dashboards/
    │       └── dashboards.yml        # Dashboard provisioning
    └── dashboards/
        └── re-workflow-overview.json # Pre-built dashboard
```

## 🔧 Configuration

### Prometheus Scrape Targets

Edit `prometheus/prometheus.yml` to add/modify scrape targets:

```yaml
scrape_configs:
  - job_name: 're-workflow-backend'
    static_configs:
      # For local development (backend outside Docker)
      - targets: ['host.docker.internal:5000']
      # For Docker deployment (backend in Docker)
      # - targets: ['re_workflow_backend:5000']
```

### Log Retention

Edit `loki/loki-config.yml`:

```yaml
limits_config:
  retention_period: 15d  # Adjust retention period
```

### Alert Notifications

Edit `alertmanager/alertmanager.yml` to configure:

- **Email** notifications
- **Slack** webhooks
- **Custom** webhook endpoints

## 🛠️ Common Commands

```powershell
# Start services
docker-compose -f docker-compose.monitoring.yml up -d

# Stop services
docker-compose -f docker-compose.monitoring.yml down

# View logs
docker-compose -f docker-compose.monitoring.yml logs -f

# View specific service logs
docker-compose -f docker-compose.monitoring.yml logs -f grafana

# Restart a service
docker-compose -f docker-compose.monitoring.yml restart prometheus

# Check service health
docker ps

# Remove all data (fresh start)
docker-compose -f docker-compose.monitoring.yml down -v
```

## ⚡ Metrics Exposed by Backend

The backend exposes these metrics at `/metrics`:

### HTTP Metrics

- `http_requests_total` - Total HTTP requests (by method, route, status)
- `http_request_duration_seconds` - Request latency histogram
- `http_request_errors_total` - Error count (4xx, 5xx)
- `http_active_connections` - Current active connections

### Business Metrics

- `tat_breaches_total` - TAT breach events
- `pending_workflows_count` - Pending workflow gauge
- `workflow_operations_total` - Workflow operation count
- `auth_events_total` - Authentication events

### Node.js Runtime

- `nodejs_heap_size_*` - Heap memory metrics
- `nodejs_eventloop_lag_*` - Event loop lag
- `process_cpu_*` - CPU usage
- `process_resident_memory_bytes` - RSS memory

## 🔒 Security Notes

1. **Change default passwords** in production
2. **Enable TLS** for external access
3. **Configure firewall** to restrict access to monitoring ports
4. **Use reverse proxy** (nginx) for HTTPS

## 🐛 Troubleshooting

### Prometheus can't scrape backend

1. Ensure backend is running on port 5000
2. Check `/metrics` endpoint: `curl http://localhost:5000/metrics`
3. For Docker: use `host.docker.internal:5000`

### Logs not appearing in Loki

1. Check Promtail logs: `docker logs re_promtail`
2. Verify log file path in `promtail-config.yml`
3. Ensure backend has `LOKI_HOST` configured

### Grafana dashboards empty

1. Wait 30-60 seconds for data collection
2. Check data source configuration in Grafana
3. Verify time range selection

### Docker memory issues

```powershell
# Increase Docker Desktop memory allocation
# Settings → Resources → Memory → 4GB+
```

## 📞 Support

For issues with the monitoring stack:

1. Check container logs: `docker logs <container_name>`
2. Verify the syntax of the configuration files
3. Ensure Docker Desktop is running with sufficient resources
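
When the manual checks above do not make it obvious which service is down, probing each service's health endpoint can narrow it down. The following is a minimal Python sketch (standard library only) and is not part of this repository. The ports come from the access table in this README; the health paths (`/-/healthy` for Prometheus and Alertmanager, `/ready` for Loki, `/api/health` for Grafana) are the stock endpoints of those tools and are assumed to be unchanged in this setup. `ENDPOINTS`, `http_status`, and `check_all` are hypothetical names chosen for illustration.

```python
"""Probe each monitoring service's health endpoint and print a summary."""
import urllib.error
import urllib.request

# Ports are taken from this README; the paths are the tools' default health
# endpoints (an assumption: a customised deployment may expose different ones).
ENDPOINTS = {
    "Prometheus": "http://localhost:9090/-/healthy",
    "Loki": "http://localhost:3100/ready",
    "Grafana": "http://localhost:3001/api/health",
    "Alertmanager": "http://localhost:9093/-/healthy",
    "Backend metrics": "http://localhost:5000/metrics",
}


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def check_all(endpoints=ENDPOINTS, fetch=http_status):
    """Map each service name to True when its endpoint answers with HTTP 200."""
    return {name: fetch(url) == 200 for name, url in endpoints.items()}


if __name__ == "__main__":
    for name, healthy in check_all().items():
        print(f"{name:<16} {'OK' if healthy else 'unreachable or unhealthy'}")
```

Run it on the host while the stack is up. A service reported as unreachable or unhealthy is the one whose container logs are worth reading first. The `fetch` parameter exists only so the check can be exercised without a running stack.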