RE Workflow Monitoring Stack
Complete monitoring solution with Grafana, Prometheus, Loki, and Promtail for the RE Workflow Management System.
🏗️ Architecture
┌────────────────────────────────────────────────────────────────────────┐
│                           RE Workflow System                           │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐     │
│  │   Node.js API   │────│   PostgreSQL    │────│      Redis      │     │
│  │   (Port 5000)   │    │   (Port 5432)   │    │   (Port 6379)   │     │
│  └────────┬────────┘    └─────────────────┘    └─────────────────┘     │
│           │                                                            │
│           │  /metrics endpoint                                         │
│           │  Log files (./logs/)                                       │
│           ▼                                                            │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │                         Monitoring Stack                         │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐   │  │
│  │  │ Prometheus  │──│    Loki     │──│        Promtail         │   │  │
│  │  │ (Port 9090) │  │ (Port 3100) │  │  (Collects log files)   │   │  │
│  │  └──────┬──────┘  └──────┬──────┘  └─────────────────────────┘   │  │
│  │         │                │                                       │  │
│  │         └────────┬───────┘                                       │  │
│  │                  ▼                                               │  │
│  │         ┌─────────────────┐                                      │  │
│  │         │     Grafana     │                                      │  │
│  │         │   (Port 3001)   │◄── Pre-configured Dashboards         │  │
│  │         └─────────────────┘                                      │  │
│  └──────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────┘
📦 What's Included
The monitoring stack includes:
- Redis - In-memory data store for BullMQ job queues
- Prometheus - Metrics collection and storage
- Grafana - Visualization and dashboards
- Loki - Log aggregation
- Promtail - Log shipping agent
- Node Exporter - Host system metrics
- Redis Exporter - Redis server metrics
- Alertmanager - Alert routing and notifications
🚀 Quick Start
Prerequisites
- Docker Desktop installed and running
- WSL2 enabled (recommended for Windows)
- Backend API running on port 5000
Step 1: Start Monitoring Stack
# Navigate to monitoring folder
cd C:\Laxman\Royal_Enfield\Re_Backend\monitoring
# Start all monitoring services
docker-compose -f docker-compose.monitoring.yml up -d
# Check status
docker ps
Step 2: Configure Backend Environment
Add these to your backend .env file:
# Loki configuration (for direct log shipping from Winston)
LOKI_HOST=http://localhost:3100
# Optional: Basic auth if enabled
# LOKI_USER=your_username
# LOKI_PASSWORD=your_password
Step 3: Access Dashboards
| Service | URL | Credentials |
|---|---|---|
| Grafana | http://localhost:3001 | admin / REWorkflow@2024 |
| Prometheus | http://localhost:9090 | - |
| Loki | http://localhost:3100 | - |
| Alertmanager | http://localhost:9093 | - |
📊 Available Dashboards
RE Workflow Overview (Enhanced!)
URL: http://localhost:3001/d/re-workflow-overview
Sections:
- 📊 API Overview
  - Request rate, error rate, response times
  - HTTP status codes distribution
- 🔴 Redis & Queue Status (NEW!)
  - Redis connection status (Up/Down)
  - Redis active connections
  - Redis memory usage
  - TAT Queue waiting/failed jobs
  - Pause/Resume Queue waiting/failed jobs
  - All queues job status timeline
  - Redis commands rate
- 💻 System Resources (NEW!)
  - System CPU Usage (gauge)
  - System Memory Usage (gauge)
  - System Disk Usage (gauge)
  - Disk Space Left (GB available)
- 🔄 Business Metrics
  - Workflow operations
  - TAT breaches
  - Node.js process metrics
Refresh Rate: Auto-refresh every 30 seconds
1. RE Workflow Overview
Pre-configured dashboard with:
- API Metrics: Request rate, error rate, latency percentiles
- Logs Overview: Error count, warnings, TAT breaches
- Node.js Runtime: Memory usage, event loop lag, CPU
2. Custom LogQL Queries
| Purpose | Query |
|---|---|
| All errors | `{app="re-workflow"} \| json \| level="error"` |
| TAT breaches | `{app="re-workflow"} \| json \| tatEvent="breached"` |
| Auth failures | `{app="re-workflow"} \| json \| authEvent="auth_failure"` |
| Slow requests (>3s) | `{app="re-workflow"} \| json \| duration > 3000` |
| By user | `{app="re-workflow"} \| json \| userId="USER-ID"` |
| By request | `{app="re-workflow"} \| json \| requestId="REQ-XXX"` |
3. PromQL Queries (Prometheus)
| Purpose | Query |
|---|---|
| Request rate | rate(http_requests_total{job="re-workflow-backend"}[5m]) |
| Error rate | rate(http_request_errors_total[5m]) / rate(http_requests_total[5m]) |
| P95 latency | histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) |
| Memory usage | process_resident_memory_bytes{job="re-workflow-backend"} |
| Event loop lag | nodejs_eventloop_lag_seconds{job="re-workflow-backend"} |
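The same expressions can drive alerts in prometheus/alert.rules.yml. The following is a minimal sketch, not the rules shipped with this stack: the alert names, thresholds (5% errors, 3 s P95), and `for` durations are illustrative assumptions. The `sum` aggregations are added so that the error ratio and the quantile are computed across all routes rather than per label combination.

```yaml
groups:
  - name: re-workflow-backend
    rules:
      # Hypothetical alert: fires when more than 5% of requests fail for 5 minutes
      - alert: HighErrorRate
        expr: |
          sum(rate(http_request_errors_total[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "More than 5% of requests are failing"

      # Hypothetical alert: fires when P95 latency stays above 3 seconds
      - alert: HighP95Latency
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 3
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "P95 latency is above 3 seconds"
```

After editing the rules file, restarting the Prometheus container reloads it.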
📁 File Structure
monitoring/
├── docker-compose.monitoring.yml      # Main compose file
├── prometheus/
│   ├── prometheus.yml                 # Prometheus configuration
│   └── alert.rules.yml                # Alert rules
├── loki/
│   └── loki-config.yml                # Loki configuration
├── promtail/
│   └── promtail-config.yml            # Promtail log shipper config
├── alertmanager/
│   └── alertmanager.yml               # Alert notification config
└── grafana/
    ├── provisioning/
    │   ├── datasources/
    │   │   └── datasources.yml        # Auto-configure data sources
    │   └── dashboards/
    │       └── dashboards.yml         # Dashboard provisioning
    └── dashboards/
        └── re-workflow-overview.json  # Pre-built dashboard
🔧 Configuration
Prometheus Scrape Targets
Edit prometheus/prometheus.yml to add/modify scrape targets:
scrape_configs:
  - job_name: 're-workflow-backend'
    static_configs:
      # For local development (backend outside Docker)
      - targets: ['host.docker.internal:5000']
      # For Docker deployment (backend in Docker)
      # - targets: ['re_workflow_backend:5000']
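Node Exporter and Redis Exporter are part of the stack and need scrape jobs of their own. A minimal sketch, assuming the exporters are reachable under the hypothetical Compose service names `node-exporter` and `redis-exporter` on their conventional default ports (9100 and 9121); check docker-compose.monitoring.yml for the names and ports actually used:

```yaml
scrape_configs:
  # Host system metrics (service name is an assumption)
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Redis server metrics (service name is an assumption)
  - job_name: 'redis-exporter'
    static_configs:
      - targets: ['redis-exporter:9121']
```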
Log Retention
Edit loki/loki-config.yml:
limits_config:
  retention_period: 15d  # Adjust retention period
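Loki applies `retention_period` through its compactor, so the setting generally has no effect unless retention is also enabled there. A sketch of the related block, assuming filesystem storage; the working directory path is an assumption, and the commented-out key applies only to Loki 3.x:

```yaml
compactor:
  working_directory: /loki/compactor     # assumed path inside the container
  retention_enabled: true                # lets the compactor delete expired chunks
  # delete_request_store: filesystem     # Loki 3.x only: needed when retention is enabled

limits_config:
  retention_period: 15d
```

If old logs are not being removed, checking whether this block exists in loki/loki-config.yml is a reasonable first step.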
Alert Notifications
Edit alertmanager/alertmanager.yml to configure:
- Email notifications
- Slack webhooks
- Custom webhook endpoints
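A minimal alertmanager.yml sketch covering the three channel types above. Every address, credential, channel, and URL here is a placeholder, and the receiver names and the `/alerts` webhook path are hypothetical:

```yaml
route:
  receiver: team-email            # default receiver
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Send critical alerts to Slack instead of email
    - matchers:
        - severity="critical"
      receiver: team-slack

receivers:
  - name: team-email
    email_configs:
      - to: 'ops@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager@example.com'
        auth_password: 'CHANGE_ME'

  - name: team-slack
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
        channel: '#alerts'
        send_resolved: true

  # Defined for illustration; not referenced by any route in this sketch
  - name: custom-webhook
    webhook_configs:
      - url: 'http://host.docker.internal:5000/alerts'
```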
🛠️ Common Commands
# Start services
docker-compose -f docker-compose.monitoring.yml up -d
# Stop services
docker-compose -f docker-compose.monitoring.yml down
# View logs
docker-compose -f docker-compose.monitoring.yml logs -f
# View specific service logs
docker-compose -f docker-compose.monitoring.yml logs -f grafana
# Restart a service
docker-compose -f docker-compose.monitoring.yml restart prometheus
# Check service health
docker ps
# Remove all data (fresh start)
docker-compose -f docker-compose.monitoring.yml down -v
⚡ Metrics Exposed by Backend
The backend exposes these metrics at /metrics:
HTTP Metrics
- `http_requests_total` - Total HTTP requests (by method, route, status)
- `http_request_duration_seconds` - Request latency histogram
- `http_request_errors_total` - Error count (4xx, 5xx)
- `http_active_connections` - Current active connections
Business Metrics
- `tat_breaches_total` - TAT breach events
- `pending_workflows_count` - Pending workflow gauge
- `workflow_operations_total` - Workflow operation count
- `auth_events_total` - Authentication events
Node.js Runtime
- `nodejs_heap_size_*` - Heap memory metrics
- `nodejs_eventloop_lag_*` - Event loop lag
- `process_cpu_*` - CPU usage
- `process_resident_memory_bytes` - RSS memory
🔒 Security Notes
- Change default passwords in production
- Enable TLS for external access
- Configure firewall to restrict access to monitoring ports
- Use reverse proxy (nginx) for HTTPS
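Two of these points can be handled in the Compose file itself. A sketch, assuming services named `prometheus` and `grafana` and a hypothetical `GRAFANA_ADMIN_PASSWORD` variable defined in `.env`: binding ports to 127.0.0.1 keeps them unreachable from other machines, and `GF_SECURITY_ADMIN_PASSWORD` overrides the default Grafana admin password.

```yaml
services:
  prometheus:
    ports:
      - "127.0.0.1:9090:9090"     # reachable from this host only

  grafana:
    environment:
      # GRAFANA_ADMIN_PASSWORD is a hypothetical variable read from .env
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}
    ports:
      - "127.0.0.1:3001:3000"     # host port 3001, Grafana's default container port 3000
```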
🐛 Troubleshooting
Prometheus can't scrape backend
- Ensure backend is running on port 5000
- Check the `/metrics` endpoint: `curl http://localhost:5000/metrics`
- For Docker: use `host.docker.internal:5000`
Logs not appearing in Loki
- Check Promtail logs: `docker logs re_promtail`
- Verify log file path in `promtail-config.yml`
- Ensure backend has `LOKI_HOST` configured
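When verifying the Promtail configuration, these are the parts that usually matter: the push URL, the `__path__` glob, and the `app` label that the LogQL queries filter on. A sketch for comparison, assuming the Loki service is named `loki` and the backend's ./logs/ folder is mounted at the hypothetical path /var/log/app inside the Promtail container:

```yaml
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml            # where Promtail records read offsets

clients:
  - url: http://loki:3100/loki/api/v1/push # assumes the service name "loki"

scrape_configs:
  - job_name: re-workflow
    static_configs:
      - targets: [localhost]
        labels:
          app: re-workflow                 # must match {app="re-workflow"} in queries
          __path__: /var/log/app/*.log     # assumed mount point of ./logs/
    pipeline_stages:
      # Parse JSON log lines and promote "level" to a label
      - json:
          expressions:
            level: level
      - labels:
          level:
```

If `__path__` does not match any file inside the container, Promtail starts cleanly but ships nothing.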
Grafana dashboards empty
- Wait 30-60 seconds for data collection
- Check data source configuration in Grafana
- Verify time range selection
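For the data source check, the provisioning file grafana/provisioning/datasources/datasources.yml should point at addresses that resolve from inside the Grafana container. A sketch of what it typically looks like, assuming the Compose service names `prometheus` and `loki`:

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090   # service name is an assumption
    isDefault: true

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100         # service name is an assumption
```

A URL using `localhost` here would refer to the Grafana container itself, which is a common cause of empty panels.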
Docker memory issues
# Increase Docker Desktop memory allocation
# Settings → Resources → Memory → 4GB+
📞 Support
For issues with the monitoring stack:
- Check container logs: `docker logs <container_name>`
- Verify configuration files syntax
- Ensure Docker Desktop is running with sufficient resources