RE Workflow Monitoring Stack

A complete monitoring solution with Grafana, Prometheus, Loki, Promtail, and Alertmanager for the RE Workflow Management System.

🏗️ Architecture

┌────────────────────────────────────────────────────────────────────────┐
│                         RE Workflow System                              │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐    │
│  │  Node.js API    │────│   PostgreSQL    │────│     Redis       │    │
│  │  (Port 5000)    │    │   (Port 5432)   │    │  (Port 6379)    │    │
│  └────────┬────────┘    └─────────────────┘    └─────────────────┘    │
│           │                                                             │
│           │ /metrics endpoint                                          │
│           │ Log files (./logs/)                                        │
│           ▼                                                             │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │                    Monitoring Stack                               │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │  │
│  │  │  Prometheus │──│    Loki     │──│       Promtail          │  │  │
│  │  │  (Port 9090)│  │ (Port 3100) │  │ (Collects log files)    │  │  │
│  │  └──────┬──────┘  └──────┬──────┘  └─────────────────────────┘  │  │
│  │         │                │                                        │  │
│  │         └────────┬───────┘                                        │  │
│  │                  ▼                                                 │  │
│  │         ┌─────────────────┐                                       │  │
│  │         │    Grafana      │                                       │  │
│  │         │  (Port 3001)    │◄── Pre-configured Dashboards          │  │
│  │         └─────────────────┘                                       │  │
│  └─────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────┘

🚀 Quick Start

Prerequisites

  • Docker Desktop installed and running
  • WSL2 enabled (recommended for Windows)
  • Backend API running on port 5000

Step 1: Start Monitoring Stack

# Navigate to monitoring folder
cd C:\Laxman\Royal_Enfield\Re_Backend\monitoring

# Start all monitoring services
docker-compose -f docker-compose.monitoring.yml up -d

# Check status
docker ps

Step 2: Configure Backend Environment

Add these to your backend .env file:

# Loki configuration (for direct log shipping from Winston)
LOKI_HOST=http://localhost:3100

# Optional: Basic auth if enabled
# LOKI_USER=your_username
# LOKI_PASSWORD=your_password
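
As a rough illustration of how the backend could consume these variables, the sketch below turns them into a Loki push URL and an optional Authorization header. The helper name lokiSettingsFromEnv, the returned shape, and the localhost fallback are assumptions for illustration and are not taken from the backend code. Only the variable names above and Loki's /loki/api/v1/push endpoint are given.

```typescript
// Hypothetical helper: not part of the backend codebase.
interface LokiSettings {
  pushUrl: string;                 // Loki HTTP push endpoint
  headers: Record<string, string>; // extra request headers, e.g. basic auth
}

function lokiSettingsFromEnv(env: Record<string, string | undefined>): LokiSettings {
  // Assumed fallback to the local stack; trailing slashes are trimmed.
  const host = (env.LOKI_HOST ?? "http://localhost:3100").replace(/\/+$/, "");
  const headers: Record<string, string> = {};
  // Basic auth is added only when both optional variables are set.
  // Note: btoa handles Latin-1 credentials only.
  if (env.LOKI_USER && env.LOKI_PASSWORD) {
    headers["Authorization"] = "Basic " + btoa(`${env.LOKI_USER}:${env.LOKI_PASSWORD}`);
  }
  return { pushUrl: `${host}/loki/api/v1/push`, headers };
}
```

In a Node.js process this would be called as lokiSettingsFromEnv(process.env) after the .env file has been loaded.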

Step 3: Access Dashboards

Service       URL                     Credentials
Grafana       http://localhost:3001   admin / REWorkflow@2024
Prometheus    http://localhost:9090   -
Loki          http://localhost:3100   -
Alertmanager  http://localhost:9093   -

📊 Available Dashboards

1. RE Workflow Overview

Pre-configured dashboard with:

  • API Metrics: Request rate, error rate, latency percentiles
  • Logs Overview: Error count, warnings, TAT breaches
  • Node.js Runtime: Memory usage, event loop lag, CPU

2. Custom LogQL Queries

Purpose              Query
All errors           {app="re-workflow"} | json | level="error"
TAT breaches         {app="re-workflow"} | json | tatEvent="breached"
Auth failures        {app="re-workflow"} | json | authEvent="auth_failure"
Slow requests (>3s)  {app="re-workflow"} | json | duration>3000
By user              {app="re-workflow"} | json | userId="USER-ID"
By request           {app="re-workflow"} | json | requestId="REQ-XXX"
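
These queries depend on the | json stage, which can only extract fields when every log line is a single JSON object. The sketch below shows the log line shape the queries imply, plus a local check that approximates the slow-request filter. The field names come from the queries above. The interface, the function names, and the sample message are illustrative assumptions.

```typescript
// Field names mirror the LogQL queries; the interface itself is illustrative.
interface WorkflowLogEntry {
  level: "error" | "warn" | "info" | "debug";
  message: string;
  requestId?: string;
  userId?: string;
  duration?: number;  // milliseconds, inferred from "duration>3000" meaning >3s
  tatEvent?: string;  // e.g. "breached"
  authEvent?: string; // e.g. "auth_failure"
}

// Serialize one entry per line so Loki's json stage can extract the fields.
function toLogLine(entry: WorkflowLogEntry): string {
  return JSON.stringify(entry);
}

// Local approximation of `| json | duration>3000`, handy when testing log output.
function isSlowRequest(line: string, thresholdMs: number = 3000): boolean {
  const parsed = JSON.parse(line) as Partial<WorkflowLogEntry>;
  return typeof parsed.duration === "number" && parsed.duration > thresholdMs;
}

const sampleLine = toLogLine({
  level: "info",
  message: "GET /api/requests", // hypothetical route
  requestId: "REQ-XXX",
  duration: 3500,
});
console.log(sampleLine, isSlowRequest(sampleLine));
```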

3. PromQL Queries (Prometheus)

Purpose         Query
Request rate    rate(http_requests_total{job="re-workflow-backend"}[5m])
Error rate      sum(rate(http_request_errors_total[5m])) / sum(rate(http_requests_total[5m]))
P95 latency     histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
Memory usage    process_resident_memory_bytes{job="re-workflow-backend"}
Event loop lag  nodejs_eventloop_lag_seconds{job="re-workflow-backend"}
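
The P95 latency query estimates a percentile from cumulative histogram buckets instead of reading it from raw samples. The sketch below approximates that estimate with linear interpolation inside the bucket containing the target rank. It is a simplified model assuming positive latency buckets and 0 < q <= 1, not Prometheus' actual implementation, and the bucket values are made up.

```typescript
// One cumulative bucket: `count` observations were <= `le` seconds.
interface Bucket {
  le: number;
  count: number;
}

// Simplified model of histogram_quantile for classic histograms.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  if (!(q > 0 && q <= 1)) return NaN; // this sketch only covers 0 < q <= 1
  const sorted = buckets.slice().sort((a, b) => a.le - b.le);
  const last = sorted.length - 1;
  // A usable histogram needs a +Inf bucket and at least one observation.
  if (last < 1 || sorted[last].le !== Infinity || sorted[last].count === 0) return NaN;
  const rank = q * sorted[last].count;
  let idx = 0;
  while (sorted[idx].count < rank) idx++;
  // Rank falls in the +Inf bucket: report the highest finite upper bound.
  if (idx === last) return sorted[last - 1].le;
  const upper = sorted[idx].le;
  const lower = idx === 0 ? 0 : sorted[idx - 1].le;
  const below = idx === 0 ? 0 : sorted[idx - 1].count;
  const inBucket = sorted[idx].count - below;
  return lower + (upper - lower) * ((rank - below) / inBucket);
}

// Made-up data: 100 requests, 98 of them finished within 1 second.
const exampleBuckets: Bucket[] = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1, count: 98 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.95, exampleBuckets)); // about 0.8125 seconds
```

Because the result is interpolated, its accuracy depends on how finely the bucket boundaries are spaced around the latencies of interest.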

📁 File Structure

monitoring/
├── docker-compose.monitoring.yml    # Main compose file
├── prometheus/
│   ├── prometheus.yml               # Prometheus configuration
│   └── alert.rules.yml              # Alert rules
├── loki/
│   └── loki-config.yml              # Loki configuration
├── promtail/
│   └── promtail-config.yml          # Promtail log shipper config
├── alertmanager/
│   └── alertmanager.yml             # Alert notification config
└── grafana/
    ├── provisioning/
    │   ├── datasources/
    │   │   └── datasources.yml      # Auto-configure data sources
    │   └── dashboards/
    │       └── dashboards.yml       # Dashboard provisioning
    └── dashboards/
        └── re-workflow-overview.json # Pre-built dashboard

🔧 Configuration

Prometheus Scrape Targets

Edit prometheus/prometheus.yml to add/modify scrape targets:

scrape_configs:
  - job_name: 're-workflow-backend'
    static_configs:
      # For local development (backend outside Docker)
      - targets: ['host.docker.internal:5000']
      # For Docker deployment (backend in Docker)
      # - targets: ['re_workflow_backend:5000']

Log Retention

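Edit loki/loki-config.yml. Loki's compactor enforces retention, so this setting takes effect only when compactor retention is enabled (retention_enabled: true):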

limits_config:
  retention_period: 15d  # Adjust retention period

Alert Notifications

Edit alertmanager/alertmanager.yml to configure:

  • Email notifications
  • Slack webhooks
  • Custom webhook endpoints
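
For custom webhook endpoints, Alertmanager sends an HTTP POST with a JSON body that groups the affected alerts. The sketch below types a subset of that body and formats one line per alert. The summarizeAlerts function is hypothetical. The severity label and summary annotation are common conventions and assume that alert.rules.yml sets them.

```typescript
// Subset of the JSON body Alertmanager sends to webhook receivers.
interface WebhookAlert {
  status: "firing" | "resolved";
  labels: Record<string, string>;
  annotations: Record<string, string>;
  startsAt: string; // RFC 3339 timestamp
}

interface WebhookPayload {
  status: "firing" | "resolved";
  receiver: string;
  alerts: WebhookAlert[];
}

// Hypothetical formatter: one line per alert, e.g. for a chat message or log entry.
function summarizeAlerts(payload: WebhookPayload): string[] {
  return payload.alerts.map((a) => {
    const name = a.labels["alertname"] ?? "unknown";
    const severity = a.labels["severity"] ?? "none"; // assumes rules set a severity label
    const summary = a.annotations["summary"] ?? "";  // assumes rules set a summary annotation
    return `[${a.status.toUpperCase()}] ${name} (severity=${severity}) ${summary}`.trim();
  });
}
```

An Express-style handler would parse the request body as JSON and pass it to summarizeAlerts before forwarding the lines to the chosen channel.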

🛠️ Common Commands

# Start services
docker-compose -f docker-compose.monitoring.yml up -d

# Stop services
docker-compose -f docker-compose.monitoring.yml down

# View logs
docker-compose -f docker-compose.monitoring.yml logs -f

# View specific service logs
docker-compose -f docker-compose.monitoring.yml logs -f grafana

# Restart a service
docker-compose -f docker-compose.monitoring.yml restart prometheus

# Check service health
docker ps

# Remove all data (fresh start)
docker-compose -f docker-compose.monitoring.yml down -v

Metrics Exposed by Backend

The backend exposes these metrics at /metrics:

HTTP Metrics

  • http_requests_total - Total HTTP requests (by method, route, status)
  • http_request_duration_seconds - Request latency histogram
  • http_request_errors_total - Error count (4xx, 5xx)
  • http_active_connections - Current active connections

Business Metrics

  • tat_breaches_total - TAT breach events
  • pending_workflows_count - Pending workflow gauge
  • workflow_operations_total - Workflow operation count
  • auth_events_total - Authentication events

Node.js Runtime

  • nodejs_heap_size_* - Heap memory metrics
  • nodejs_eventloop_lag_* - Event loop lag
  • process_cpu_* - CPU usage
  • process_resident_memory_bytes - RSS memory
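
All of the metrics above reach Prometheus in its text exposition format, which is what a request to /metrics returns. The sketch below renders one counter in that format to show what a scrape contains. A metrics client library normally generates this output, so renderCounter is a hypothetical helper, and the route and value are made up.

```typescript
type Labels = Record<string, string>;

// Escape backslashes, double quotes, and newlines in label values.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render a counter: HELP and TYPE header lines, then one sample line per label set.
function renderCounter(
  name: string,
  help: string,
  samples: Array<{ labels: Labels; value: number }>
): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const sample of samples) {
    const pairs = Object.keys(sample.labels)
      .map((k) => `${k}="${escapeLabelValue(sample.labels[k])}"`)
      .join(",");
    lines.push(pairs ? `${name}{${pairs}} ${sample.value}` : `${name} ${sample.value}`);
  }
  return lines.join("\n") + "\n";
}

// Labels follow the "by method, route, status" description; the route is made up.
console.log(
  renderCounter("http_requests_total", "Total HTTP requests", [
    { labels: { method: "GET", route: "/api/health", status: "200" }, value: 42 },
  ])
);
```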

🔒 Security Notes

  1. Change the default Grafana password (listed under Step 3) before deploying to production
  2. Enable TLS for external access
  3. Configure a firewall to restrict access to the monitoring ports
  4. Use a reverse proxy (nginx) for HTTPS

🐛 Troubleshooting

Prometheus can't scrape backend

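  1. Ensure the backend is running on port 5000
  2. Check the /metrics endpoint: curl http://localhost:5000/metrics
  3. If Prometheus runs in Docker and the backend runs on the host, set the scrape target to host.docker.internal:5000 (inside the container, localhost refers to the container itself)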
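
When the endpoint responds but panels stay empty, it can help to confirm that the response actually declares the expected metrics. The sketch below scans a /metrics response body for # TYPE lines and reports which names are missing. The helper names are hypothetical, and the expected list is a sample drawn from the metrics documented in this README.

```typescript
// Sample of metric names documented in this README; extend as needed.
const EXPECTED_METRICS = [
  "http_requests_total",
  "http_request_duration_seconds",
  "tat_breaches_total",
  "process_resident_memory_bytes",
];

// Collect metric names from "# TYPE <name> <type>" lines of a /metrics body.
function declaredMetrics(body: string): string[] {
  const names: string[] = [];
  for (const line of body.split("\n")) {
    const m = /^# TYPE (\S+) \S+/.exec(line.trim());
    if (m) names.push(m[1]);
  }
  return names;
}

// Return the expected names that the body does not declare.
function missingMetrics(body: string, expected: string[] = EXPECTED_METRICS): string[] {
  const present = declaredMetrics(body);
  return expected.filter((name) => present.indexOf(name) === -1);
}
```

The body can be the saved output of the curl command in step 2. An empty result means every expected metric is declared.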

Logs not appearing in Loki

  1. Check the Promtail logs: docker logs re_promtail
  2. Verify the log file path in promtail/promtail-config.yml
  3. Ensure LOKI_HOST is set in the backend .env file (see Step 2)

Grafana dashboards empty

  1. Wait 30-60 seconds for data collection
  2. Check data source configuration in Grafana
  3. Verify time range selection

Docker memory issues

# Increase Docker Desktop memory allocation
# Settings → Resources → Memory → 4GB+

📞 Support

For issues with the monitoring stack:

  1. Check the container logs: docker logs <container_name>
  2. Verify the syntax of the configuration files
  3. Ensure Docker Desktop is running with sufficient resources