# Loki + Grafana Deployment Guide for RE Workflow

## Overview

This guide covers deploying **Loki with Grafana** for log aggregation in the RE Workflow application.

```
┌─────────────────────────┐           ┌─────────────────────────┐
│  RE Workflow Backend    │──────────▶│          Loki           │
│  (Node.js + Winston)    │   HTTP    │      (Log Storage)      │
└─────────────────────────┘   :3100   └───────────┬─────────────┘
                                                  │
                                      ┌───────────▼─────────────┐
                                      │         Grafana         │
                                      │ monitoring.cloudtopiaa  │
                                      │   (existing instance)   │
                                      └─────────────────────────┘
```

**Why Loki + Grafana?**

- ✅ Lightweight - needs far fewer resources than a full-text stack like ELK
- ✅ Uses your existing Grafana instance
- ✅ LogQL query language, modeled on Prometheus's PromQL
- ✅ Cost-effective - indexes labels, not log content

---

# Part 1: Windows Development Setup

## Prerequisites (Windows)

- Docker Desktop for Windows installed
- WSL2 enabled (recommended)
- 4GB+ RAM available for Docker

---

## Step 1: Install Docker Desktop

1. Download from: https://www.docker.com/products/docker-desktop/
2. Run the installer
3. Enable WSL2 integration when prompted
4. Restart the computer

---

## Step 2: Create Project Directory

Open PowerShell as Administrator:

```powershell
# Create directory
mkdir C:\loki
cd C:\loki
```

---

## Step 3: Create Loki Configuration (Windows)

Create the file `C:\loki\loki-config.yaml`:

```powershell
# Using PowerShell
notepad C:\loki\loki-config.yaml
```

**Paste this configuration:**

```yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  retention_period: 7d
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20

# Retention is only enforced when the compactor runs with retention enabled
compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem
```

---

## Step 4: Create Docker Compose (Windows)

Create the file `C:\loki\docker-compose.yml`:

```powershell
notepad C:\loki\docker-compose.yml
```

**Paste this configuration:**

```yaml
version: '3.8'

services:
  loki:
    image: grafana/loki:3.0.0  # Loki 3.x is required for the tsdb/v13 schema in loki-config.yaml
    container_name: loki
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3001:3000"  # Using 3001 since 3000 is used by the React frontend
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin123
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - loki
    restart: unless-stopped

volumes:
  loki-data:
  grafana-data:
```

---

## Step 5: Start Services (Windows)

```powershell
cd C:\loki
docker-compose up -d
```

**Wait about 30 seconds for the services to initialize.**

---

## Step 6: Verify Services (Windows)

```powershell
# Check containers are running
docker ps

# Test Loki health
Invoke-WebRequest -Uri http://localhost:3100/ready -UseBasicParsing

# Or using curl (use curl.exe: in Windows PowerShell, plain "curl" is an alias for Invoke-WebRequest)
curl.exe http://localhost:3100/ready
```

---

## Step 7: Configure Grafana (Windows Dev)

1. Open browser: `http://localhost:3001` *(port 3001 to avoid conflict with React on 3000)*
2. Login: `admin` / `admin123`
3. Go to: **Connections → Data Sources → Add data source**
4. Select: **Loki**
5. Configure:
   - URL: `http://loki:3100` (the Compose service name; inside the Grafana container, `localhost` would point at Grafana itself)
6. Click: **Save & Test**

---

## Step 8: Configure Backend .env (Windows Dev)

```env
# Development - Local Loki
LOKI_HOST=http://localhost:3100
```

---

## Windows Commands Reference

| Command | Purpose |
|---------|---------|
| `docker-compose up -d` | Start Loki + Grafana |
| `docker-compose down` | Stop services |
| `docker-compose logs -f loki` | View Loki logs |
| `docker-compose restart` | Restart services |
| `docker ps` | Check running containers |

---

# Part 2: Linux Production Setup (DevOps)

## Prerequisites (Linux)

- Ubuntu 20.04+ / CentOS 7+ / RHEL 8+
- Docker & Docker Compose installed
- 2GB+ RAM (4GB recommended)
- 10GB+ disk space
- Grafana running at `http://monitoring.cloudtopiaa.com/`

---

## Step 1: Install Docker (if not installed)

**Ubuntu/Debian:**

```bash
# Update packages
sudo apt update

# Install Docker
sudo apt install -y docker.io docker-compose

# Start Docker
sudo systemctl start docker
sudo systemctl enable docker

# Add user to docker group (log out and back in for this to take effect)
sudo usermod -aG docker $USER
```

**CentOS/RHEL:**

The distribution's own `docker` package is outdated (CentOS 7) or a Podman wrapper (RHEL 8+), and `docker-compose` is not in the base repositories, so install Docker CE from Docker's repository:

```bash
# Install Docker CE and the Compose plugin from Docker's repository
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Start Docker
sudo systemctl start docker
sudo systemctl enable docker
```

The plugin provides `docker compose` (with a space). Use it wherever this guide says `docker-compose`.

---

## Step 2: Create Loki Directory

```bash
sudo mkdir -p /opt/loki
cd /opt/loki
```

---

## Step 3: Create Loki Configuration (Linux)

```bash
sudo nano /opt/loki/loki-config.yaml
```

**Paste this configuration:**

```yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  # /loki already exists in the image and is owned by the loki user (UID 10001)
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

limits_config:
  retention_period: 30d
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20

# Storage retention
compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  retention_delete_delay: 2h
  delete_request_store: filesystem
```

---

## Step 4: Create Docker Compose (Linux Production)

```bash
sudo nano /opt/loki/docker-compose.yml
```

**Paste this configuration (Loki only - uses existing Grafana):**

```yaml
version: '3.8'

services:
  loki:
    image: grafana/loki:3.0.0  # Loki 3.x is required for the tsdb/v13 schema in loki-config.yaml
    container_name: loki
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - monitoring
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:3100/ready || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5

networks:
  monitoring:
    driver: bridge

volumes:
  loki-data:
    driver: local
```

---

## Step 5: Start Loki (Linux)

```bash
cd /opt/loki
sudo docker-compose up -d
```

**Wait about 30 seconds for Loki to initialize.**

---

## Step 6: Verify Loki (Linux)

```bash
# Check container
sudo docker ps | grep loki

# Test Loki health
curl http://localhost:3100/ready

# Test the query API (lists known label names)
curl http://localhost:3100/loki/api/v1/labels
```

**Expected response** (no labels yet, because no logs have been sent):

```json
{"status":"success","data":[]}
```

---

## Step 7: Open Firewall Port (Linux)

Loki runs with `auth_enabled: false`, so anyone who can reach port 3100 can read and write logs. Where possible, limit this rule to the Grafana server and backend hosts instead of opening the port to all sources.

**Ubuntu/Debian:**

```bash
sudo ufw allow 3100/tcp
sudo ufw reload
```

**CentOS/RHEL:**

```bash
sudo firewall-cmd --permanent --add-port=3100/tcp
sudo firewall-cmd --reload
```

---

## Step 8: Add Loki to Existing Grafana

1. **Open Grafana:** `http://monitoring.cloudtopiaa.com/`
2. **Login** with admin credentials
3. **Go to:** Connections → Data Sources → Add data source
4. **Select:** Loki
5. **Configure:**

   | Field | Value |
   |-------|-------|
   | Name | `RE-Workflow-Logs` |
   | URL | `http://:3100` |
   | Timeout | `60` |

6. **Click:** Save & Test
7. **Should see:** ✅ "Data source successfully connected"

---

## Step 9: Configure Backend .env (Production)

```env
# Production - Remote Loki
LOKI_HOST=http://:3100
# LOKI_USER=      # Optional: if basic auth enabled
# LOKI_PASSWORD=  # Optional: if basic auth enabled
```

---

## Linux Commands Reference

| Command | Purpose |
|---------|---------|
| `sudo docker-compose up -d` | Start Loki |
| `sudo docker-compose down` | Stop Loki |
| `sudo docker-compose logs -f` | View logs |
| `sudo docker-compose restart` | Restart |
| `sudo docker ps` | Check containers |

---

## Step 10: Enable Basic Auth (Optional - Production)

Loki has no built-in password authentication. `auth_enabled: true` turns on multi-tenancy through the `X-Scope-OrgID` header, not logins. Basic auth therefore has to come from a reverse proxy, such as nginx, placed in front of Loki:

```bash
# Install htpasswd (Ubuntu/Debian: apache2-utils; CentOS/RHEL: httpd-tools)
sudo apt install -y apache2-utils

# Create password file
sudo htpasswd -c /opt/loki/.htpasswd lokiuser

# Next: add an nginx service to docker-compose.yml that uses this file
# (auth_basic / auth_basic_user_file) and proxies to loki:3100,
# then stop publishing port 3100 directly
```

Once the proxy is in place, set `LOKI_USER` / `LOKI_PASSWORD` in the backend `.env` and enable **Basic auth** on the Grafana data source.

---

# Part 3: Grafana Dashboard Setup

## Create Dashboard

1. Go to: `http://monitoring.cloudtopiaa.com/dashboards` (or `http://localhost:3001` for dev)
2. Click: **New → New Dashboard**
3. Add panels as described below

---

### Panel 1: Error Count (Stat)

**Query (LogQL):**
```
sum(count_over_time({app="re-workflow"} | json | level="error" [24h]))
```
- Visualization: **Stat**
- Title: "Errors (24h)"

Wrapping the query in `sum(...)` returns a single total. Without it, `count_over_time` after `| json` returns a separate series for every combination of extracted fields. Filtering on the parsed `level` field also avoids counting lines that merely contain the word "error".

---

### Panel 2: Error Timeline (Time Series)

**Query (LogQL):**
```
sum by (level) (count_over_time({app="re-workflow"} | json | level=~"error|warn" [5m]))
```
- Visualization: **Time Series**
- Title: "Errors Over Time"

---

### Panel 3: Recent Errors (Logs)

**Query (LogQL):**
```
{app="re-workflow"} | json | level="error"
```
- Visualization: **Logs**
- Title: "Recent Errors"

---

### Panel 4: TAT Breaches (Stat)

**Query (LogQL):**
```
sum(count_over_time({app="re-workflow"} | json | tatEvent="breached" [24h]))
```
- Visualization: **Stat**
- Title: "TAT Breaches"
- Color: Red

---

### Panel 5: Workflow Events (Pie)

**Query (LogQL):**
```
sum by (workflowEvent) (count_over_time({app="re-workflow"} | json | workflowEvent!="" [24h]))
```
- Visualization: **Pie Chart**
- Title: "Workflow Events"

---

### Panel 6: Auth Failures (Table)

**Query (LogQL):**
```
{app="re-workflow"} | json | authEvent="auth_failure"
```
- Visualization: **Table**
- Title: "Authentication Failures"

---

## Useful LogQL Queries

| Purpose | Query |
|---------|-------|
| All errors | `{app="re-workflow"} \| json \| level="error"` |
| Specific request | `{app="re-workflow"} \| json \| requestId="REQ-2024-001"` |
| User activity | `{app="re-workflow"} \| json \| userId="user-123"` |
| TAT breaches | `{app="re-workflow"} \| json \| tatEvent="breached"` |
| Auth failures | `{app="re-workflow"} \| json \| authEvent="auth_failure"` |
| Workflow created | `{app="re-workflow"} \| json \| workflowEvent="created"` |
| API errors (5xx) | `{app="re-workflow"} \| json \| statusCode>=500` |
| Slow requests | `{app="re-workflow"} \| json \| duration>3000` |
| Error rate | `sum(rate({app="re-workflow"} \| json \| level="error" [5m]))` |
| By department | `{app="re-workflow"} \| json \| department="Engineering"` |

---

# Part 4: Alerting Setup

## Alert 1: High Error Rate

1. Go to: **Alerting → Alert Rules → New Alert Rule**
2. Configure:
   - Name: `RE Workflow - High Error Rate`
   - Data source: `RE-Workflow-Logs`
   - Query: `sum(count_over_time({app="re-workflow"} | json | level="error" [5m]))`
   - Condition: IS ABOVE 10
3. Add notification (Slack, Email)

## Alert 2: TAT Breach

1. Create new alert rule
2. Configure:
   - Name: `RE Workflow - TAT Breach`
   - Query: `sum(count_over_time({app="re-workflow"} | json | tatEvent="breached" [15m]))`
   - Condition: IS ABOVE 0
3. Add notification

## Alert 3: Auth Attack Detection

1. Create new alert rule
2. Configure:
   - Name: `RE Workflow - Auth Attack`
   - Query: `sum(count_over_time({app="re-workflow"} | json | authEvent="auth_failure" [5m]))`
   - Condition: IS ABOVE 20
3. Add notification to Security team

---

# Part 5: Troubleshooting

## Windows Issues

### Docker Desktop not starting

```powershell
# Restart the Docker Desktop service (run as Administrator)
Restart-Service com.docker.service

# Or restart Docker Desktop from the system tray
```

### Port 3100 already in use

```powershell
# Find process using port
netstat -ano | findstr :3100

# Kill process (the PID is the last column of the netstat output)
taskkill /PID /F
```

### WSL2 issues

```powershell
# Update WSL
wsl --update

# Restart WSL
wsl --shutdown
```

---

## Linux Issues

### Loki won't start

```bash
# Check logs
sudo docker logs loki

# Common fix - permissions (Loki runs as UID 10001; needed when bind-mounting host directories)
sudo chown -R 10001:10001 /opt/loki
```

### Grafana can't connect to Loki

```bash
# Verify Loki is healthy
curl http://localhost:3100/ready

# Check network from the Grafana server (replace loki-server with the Loki host)
curl http://loki-server:3100/ready

# Restart Loki
cd /opt/loki && sudo docker-compose restart
```

### Logs not appearing in Grafana

1. Check that the application env has the correct `LOKI_HOST`
2. Verify network connectivity from the backend host: `curl $LOKI_HOST/ready`
3. Check labels: `curl http://localhost:3100/loki/api/v1/labels`
4. Wait for the application to send its first logs

### High disk or memory usage

Lower the retention period in `loki-config.yaml`, then restart Loki:

```yaml
limits_config:
  retention_period: 7d   # Reduce from 30d
```

Retention mainly bounds disk usage. To reduce memory, lower `max_size_mb` under `query_range.results_cache.cache.embedded_cache`.

---

# Quick Reference

## Environment Comparison

| Setting | Windows Dev | Linux Production |
|---------|-------------|------------------|
| LOKI_HOST | `http://localhost:3100` | `http://:3100` |
| Grafana URL | `http://localhost:3001` | `http://monitoring.cloudtopiaa.com` |
| Config Path | `C:\loki\` | `/opt/loki/` |
| Retention | 7 days | 30 days |

## Port Reference

| Service | Port | URL |
|---------|------|-----|
| Loki | 3100 | `http://server:3100` |
| Grafana (Dev) | 3001 | `http://localhost:3001` |
| Grafana (Prod) | 80/443 | `http://monitoring.cloudtopiaa.com/` |
| React Frontend | 3000 | `http://localhost:3000` |

---

# Verification Checklist

## Windows Development

- [ ] Docker Desktop running
- [ ] `docker ps` shows loki and grafana containers
- [ ] `http://localhost:3100/ready` returns "ready"
- [ ] `http://localhost:3001` shows Grafana login
- [ ] Loki data source connected in Grafana
- [ ] Backend `.env` has `LOKI_HOST=http://localhost:3100`

## Linux Production

- [ ] Loki container running (`docker ps`)
- [ ] `curl localhost:3100/ready` returns "ready"
- [ ] Firewall port 3100 open
- [ ] Grafana connected to Loki
- [ ] Backend `.env` has correct `LOKI_HOST`
- [ ] Logs appearing in Grafana Explore
- [ ] Dashboard created
- [ ] Alerts configured

---

# Contact

For issues with this setup:
- Backend logs: Check Grafana dashboard
- Infrastructure: Contact DevOps team
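
---

# Appendix: What the Backend Sends to Loki

This guide does not show the left-hand arrow of the Overview diagram, where the Node.js + Winston backend delivers logs to Loki on port 3100. A Winston transport (for example the `winston-loki` package) normally handles that step. When debugging missing logs, though, it can help to know the wire format.

The TypeScript sketch below builds the JSON body that Loki's push endpoint, `POST /loki/api/v1/push`, accepts. It makes these assumptions:

- Every stream carries the `app="re-workflow"` label that the dashboard queries use.
- Each log line is a JSON object, so `| json` can extract `level`, `requestId`, `statusCode`, and so on.

The type and function names are hypothetical and are not taken from the RE Workflow backend or any library.

```typescript
// Sketch: building the body for Loki's push API (POST /loki/api/v1/push).
// Hypothetical: the app="re-workflow" label, the AppLogRecord shape, and all names.

type LokiLabels = { [name: string]: string };

// Each value is a [timestamp, line] pair; Loki expects the timestamp as a
// string holding Unix epoch nanoseconds.
type LokiEntry = [string, string];

interface LokiPushBody {
  streams: { stream: LokiLabels; values: LokiEntry[] }[];
}

// Hypothetical structured record, serialized as the log line so that
// `| json` in LogQL can extract level, requestId, statusCode, etc.
interface AppLogRecord {
  level: "error" | "warn" | "info" | "debug";
  message: string;
  [field: string]: string | number | boolean;
}

// Converts a millisecond timestamp (e.g. Date.now()) to nanoseconds by
// appending six zeros, which stays clear of JavaScript's safe-integer limit.
function msToNanoString(epochMs: number): string {
  const wholeMs = Math.floor(epochMs);
  if (wholeMs < 0) {
    throw new Error("timestamp must be non-negative");
  }
  return String(wholeMs) + "000000";
}

// Puts all records into a single stream identified by `labels`.
function buildPushBody(
  labels: LokiLabels,
  records: { epochMs: number; record: AppLogRecord }[]
): LokiPushBody {
  const values = records.map(function (r): LokiEntry {
    return [msToNanoString(r.epochMs), JSON.stringify(r.record)];
  });
  return { streams: [{ stream: labels, values: values }] };
}

// Joins LOKI_HOST (e.g. http://localhost:3100) with the push path,
// tolerating trailing slashes.
function lokiPushUrl(lokiHost: string): string {
  return lokiHost.replace(/\/+$/, "") + "/loki/api/v1/push";
}
```

To send the body, `POST` `JSON.stringify(buildPushBody(...))` to `lokiPushUrl(LOKI_HOST)` with the header `Content-Type: application/json`. Loki replies `204 No Content` on success.

Fields such as `requestId` and `userId` stay inside the JSON line instead of becoming labels, and that is deliberate. Loki stores every distinct label combination as a separate stream, so high-cardinality values belong in the log line, where the `| json` parser can still filter on them.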