Loki + Grafana Deployment Guide for RE Workflow

Overview

This guide covers deploying Loki with Grafana for log aggregation in the RE Workflow application.

┌─────────────────────────┐          ┌─────────────────────────┐
│   RE Workflow Backend   │──────────▶│         Loki            │
│   (Node.js + Winston)   │   HTTP    │    (Log Storage)        │
└─────────────────────────┘   :3100   └───────────┬─────────────┘
                                                  │
                                      ┌───────────▼─────────────┐
                                      │        Grafana          │
                                      │ monitoring.cloudtopiaa  │
                                      │   (Your existing!)      │
                                      └─────────────────────────┘
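
The arrow from the backend to Loki in the diagram is an HTTP POST to Loki's push endpoint, /loki/api/v1/push. As a rough sketch of what travels over that connection, the function below builds the JSON body for a single log line. The function name is hypothetical, the app="re-workflow" label is borrowed from the queries in Part 3, and the Winston transport in the backend may batch or label entries differently.

```shell
# Build a Loki push payload for one log line (illustrative helper, not part
# of the backend). $1 = "app" label value, $2 = timestamp in nanoseconds
# since the Unix epoch, $3 = the log line.
build_push_payload() {
  # Escape backslashes and double quotes so the line is a valid JSON string.
  # Newlines and other control characters are not handled in this sketch.
  escaped=$(printf '%s' "$3" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"streams":[{"stream":{"app":"%s"},"values":[["%s","%s"]]}]}\n' \
    "$1" "$2" "$escaped"
}

# Example (needs a running Loki, so it is left commented out):
# build_push_payload re-workflow "$(date +%s)000000000" '{"level":"info","message":"test"}' |
#   curl -s -H 'Content-Type: application/json' --data-binary @- \
#     http://localhost:3100/loki/api/v1/push
```

Pushing one hand-made line like this is also a quick way to check the pipeline end to end before the backend is wired up.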

Why Loki + Grafana?

  • Lightweight - stores compressed log chunks plus a small label index, so it needs far fewer resources than ELK
  • Uses your existing Grafana instance
  • Query language (LogQL) modeled on Prometheus's PromQL
  • Cost-effective - indexes labels, not content

Part 1: Windows Development Setup

Prerequisites (Windows)

  • Docker Desktop for Windows installed
  • WSL2 enabled (recommended)
  • 4GB+ RAM available for Docker

Step 1: Install Docker Desktop

  1. Download from: https://www.docker.com/products/docker-desktop/
  2. Run installer
  3. Enable WSL2 integration when prompted
  4. Restart computer

Step 2: Create Project Directory

Open PowerShell as Administrator:

# Create directory
mkdir C:\loki
cd C:\loki

Step 3: Create Loki Configuration (Windows)

Create file C:\loki\loki-config.yaml:

# Using PowerShell
notepad C:\loki\loki-config.yaml

Paste this configuration:

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  retention_period: 7d    # only enforced when the compactor runs with retention_enabled: true
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20

Step 4: Create Docker Compose (Windows)

Create file C:\loki\docker-compose.yml:

notepad C:\loki\docker-compose.yml

Paste this configuration:

version: '3.8'

services:
  loki:
    image: grafana/loki:2.9.2
    container_name: loki
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3001:3000"    # Using 3001 since 3000 is used by React frontend
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin123
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - loki
    restart: unless-stopped

volumes:
  loki-data:
  grafana-data:

Step 5: Start Services (Windows)

cd C:\loki
docker-compose up -d

Wait 30 seconds for services to initialize.


Step 6: Verify Services (Windows)

# Check containers are running
docker ps

# Test Loki health
Invoke-WebRequest -Uri http://localhost:3100/ready

# Or using curl.exe (in Windows PowerShell, plain "curl" is an alias for Invoke-WebRequest)
curl.exe http://localhost:3100/ready

Step 7: Configure Grafana (Windows Dev)

  1. Open browser: http://localhost:3001 (port 3001 to avoid conflict with React on 3000)
  2. Login: admin / admin123
  3. Go to: Connections → Data Sources → Add data source
  4. Select: Loki
  5. Configure:
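    • URL: http://loki:3100 (the Compose service name; Grafana reaches Loki over the Docker network, so localhost would not work here)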
  6. Click: Save & Test

Step 8: Configure Backend .env (Windows Dev)

# Development - Local Loki
LOKI_HOST=http://localhost:3100

Windows Commands Reference

Command                        Purpose
docker-compose up -d           Start Loki + Grafana
docker-compose down            Stop services
docker-compose logs -f loki    View Loki logs
docker-compose restart         Restart services
docker ps                      Check running containers

Part 2: Linux Production Setup (DevOps)

Prerequisites (Linux)

  • Ubuntu 20.04+ / CentOS 7+ / RHEL 8+
  • Docker & Docker Compose installed
  • 2GB+ RAM (4GB recommended)
  • 10GB+ disk space
  • Grafana running at http://monitoring.cloudtopiaa.com/

Step 1: Install Docker (if not installed)

Ubuntu/Debian:

# Update packages
sudo apt update

# Install Docker
sudo apt install -y docker.io docker-compose

# Start Docker
sudo systemctl start docker
sudo systemctl enable docker

# Add user to docker group (log out and back in for this to take effect)
sudo usermod -aG docker $USER

CentOS/RHEL:

# Install Docker
sudo yum install -y docker docker-compose

# Start Docker
sudo systemctl start docker
sudo systemctl enable docker

Step 2: Create Loki Directory

sudo mkdir -p /opt/loki
cd /opt/loki

Step 3: Create Loki Configuration (Linux)

sudo nano /opt/loki/loki-config.yaml

Paste this configuration:

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

limits_config:
  retention_period: 30d
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20

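# Storage retention
compactor:
  working_directory: /tmp/loki/compactor
  retention_enabled: true
  retention_delete_delay: 2h
  # delete_request_store was added in Loki 3.0. If the 2.9.2 image used in
  # Step 4 rejects this field, use "shared_store: filesystem" instead.
  delete_request_store: filesystem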

Step 4: Create Docker Compose (Linux Production)

sudo nano /opt/loki/docker-compose.yml

Paste this configuration (Loki only - uses existing Grafana):

version: '3.8'

services:
  loki:
    image: grafana/loki:2.9.2
    container_name: loki
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/tmp/loki
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - monitoring
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:3100/ready || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5

networks:
  monitoring:
    driver: bridge

volumes:
  loki-data:
    driver: local

Step 5: Start Loki (Linux)

cd /opt/loki
sudo docker-compose up -d

Wait 30 seconds for Loki to initialize.
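
A fixed 30-second wait can be too short on a slow host and needlessly long on a fast one. One alternative, sketched below, is to poll until a check succeeds. wait_until is a hypothetical helper, and the attempt count and delay in the example are arbitrary.

```shell
# Retry a command until it succeeds or the attempts run out.
# $1 = maximum attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run.
wait_until() {
  wu_max=$1
  wu_delay=$2
  shift 2
  wu_try=1
  while [ "$wu_try" -le "$wu_max" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $wu_try attempt(s)"
      return 0
    fi
    sleep "$wu_delay"
    wu_try=$((wu_try + 1))
  done
  echo "not ready after $wu_max attempt(s)" >&2
  return 1
}

# Example: poll Loki's readiness endpoint every 2 seconds, up to 30 times.
# curl -f makes HTTP error responses (such as 503 while starting) count as failures.
# wait_until 30 2 curl -fsS http://localhost:3100/ready
```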


Step 6: Verify Loki (Linux)

# Check container
sudo docker ps | grep loki

# Test Loki health
curl http://localhost:3100/ready

# List known label names (also confirms that the query API responds)
curl http://localhost:3100/loki/api/v1/labels

Expected response on a fresh install, before any logs have been ingested:

{"status":"success","data":[]}
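
Once the backend has shipped its first logs, the data array should no longer be empty. Assuming the backend attaches the app label that the queries in Part 3 rely on, a small check like the following can confirm it. labels_include is a hypothetical helper that does a plain string match, not real JSON parsing.

```shell
# Return success if a /loki/api/v1/labels response lists the given label.
# $1 = response body, $2 = label name. A plain string match on the quoted
# name, which is enough for this flat response; it is not a JSON parser.
labels_include() {
  case $1 in
    *'"status":"success"'*"\"$2\""*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (needs a running Loki):
# body=$(curl -s http://localhost:3100/loki/api/v1/labels)
# labels_include "$body" app && echo "app label present"
```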

Step 7: Open Firewall Port (Linux)

Ubuntu/Debian:

sudo ufw allow 3100/tcp
sudo ufw reload

CentOS/RHEL:

sudo firewall-cmd --permanent --add-port=3100/tcp
sudo firewall-cmd --reload

Step 8: Add Loki to Existing Grafana

  1. Open Grafana: http://monitoring.cloudtopiaa.com/
  2. Login with admin credentials
  3. Go to: Connections → Data Sources → Add data source
  4. Select: Loki
  5. Configure:

     Field      Value
     Name       RE-Workflow-Logs
     URL        http://<loki-server-ip>:3100
     Timeout    60 (seconds)

  6. Click: Save & Test
  7. Should see: "Data source successfully connected"

Step 9: Configure Backend .env (Production)

# Production - Remote Loki
LOKI_HOST=http://<loki-server-ip>:3100
# LOKI_USER=        # Optional: if basic auth enabled
# LOKI_PASSWORD=    # Optional: if basic auth enabled
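
Before starting the backend, it can help to confirm that the value in .env points at a reachable Loki. The helpers below are a hypothetical sketch: they read one key from a .env-style file and check that it looks like an http(s) URL. How the backend itself parses .env is not covered here.

```shell
# Print the value of KEY from a .env-style file (KEY=value lines; commented
# lines are skipped because they do not start with "KEY=").
# $1 = key, $2 = path to the .env file. If the key repeats, the last one is
# printed. Carriage returns from Windows-edited files are stripped.
env_value() {
  sed -n "s/^$1=//p" "$2" | tr -d '\r' | tail -n 1
}

# Return success if the value looks like an http(s) URL.
is_http_url() {
  case $1 in
    http://?*|https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (assumes the backend's .env is in the current directory):
# host=$(env_value LOKI_HOST .env)
# is_http_url "$host" && curl -fsS "$host/ready"
```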

Linux Commands Reference

Command                        Purpose
sudo docker-compose up -d      Start Loki
sudo docker-compose down       Stop Loki
sudo docker-compose logs -f    View logs
sudo docker-compose restart    Restart
sudo docker ps                 Check containers

Step 10: Enable Basic Auth (Optional - Production)

For added security, enable basic auth:

# Install apache2-utils for htpasswd
sudo apt install apache2-utils

# Create password file
sudo htpasswd -c /opt/loki/.htpasswd lokiuser

# Update docker-compose.yml to use nginx reverse proxy with auth
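
The last comment leaves the proxy itself open. One possible shape for it, as a sketch only: the function below writes a minimal nginx server block that asks for the credentials created above and forwards everything to Loki. The function name, the output path in the example, and the container-side paths are assumptions; adjust them to your setup.

```shell
# Write a minimal nginx server block that puts basic auth in front of Loki.
# $1 = output path. Assumptions: nginx runs as a container on the same Docker
# network as Loki (so the hostname "loki" resolves), and the .htpasswd file
# created above is mounted at /etc/nginx/.htpasswd inside that container.
write_loki_proxy_conf() {
  cat > "$1" <<'EOF'
server {
    listen 80;

    location / {
        auth_basic           "Loki";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass           http://loki:3100;
    }
}
EOF
}

# Example: write_loki_proxy_conf /opt/loki/nginx.conf
```

With a setup like this, the nginx port would be published instead of Loki's 3100, and both the backend (LOKI_USER / LOKI_PASSWORD) and the Grafana data source would need the same credentials.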

Part 3: Grafana Dashboard Setup

Create Dashboard

  1. Go to: http://monitoring.cloudtopiaa.com/dashboards (or http://localhost:3001 for dev)
  2. Click: New → New Dashboard
  3. Add panels as described below

Panel 1: Error Count (Stat)

Query (LogQL):

sum(count_over_time({app="re-workflow"} |= "error" [24h]))
  • Visualization: Stat
  • Title: "Errors (24h)"

Panel 2: Error Timeline (Time Series)

Query (LogQL):

sum by (level) (count_over_time({app="re-workflow"} | json | level=~"error|warn" [5m]))
  • Visualization: Time Series
  • Title: "Errors Over Time"

Panel 3: Recent Errors (Logs)

Query (LogQL):

{app="re-workflow"} | json | level="error"
  • Visualization: Logs
  • Title: "Recent Errors"

Panel 4: TAT Breaches (Stat)

Query (LogQL):

sum(count_over_time({app="re-workflow"} | json | tatEvent="breached" [24h]))
  • Visualization: Stat
  • Title: "TAT Breaches"
  • Color: Red

Panel 5: Workflow Events (Pie)

Query (LogQL):

sum by (workflowEvent) (count_over_time({app="re-workflow"} | json | workflowEvent!="" [24h]))
  • Visualization: Pie Chart
  • Title: "Workflow Events"

Panel 6: Auth Failures (Table)

Query (LogQL):

{app="re-workflow"} | json | authEvent="auth_failure"
  • Visualization: Table
  • Title: "Authentication Failures"

Useful LogQL Queries

Purpose             Query
All errors          {app="re-workflow"} | json | level="error"
Specific request    {app="re-workflow"} | json | requestId="REQ-2024-001"
User activity       {app="re-workflow"} | json | userId="user-123"
TAT breaches        {app="re-workflow"} | json | tatEvent="breached"
Auth failures       {app="re-workflow"} | json | authEvent="auth_failure"
Workflow created    {app="re-workflow"} | json | workflowEvent="created"
API errors (5xx)    {app="re-workflow"} | json | statusCode>=500
Slow requests       {app="re-workflow"} | json | duration>3000
Error rate          sum(rate({app="re-workflow"} | json | level="error"[5m]))
By department       {app="re-workflow"} | json | department="Engineering"
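
Most of the counting queries in this guide share one shape: a stream selector, the json parser, a label filter, and a range. As a small sketch, the helper below composes that shape so it can be tried from a terminal. logql_count and the APP_LABEL variable are names made up for this example.

```shell
# Compose a count_over_time query in the shape used throughout this guide.
# $1 = label filter applied after "| json" (for example level="error"),
# $2 = range (for example 5m). The app label defaults to re-workflow and can
# be overridden with the APP_LABEL variable (a name made up for this sketch).
logql_count() {
  printf 'count_over_time({app="%s"} | json | %s [%s])\n' \
    "${APP_LABEL:-re-workflow}" "$1" "$2"
}

# Example (needs a running Loki): run the query through the HTTP API.
# curl -G lets --data-urlencode handle the quotes, braces and pipes.
# curl -G -s http://localhost:3100/loki/api/v1/query \
#   --data-urlencode "query=$(logql_count 'level="error"' 5m)"
```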

Part 4: Alerting Setup

Alert 1: High Error Rate

  1. Go to: Alerting → Alert Rules → New Alert Rule
  2. Configure:
    • Name: RE Workflow - High Error Rate
    • Data source: RE-Workflow-Logs
    • Query: count_over_time({app="re-workflow"} | json | level="error" [5m])
    • Condition: IS ABOVE 10
  3. Add notification (Slack, Email)

Alert 2: TAT Breach

  1. Create new alert rule
  2. Configure:
    • Name: RE Workflow - TAT Breach
    • Query: count_over_time({app="re-workflow"} | json | tatEvent="breached" [15m])
    • Condition: IS ABOVE 0
  3. Add notification

Alert 3: Auth Attack Detection

  1. Create new alert rule
  2. Configure:
    • Name: RE Workflow - Auth Attack
    • Query: count_over_time({app="re-workflow"} | json | authEvent="auth_failure" [5m])
    • Condition: IS ABOVE 20
  3. Add notification to Security team

Part 5: Troubleshooting

Windows Issues

Docker Desktop not starting

# Restart the Docker Desktop service (run PowerShell as Administrator)
Restart-Service com.docker.service

# Or restart Docker Desktop from system tray

Port 3100 already in use

# Find process using port
netstat -ano | findstr :3100

# Kill process
taskkill /PID <pid> /F

WSL2 issues

# Update WSL
wsl --update

# Restart WSL
wsl --shutdown

Linux Issues

Loki won't start

# Check logs
sudo docker logs loki

# Common fix - permissions
sudo chown -R 10001:10001 /opt/loki

Grafana can't connect to Loki

# Verify Loki is healthy
curl http://localhost:3100/ready

# Check network from the Grafana server
curl http://<loki-server-ip>:3100/ready

# Restart Loki
sudo docker-compose restart

Logs not appearing in Grafana

  1. Check application env has correct LOKI_HOST
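  2. Verify network connectivity from the backend host: curl http://<loki-server-ip>:3100/ready (localhost in dev; the name loki only resolves inside the Docker network)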
  3. Check labels: curl http://localhost:3100/loki/api/v1/labels
  4. Wait for application to send first logs
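
The connectivity check in the list above can be made a little more informative by looking at the HTTP status rather than only the response body. The mapping below is a sketch with hypothetical wording. It relies on curl printing 000 for %{http_code} when no connection could be made, and on Loki answering 503 on /ready while it is still starting.

```shell
# Translate the HTTP status of GET /ready into a hint.
# curl reports 000 when it could not connect at all.
explain_ready_status() {
  case $1 in
    200) echo "Loki is ready; check LOKI_HOST and the app label next" ;;
    503) echo "Loki is reachable but still starting; retry shortly" ;;
    000) echo "Loki is unreachable; check the host, port 3100 and firewall" ;;
    *)   echo "Unexpected status $1; check the Loki logs" ;;
  esac
}

# Example (run from the machine that hosts the backend):
# code=$(curl -s -o /dev/null -w '%{http_code}' "$LOKI_HOST/ready")
# explain_ready_status "$code"
```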

High memory usage

# Reduce retention period in loki-config.yaml
limits_config:
  retention_period: 7d  # Reduce from 30d

Quick Reference

Environment Comparison

Setting        Windows Dev              Linux Production
LOKI_HOST      http://localhost:3100    http://<loki-server-ip>:3100
Grafana URL    http://localhost:3001    http://monitoring.cloudtopiaa.com
Config Path    C:\loki\                 /opt/loki/
Retention      7 days                   30 days

Port Reference

Service           Port      URL
Loki              3100      http://<loki-server-ip>:3100
Grafana (Dev)     3001      http://localhost:3001
Grafana (Prod)    80/443    http://monitoring.cloudtopiaa.com/
React Frontend    3000      http://localhost:3000

Verification Checklist

Windows Development

  • Docker Desktop running
  • docker ps shows loki and grafana containers
  • http://localhost:3100/ready returns "ready"
  • http://localhost:3001 shows Grafana login
  • Loki data source connected in Grafana
  • Backend .env has LOKI_HOST=http://localhost:3100

Linux Production

  • Loki container running (docker ps)
  • curl localhost:3100/ready returns "ready"
  • Firewall port 3100 open
  • Grafana connected to Loki
  • Backend .env has correct LOKI_HOST
  • Logs appearing in Grafana Explore
  • Dashboard created
  • Alerts configured

Contact

For issues with this setup:

  • Backend logs: Check Grafana dashboard
  • Infrastructure: Contact DevOps team