Loki + Grafana Deployment Guide for RE Workflow
Overview
This guide covers deploying Loki with Grafana for log aggregation in the RE Workflow application.
┌─────────────────────────┐           ┌─────────────────────────┐
│   RE Workflow Backend   │──────────▶│          Loki           │
│   (Node.js + Winston)   │   HTTP    │      (Log Storage)      │
└─────────────────────────┘   :3100   └───────────┬─────────────┘
                                                  │
                                      ┌───────────▼─────────────┐
                                      │         Grafana         │
                                      │ monitoring.cloudtopiaa  │
                                      │    (Your existing!)     │
                                      └─────────────────────────┘
Why Loki + Grafana?
- ✅ Lightweight - designed for logs (unlike ELK)
- ✅ Uses your existing Grafana instance
- ✅ LogQL query language, modeled on Prometheus's PromQL
- ✅ Cost-effective - indexes labels, not content
Part 1: Windows Development Setup
Prerequisites (Windows)
- Docker Desktop for Windows installed
- WSL2 enabled (recommended)
- 4GB+ RAM available for Docker
Step 1: Install Docker Desktop
- Download from: https://www.docker.com/products/docker-desktop/
- Run installer
- Enable WSL2 integration when prompted
- Restart computer
Step 2: Create Project Directory
Open PowerShell as Administrator:
# Create directory
mkdir C:\loki
cd C:\loki
Step 3: Create Loki Configuration (Windows)
Create file C:\loki\loki-config.yaml:
# Using PowerShell
notepad C:\loki\loki-config.yaml
Paste this configuration:
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100
schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
limits_config:
  retention_period: 7d
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
Step 4: Create Docker Compose (Windows)
Create file C:\loki\docker-compose.yml:
notepad C:\loki\docker-compose.yml
Paste this configuration:
version: '3.8'
services:
  loki:
    image: grafana/loki:2.9.2
    container_name: loki
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3001:3000" # Using 3001 since 3000 is used by React frontend
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin123
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - loki
    restart: unless-stopped
volumes:
  loki-data:
  grafana-data:
Step 5: Start Services (Windows)
cd C:\loki
docker-compose up -d
Wait 30 seconds for services to initialize.
Step 6: Verify Services (Windows)
# Check containers are running
docker ps
# Test Loki health
Invoke-WebRequest -Uri http://localhost:3100/ready
# Or using curl (if installed)
curl http://localhost:3100/ready
Step 7: Configure Grafana (Windows Dev)
- Open browser: http://localhost:3001 (port 3001 to avoid conflict with React on 3000)
- Login: admin / admin123
- Go to: Connections → Data Sources → Add data source
- Select: Loki
- Configure:
  - URL: http://loki:3100
- Click: Save & Test
Step 8: Configure Backend .env (Windows Dev)
# Development - Local Loki
LOKI_HOST=http://localhost:3100
Windows Commands Reference
| Command | Purpose |
|---|---|
| docker-compose up -d | Start Loki + Grafana |
| docker-compose down | Stop services |
| docker-compose logs -f loki | View Loki logs |
| docker-compose restart | Restart services |
| docker ps | Check running containers |
Part 2: Linux Production Setup (DevOps)
Prerequisites (Linux)
- Ubuntu 20.04+ / CentOS 7+ / RHEL 8+
- Docker & Docker Compose installed
- 2GB+ RAM (4GB recommended)
- 10GB+ disk space
- Grafana running at http://monitoring.cloudtopiaa.com/
Step 1: Install Docker (if not installed)
Ubuntu/Debian:
# Update packages
sudo apt update
# Install Docker
sudo apt install -y docker.io docker-compose
# Start Docker
sudo systemctl start docker
sudo systemctl enable docker
# Add user to docker group (log out and back in for this to take effect)
sudo usermod -aG docker $USER
CentOS/RHEL:
# Install Docker
sudo yum install -y docker docker-compose
# Start Docker
sudo systemctl start docker
sudo systemctl enable docker
Step 2: Create Loki Directory
sudo mkdir -p /opt/loki
cd /opt/loki
Step 3: Create Loki Configuration (Linux)
sudo nano /opt/loki/loki-config.yaml
Paste this configuration:
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100
schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
ruler:
  alertmanager_url: http://localhost:9093
limits_config:
  retention_period: 30d
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
# Storage retention
compactor:
  working_directory: /tmp/loki/compactor
  retention_enabled: true
  retention_delete_delay: 2h
  delete_request_store: filesystem
Step 4: Create Docker Compose (Linux Production)
sudo nano /opt/loki/docker-compose.yml
Paste this configuration (Loki only - uses existing Grafana):
version: '3.8'
services:
  loki:
    image: grafana/loki:2.9.2
    container_name: loki
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/tmp/loki
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - monitoring
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:3100/ready || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
networks:
  monitoring:
    driver: bridge
volumes:
  loki-data:
    driver: local
Step 5: Start Loki (Linux)
cd /opt/loki
sudo docker-compose up -d
Wait 30 seconds for Loki to initialize.
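A fixed 30-second wait can be too short or too long depending on the host. As an alternative, the sketch below polls the /ready endpoint until it answers successfully. The function name wait_for_loki and its default values are illustrative assumptions, not part of Loki or Docker.

```shell
# Poll Loki's /ready endpoint until it answers successfully or we give up.
# wait_for_loki is a hypothetical helper name, not part of Loki or Docker.
wait_for_loki() {
  local base_url="${1:-http://localhost:3100}"
  local max_attempts="${2:-30}"
  local delay_seconds="${3:-2}"
  local attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors such as 503 (not ready yet).
    if curl -fsS -o /dev/null "${base_url}/ready"; then
      echo "Loki ready after ${attempt} attempt(s)"
      return 0
    fi
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done

  echo "Loki not ready after ${max_attempts} attempt(s)" >&2
  return 1
}
```

For example, `wait_for_loki http://localhost:3100 30 2` tries for up to about a minute and returns a non-zero status if Loki never becomes ready, so it can gate later steps in a deployment script.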
Step 6: Verify Loki (Linux)
# Check container
sudo docker ps | grep loki
# Test Loki health
curl http://localhost:3100/ready
# List known labels (confirms the query API responds)
curl http://localhost:3100/loki/api/v1/labels
Expected response (before any logs have been ingested):
{"status":"success","data":[]}
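The labels list stays empty until something has been ingested, so an empty reply does not prove that Loki accepts writes. A minimal sketch for pushing one test line through Loki's HTTP push API (POST /loki/api/v1/push) follows. The helper names and the test message are assumptions made for illustration; the app="re-workflow" label matches the queries used later in this guide.

```shell
# Build a minimal JSON body for Loki's push API (POST /loki/api/v1/push).
# Arguments: app label value, log line, timestamp in nanoseconds since epoch.
# This sketch does no JSON escaping, so keep quotes and backslashes out of
# the label and the message.
loki_push_payload() {
  printf '{"streams":[{"stream":{"app":"%s"},"values":[["%s","%s"]]}]}' \
    "$1" "$3" "$2"
}

# Send one test line to Loki. loki_push_test is a hypothetical helper name.
loki_push_test() {
  local base_url="${1:-http://localhost:3100}"
  local now_ns
  # date +%s gives seconds; Loki expects a nanosecond timestamp as a string.
  now_ns="$(date +%s)000000000"
  loki_push_payload "re-workflow" "deployment guide test line" "$now_ns" |
    curl -fsS -X POST -H "Content-Type: application/json" \
      --data-binary @- "${base_url}/loki/api/v1/push"
}
```

A successful push returns HTTP 204 with an empty body. After running `loki_push_test`, the labels endpoint should list `app` among its results.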
Step 7: Open Firewall Port (Linux)
Ubuntu/Debian:
sudo ufw allow 3100/tcp
sudo ufw reload
CentOS/RHEL:
sudo firewall-cmd --permanent --add-port=3100/tcp
sudo firewall-cmd --reload
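To confirm the port is reachable from another machine (for example the backend or Grafana host), a quick TCP check can help. The sketch below assumes bash and the coreutils timeout command are available on that machine; port_open is a hypothetical helper name.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# port_open is a hypothetical helper; it relies on bash's /dev/tcp feature
# and the coreutils timeout command.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

Usage: `port_open <loki-server-ip> 3100 && echo "port 3100 reachable"`. A failure here points at the firewall or network path rather than at Loki itself.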
Step 8: Add Loki to Existing Grafana
- Open Grafana: http://monitoring.cloudtopiaa.com/
- Login with admin credentials
- Go to: Connections → Data Sources → Add data source
- Select: Loki
- Configure:
| Field | Value |
|---|---|
| Name | RE-Workflow-Logs |
| URL | http://<loki-server-ip>:3100 |
| Timeout | 60 |
- Click: Save & Test
- Should see: ✅ "Data source successfully connected"
Step 9: Configure Backend .env (Production)
# Production - Remote Loki
LOKI_HOST=http://<loki-server-ip>:3100
# LOKI_USER= # Optional: if basic auth enabled
# LOKI_PASSWORD= # Optional: if basic auth enabled
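A typo in LOKI_HOST is an easy mistake to miss, so it can be worth reading the value back from the file and probing it. The sketch below assumes a plain dotenv file with unquoted KEY=value lines; env_value is a hypothetical helper name.

```shell
# Print the value of KEY from a dotenv-style file, ignoring commented lines.
# env_value is a hypothetical helper; it handles plain KEY=value lines only
# (no quoting, no "export" prefix).
env_value() {
  local file="$1" key="$2"
  # Take the last matching assignment, then keep everything after the first "=".
  grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-
}
```

Run from the backend host, `curl "$(env_value .env LOKI_HOST)/ready"` checks that the configured address actually reaches Loki.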
Linux Commands Reference
| Command | Purpose |
|---|---|
| sudo docker-compose up -d | Start Loki |
| sudo docker-compose down | Stop Loki |
| sudo docker-compose logs -f | View logs |
| sudo docker-compose restart | Restart |
| sudo docker ps | Check containers |
Step 10: Enable Basic Auth (Optional - Production)
For added security, enable basic auth:
# Install apache2-utils for htpasswd
sudo apt install apache2-utils
# Create password file
sudo htpasswd -c /opt/loki/.htpasswd lokiuser
# Update docker-compose.yml to use nginx reverse proxy with auth
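Once a reverse proxy with basic auth sits in front of Loki, it helps to confirm that anonymous requests are rejected and authenticated ones pass. The sketch below assumes such a proxy is already configured (its setup is not covered here); ready_status is a hypothetical helper name and the proxy URL is a placeholder.

```shell
# Print the HTTP status code returned by GET <base_url>/ready.
# Pass a user and password to send HTTP basic auth; omit them to test
# anonymous access. ready_status is a hypothetical helper name.
ready_status() {
  local base_url="$1" user="${2:-}" password="${3:-}"
  if [ -n "$user" ]; then
    curl -s -o /dev/null -w '%{http_code}' -u "${user}:${password}" "${base_url}/ready"
  else
    curl -s -o /dev/null -w '%{http_code}' "${base_url}/ready"
  fi
}
```

With auth enforced, one would expect `ready_status http://<proxy-host>` to print 401, and `ready_status http://<proxy-host> lokiuser <password>` to print 200.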
Part 3: Grafana Dashboard Setup
Create Dashboard
- Go to: http://monitoring.cloudtopiaa.com/dashboards (or http://localhost:3001 for dev)
- Click: New → New Dashboard
- Add panels as described below
Panel 1: Error Count (Stat)
Query (LogQL):
count_over_time({app="re-workflow"} |= "error" [24h])
- Visualization: Stat
- Title: "Errors (24h)"
Panel 2: Error Timeline (Time Series)
Query (LogQL):
sum by (level) (count_over_time({app="re-workflow"} | json | level=~"error|warn" [5m]))
- Visualization: Time Series
- Title: "Errors Over Time"
Panel 3: Recent Errors (Logs)
Query (LogQL):
{app="re-workflow"} | json | level="error"
- Visualization: Logs
- Title: "Recent Errors"
Panel 4: TAT Breaches (Stat)
Query (LogQL):
count_over_time({app="re-workflow"} | json | tatEvent="breached" [24h])
- Visualization: Stat
- Title: "TAT Breaches"
- Color: Red
Panel 5: Workflow Events (Pie)
Query (LogQL):
sum by (workflowEvent) (count_over_time({app="re-workflow"} | json | workflowEvent!="" [24h]))
- Visualization: Pie Chart
- Title: "Workflow Events"
Panel 6: Auth Failures (Table)
Query (LogQL):
{app="re-workflow"} | json | authEvent="auth_failure"
- Visualization: Table
- Title: "Authentication Failures"
Useful LogQL Queries
| Purpose | Query |
|---|---|
| All errors | {app="re-workflow"} | json | level="error" |
| Specific request | {app="re-workflow"} | json | requestId="REQ-2024-001" |
| User activity | {app="re-workflow"} | json | userId="user-123" |
| TAT breaches | {app="re-workflow"} | json | tatEvent="breached" |
| Auth failures | {app="re-workflow"} | json | authEvent="auth_failure" |
| Workflow created | {app="re-workflow"} | json | workflowEvent="created" |
| API errors (5xx) | {app="re-workflow"} | json | statusCode>=500 |
| Slow requests | {app="re-workflow"} | json | duration>3000 |
| Error rate | sum(rate({app="re-workflow"} | json | level="error"[5m])) |
| By department | {app="re-workflow"} | json | department="Engineering" |
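The same queries can be run outside Grafana through Loki's HTTP API, which is handy for scripted checks. The sketch below wraps the query_range endpoint with curl; loki_query is a hypothetical helper name, and the default limit of 20 is an arbitrary choice.

```shell
# Run a LogQL log query against Loki's HTTP API and print the raw JSON reply.
# loki_query is a hypothetical helper name. With no start/end parameters,
# query_range covers the last hour.
loki_query() {
  local base_url="$1" logql="$2" limit="${3:-20}"
  # -G turns the --data-urlencode pairs into a URL-encoded query string.
  curl -fsS -G "${base_url}/loki/api/v1/query_range" \
    --data-urlencode "query=${logql}" \
    --data-urlencode "limit=${limit}"
}
```

Usage: `loki_query http://localhost:3100 '{app="re-workflow"} | json | level="error"' 5`. Single quotes around the query keep the shell from interpreting the braces, pipes, and double quotes.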
Part 4: Alerting Setup
Alert 1: High Error Rate
- Go to: Alerting → Alert Rules → New Alert Rule
- Configure:
  - Name: RE Workflow - High Error Rate
  - Data source: RE-Workflow-Logs
  - Query: count_over_time({app="re-workflow"} | json | level="error" [5m])
  - Condition: IS ABOVE 10
- Add notification (Slack, Email)
Alert 2: TAT Breach
- Create new alert rule
- Configure:
  - Name: RE Workflow - TAT Breach
  - Query: count_over_time({app="re-workflow"} | json | tatEvent="breached" [15m])
  - Condition: IS ABOVE 0
- Add notification
Alert 3: Auth Attack Detection
- Create new alert rule
- Configure:
  - Name: RE Workflow - Auth Attack
  - Query: count_over_time({app="re-workflow"} | json | authEvent="auth_failure" [5m])
  - Condition: IS ABOVE 20
- Add notification to Security team
Part 5: Troubleshooting
Windows Issues
Docker Desktop not starting
# Restart Docker Desktop service (run PowerShell as Administrator)
Restart-Service com.docker.service
# Or restart Docker Desktop from system tray
Port 3100 already in use
# Find process using port
netstat -ano | findstr :3100
# Kill process
taskkill /PID <pid> /F
WSL2 issues
# Update WSL
wsl --update
# Restart WSL
wsl --shutdown
Linux Issues
Loki won't start
# Check logs
sudo docker logs loki
# Common fix - permissions
sudo chown -R 10001:10001 /opt/loki
Grafana can't connect to Loki
# Verify Loki is healthy
curl http://localhost:3100/ready
# Check network from Grafana server
curl http://loki-server:3100/ready
# Restart Loki
sudo docker-compose restart
Logs not appearing in Grafana
- Check application env has correct LOKI_HOST
- Verify network connectivity: curl http://loki:3100/ready
- Check labels: curl http://localhost:3100/loki/api/v1/labels
- Wait for application to send first logs
High memory usage
# Reduce retention period in loki-config.yaml
limits_config:
  retention_period: 7d # Reduce from 30d
Quick Reference
Environment Comparison
| Setting | Windows Dev | Linux Production |
|---|---|---|
| LOKI_HOST | http://localhost:3100 | http://<server-ip>:3100 |
| Grafana URL | http://localhost:3001 | http://monitoring.cloudtopiaa.com |
| Config Path | C:\loki\ | /opt/loki/ |
| Retention | 7 days | 30 days |
Port Reference
| Service | Port | URL |
|---|---|---|
| Loki | 3100 | http://server:3100 |
| Grafana (Dev) | 3001 | http://localhost:3001 |
| Grafana (Prod) | 80/443 | http://monitoring.cloudtopiaa.com/ |
| React Frontend | 3000 | http://localhost:3000 |
Verification Checklist
Windows Development
- Docker Desktop running
- docker ps shows loki and grafana containers
- http://localhost:3100/ready returns "ready"
- http://localhost:3001 shows Grafana login
- Loki data source connected in Grafana
- Backend .env has LOKI_HOST=http://localhost:3100
Linux Production
- Loki container running (docker ps)
- curl localhost:3100/ready returns "ready"
- Firewall port 3100 open
- Grafana connected to Loki
- Backend .env has correct LOKI_HOST
- Logs appearing in Grafana Explore
- Dashboard created
- Alerts configured
Contact
For issues with this setup:
- Backend logs: Check Grafana dashboard
- Infrastructure: Contact DevOps team