Prometheus and Grafana Monitoring Setup
This guide will walk you through setting up a comprehensive monitoring solution for your home server using Prometheus, Grafana, and Alert Manager. This stack will allow you to:
- Collect metrics from your server and services
- Visualize performance data with customizable dashboards
- Set up alerts for critical events
- Monitor system health and resource usage
Directory Structure
First, ensure you have the proper directory structure:
mkdir -p ~/docker/monitoring/prometheus/config
mkdir -p ~/docker/monitoring/grafana
mkdir -p ~/docker/monitoring/alertmanager/config
mkdir -p ~/docker/monitoring/node-exporter
Prometheus Setup
Prometheus is an open-source systems monitoring and alerting toolkit. It collects and stores metrics as time series data.
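Every service exposes its metrics over HTTP in a simple text format, which Prometheus scrapes on a schedule. For example, Node Exporter (set up below) publishes series like this (the value shown is illustrative):

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 123456.78

Each unique combination of metric name and labels (here cpu and mode) is stored as its own time series.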
Prometheus Configuration
Create the Prometheus configuration file:
nano ~/docker/monitoring/prometheus/config/prometheus.yml
Add the following content:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - "/etc/prometheus/rules/*.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: "cadvisor"
    static_configs:
      - targets: ["cadvisor:8080"]

  - job_name: "nginx-proxy-manager"
    static_configs:
      - targets: ["nginx-proxy-manager:81"]
    metrics_path: /metrics

  - job_name: "loki"
    static_configs:
      - targets: ["loki:3100"]
Create a directory for alert rules:
mkdir -p ~/docker/monitoring/prometheus/config/rules
Create a basic alert rules file:
nano ~/docker/monitoring/prometheus/config/rules/alerts.yml
Add the following content:
groups:
  - name: basic_alerts
    rules:
      - alert: HighCPULoad
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU load (instance {{ $labels.instance }})"
          description: "CPU load is > 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      - alert: HighMemoryLoad
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory load (instance {{ $labels.instance }})"
          description: "Memory load is > 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      - alert: HighDiskUsage
        expr: (node_filesystem_size_bytes{fstype!="tmpfs"} - node_filesystem_free_bytes{fstype!="tmpfs"}) / node_filesystem_size_bytes{fstype!="tmpfs"} * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High disk usage (instance {{ $labels.instance }})"
          description: "Disk usage is > 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
Alert Manager Setup
Alert Manager handles alerts sent by Prometheus and routes them to the appropriate receiver.
Create the Alert Manager configuration file:
nano ~/docker/monitoring/alertmanager/config/alertmanager.yml
Add the following content (customize with your email or Slack details):
global:
  resolve_timeout: 5m
  # For email alerts
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'your-email@gmail.com'
  smtp_auth_username: 'your-email@gmail.com'
  smtp_auth_password: 'your-app-password'  # Use an app password for Gmail
  smtp_require_tls: true

# Route alerts: email by default, plus Slack for critical severity
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email-notifications'
  routes:
    - match:
        severity: critical
      receiver: 'slack-notifications'
      continue: true

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'your-email@gmail.com'
        send_resolved: true
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#alerts'
        send_resolved: true
        title: "{{ .GroupLabels.alertname }}"
        text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname']
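The Alertmanager image ships with amtool, which can validate this file and print the parsed routing tree before you deploy it:

docker run --rm --entrypoint amtool \
  -v ~/docker/monitoring/alertmanager/config:/etc/alertmanager:ro \
  prom/alertmanager check-config /etc/alertmanager/alertmanager.yml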
Node Exporter Setup
Node Exporter collects hardware and OS metrics from the host system. It needs no configuration file of its own: its options are passed as command-line flags in the Docker Compose file below, which defines the entire monitoring stack.
Docker Compose Configuration
Create a docker-compose.yml file for the monitoring stack:
nano ~/docker/monitoring/docker-compose.yml
Add the following content:
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus/config:/etc/prometheus
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    ports:
      - "9090:9090"
    networks:
      - proxy
    profiles:
      - monitoring

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: unless-stopped
    volumes:
      - ./alertmanager/config:/etc/alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    ports:
      - "9093:9093"
    networks:
      - proxy
    profiles:
      - monitoring

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - "9100:9100"
    networks:
      - proxy
    profiles:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - "8080:8080"
    networks:
      - proxy
    profiles:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=secure_password  # Change this!
      - GF_USERS_ALLOW_SIGN_UP=false
    ports:
      - "3000:3000"
    networks:
      - proxy
    profiles:
      - monitoring

networks:
  proxy:
    external: true

volumes:
  prometheus_data:
  grafana_data:
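The proxy network is marked external because it's shared with Nginx Proxy Manager from earlier in this series. If it doesn't exist on your host yet, create it once before bringing the stack up:

docker network create proxy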
Starting the Monitoring Stack
Start the monitoring stack with Docker Compose:
cd ~/docker/monitoring
docker-compose --profile monitoring up -d
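Once the containers are up, it's worth confirming that Prometheus can reach its scrape targets. A few quick checks from the host (the jq filter is optional; the raw JSON is readable without it):

# All five containers should show as "Up"
docker-compose ps

# Node Exporter should return plain-text metrics
curl -s http://localhost:9100/metrics | head

# Ask Prometheus which targets it sees and whether they are healthy
curl -s http://localhost:9090/api/v1/targets | \
  jq '.data.activeTargets[] | {job: .labels.job, health: .health}'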
Configuring Nginx Proxy Manager for Monitoring Services
Now, let's set up proxy hosts in Nginx Proxy Manager for each monitoring service:
Prometheus
- Go to "Hosts" > "Proxy Hosts" and click "Add Proxy Host"
- Configure the following:
- Domain Name: prometheus.yourdomain.com
- Scheme: http
- Forward Hostname / IP: prometheus
- Forward Port: 9090
- Under the "SSL" tab:
- Select your SSL certificate
- Force SSL: Enabled
- Under the "Advanced" tab, add the Authelia configuration (from the previous guide)
- Click "Save"
Grafana
- Go to "Hosts" > "Proxy Hosts" and click "Add Proxy Host"
- Configure the following:
- Domain Name: grafana.yourdomain.com
- Scheme: http
- Forward Hostname / IP: grafana
- Forward Port: 3000
- Under the "SSL" tab:
- Select your SSL certificate
- Force SSL: Enabled
- Under the "Advanced" tab, add the Authelia configuration
- Click "Save"
Alert Manager
- Go to "Hosts" > "Proxy Hosts" and click "Add Proxy Host"
- Configure the following:
- Domain Name: alerts.yourdomain.com
- Scheme: http
- Forward Hostname / IP: alertmanager
- Forward Port: 9093
- Under the "SSL" tab:
- Select your SSL certificate
- Force SSL: Enabled
- Under the "Advanced" tab, add the Authelia configuration
- Click "Save"
Configuring Grafana
- Access Grafana at https://grafana.yourdomain.com
- Log in with the credentials you set in docker-compose.yml:
- Username: admin
- Password: secure_password
- Because the password is set through GF_SECURITY_ADMIN_PASSWORD rather than left at Grafana's default, you won't be prompted to change it at first login, so choose a strong value in the Compose file
Adding Prometheus as a Data Source
- Go to "Configuration" > "Data Sources"
- Click "Add data source"
- Select "Prometheus"
- Set the URL to http://prometheus:9090
- Click "Save & Test"
Importing Dashboards
Let's import some useful dashboards:
- Go to "Create" > "Import"
- Enter one of these dashboard IDs:
- 1860 (Node Exporter Full)
- 893 (Docker and System Monitoring)
- 10619 (Docker Monitoring)
- Click "Load"
- Select "Prometheus" as the data source
- Click "Import"
Repeat for each dashboard ID.
Creating Custom Alerts
You can create custom alerts in Grafana:
- Go to "Alerting" in the left sidebar
- Click "New alert rule"
- Configure your alert conditions (a sample PromQL expression is sketched just after this list)
- Set notification channels (email, Slack, etc.)
- Save the alert
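For the alert condition, any PromQL expression that Prometheus can evaluate will work. A minimal example that fires whenever a scrape target stops responding:

# Matches every target Prometheus can no longer scrape
up == 0

Pair it with a short pending period (for example 2m) so brief container restarts don't trigger notifications.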
Testing Alerts
To test if your alerts are working:
- For CPU alerts: Run a stress test
sudo apt install stress
stress --cpu 8 --timeout 300
- For disk space alerts: Create a large file (if /tmp is mounted as tmpfs on your distro, pick a path on a real disk instead, since the alert excludes tmpfs)
fallocate -l 10G /tmp/large_file
- Check Alert Manager at https://alerts.yourdomain.com to see if alerts are triggered
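You can also exercise the notification path without generating real load by posting a synthetic alert directly to Alertmanager's v2 API (the alertname and annotation here are arbitrary test values):

curl -X POST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels": {"alertname": "TestAlert", "severity": "warning"},
        "annotations": {"summary": "Manual test alert"}}]'

If routing is configured correctly, the alert should arrive at your email receiver within the group_wait window.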
Troubleshooting
Prometheus Issues
- Check logs:
docker logs prometheus
- Verify prometheus.yml syntax
- Check if targets are up in the Prometheus UI (Status > Targets)
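Since the Compose file enables --web.enable-lifecycle, you can reload Prometheus after fixing a configuration error without restarting the container:

curl -X POST http://localhost:9090/-/reload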
Grafana Issues
- Check logs:
docker logs grafana
- Verify data source connection
- Check permissions on grafana_data volume
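To see where that volume lives on disk, inspect it by name. Note that Compose prefixes volume names with the project name, which defaults to the directory name (assumed to be monitoring here):

docker volume inspect monitoring_grafana_data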
Alert Manager Issues
- Check logs:
docker logs alertmanager
- Verify alertmanager.yml syntax
- Test email or Slack notifications manually
Next Steps
Now that you have your monitoring stack set up, you can proceed to the next section to configure Loki and Promtail for centralized logging.