Hybrid Infrastructure Uptime Monitoring With Uptime Kuma, Traefik, WireGuard, and Docker

Building health checks and monitoring into your DevOps infrastructure is essential for keeping services reliable and available. By continuously monitoring the performance, status, and uptime of your services, you can quickly identify and resolve issues before they impact your end users, and avoid silent failures that can snowball later on.

In this short article, we will use Uptime Kuma, Traefik, WireGuard, and Docker to monitor a hybrid, multi-site, or multi-cloud infrastructure with minimal effort.


Assumptions & Goals

  • You have a hybrid infrastructure or a multi-cloud setup
  • You’d like to monitor services and/or host uptime to promptly catch failures as they happen
  • Monitor both LAN and WAN resources
  • You already have a WireGuard server set up on-premises
  • The monitoring setup should be secure and cost-effective

Approach

  1. Cloud VPS or independent host: monitor your on-premises infrastructure from the cloud or from another site. This external vantage point gives more accurate insight into the availability of your services from different locations, and provides a failsafe for detecting outages that may affect the wider network.
  2. WireGuard VPN: configure secure access over the WAN to your on-premises LAN so that resources sitting behind a firewall can be monitored. A WireGuard tunnel between the monitoring host and the monitored resources keeps the monitoring traffic encrypted and private.
  3. Traffic routing: WAN resources should still be monitored through their public IP addresses. This shows you the availability and reliability of your external-facing services as your users see them, and enables faster detection of issues affecting their experience.
  4. SSL reverse proxy: securely access the monitoring web UI over HTTPS.
  5. Access controls, ACLs, and VLAN traversal: these must be enforced by the on-premises firewall and are out of scope for this article. Apply least-privilege principles:
    • Restrict access to designated protocols (e.g. ICMP) and ports only.
    • Treat the monitoring host as a bastion and severely harden it.
    • Ideally, use a host that’s not public facing.
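Point 5 boils down to a default-deny allowlist. The Python sketch below only models that logic for illustration: the `Rule` type, the rule list, and the host `192.168.50.10:443` are hypothetical, and the actual enforcement belongs in your on-premises firewall rules, not in code on the VPS.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical allow rule: `proto` traffic from `src` to `dst`, optionally on one port."""

    src: str
    dst: str
    proto: str                  # "icmp" or "tcp"
    port: Optional[int] = None  # None matches any port (ICMP has no ports)


# Hypothetical allowlist for traffic from the VPN VLAN into the LAN.
RULES = [
    Rule("192.168.15.0/24", "192.168.50.0/24", "icmp"),       # ping anything on the LAN
    Rule("192.168.15.0/24", "192.168.50.10/32", "tcp", 443),  # one HTTPS service
]


def is_allowed(src: str, dst: str, proto: str, port: Optional[int] = None) -> bool:
    """Default deny: return True only if at least one rule explicitly matches."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    return any(
        s in ipaddress.ip_network(r.src)
        and d in ipaddress.ip_network(r.dst)
        and proto == r.proto
        and (r.port is None or r.port == port)
        for r in RULES
    )
```

Anything not listed, such as SSH to `192.168.50.10` or traffic from outside the VPN VLAN, is rejected by default.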

Components

Software of choice:

  • Uptime Kuma on Docker (VPS)
  • Traefik reverse-proxy on Docker (VPS)
  • WireGuard VPN on Docker (VPS)
  • WireGuard running on your on-premises firewall (I use OPNsense)

The secret sauce:

  • network_mode: service:wireguard to route uptime-kuma traffic through our WireGuard container
  • AllowedIPs configured for LAN access only: only traffic destined for private addresses goes through our WireGuard tunnel, which removes the need for complicated iptables rules
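A quick way to make sure AllowedIPs really covers LAN traffic only is to flag any entry that falls outside the RFC 1918 private ranges. This is a minimal sketch using Python's standard `ipaddress` module; the function name is hypothetical, and IPv6 entries are always reported because the check only knows the IPv4 private ranges.

```python
import ipaddress

# RFC 1918 private IPv4 ranges.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def non_private_allowed_ips(allowed_ips: str) -> list[str]:
    """Return every AllowedIPs entry that is NOT inside an RFC 1918 range.

    An empty result means only private (LAN) traffic will enter the tunnel.
    """
    offenders = []
    for entry in allowed_ips.split(","):
        net = ipaddress.ip_network(entry.strip(), strict=False)
        # IPv6 entries (e.g. ::/0) are reported too, since they are not RFC 1918 ranges.
        if net.version != 4 or not any(net.subnet_of(r) for r in RFC1918):
            offenders.append(str(net))
    return offenders


print(non_private_allowed_ips("192.168.50.0/24, 192.168.15.0/24"))  # []
print(non_private_allowed_ips("0.0.0.0/0"))                         # ['0.0.0.0/0']
```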

This setup is very lightweight: I’ve deployed it successfully to monitor 100+ resources on a simple free-tier Oracle Cloud VPS with 1 GB of RAM.

In Practice

Network Topology

For the sake of the example configurations below, we will assume the following two subnets:

  • LAN to monitor: 192.168.50.0/24
  • VPN VLAN: 192.168.15.0/24

Example diagram


WireGuard Configuration

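We assume the WireGuard site-to-site configuration below, saved under ./wg-config/wg_confs/wg0.conf. Recent linuxserver/wireguard images read client configurations from /config/wg_confs/; older images expected /config/wg0.conf directly (i.e. ./wg-config/wg0.conf).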

[Interface]
PrivateKey = 123ABC
Address = 192.168.15.2/32
DNS = 192.168.15.1

[Peer]
PublicKey = 456DEF
AllowedIPs = 192.168.50.0/24, 192.168.15.0/24
Endpoint = example.com:51820
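To catch typos before starting the stack, the configuration can be checked against the topology above. The sketch below parses it with Python's `configparser` and assumes a single `[Peer]` section (configparser rejects duplicate section names); `check_wg_conf` is a hypothetical helper, not part of WireGuard or the linuxserver image.

```python
import configparser
import ipaddress

WG0_CONF = """\
[Interface]
PrivateKey = 123ABC
Address = 192.168.15.2/32
DNS = 192.168.15.1

[Peer]
PublicKey = 456DEF
AllowedIPs = 192.168.50.0/24, 192.168.15.0/24
Endpoint = example.com:51820
"""


def check_wg_conf(text: str, lan: str, vpn_vlan: str) -> list[str]:
    """Return the problems found in a single-peer wg-quick config (empty list = OK)."""
    parser = configparser.ConfigParser(interpolation=None)  # '%i' in PostUp must not be interpolated
    parser.optionxform = str  # keep WireGuard's CamelCase keys instead of lowercasing them
    parser.read_string(text)  # raises DuplicateSectionError if there are several [Peer] sections
    problems = []

    # Address may list several interfaces, e.g. "192.168.15.2/32, fd00::2/128".
    addresses = [ipaddress.ip_interface(a.strip()) for a in parser["Interface"]["Address"].split(",")]
    if not any(a.ip in ipaddress.ip_network(vpn_vlan) for a in addresses):
        problems.append(f"Address is outside the VPN VLAN {vpn_vlan}")

    lan_net = ipaddress.ip_network(lan)
    allowed = [ipaddress.ip_network(n.strip(), strict=False)
               for n in parser["Peer"]["AllowedIPs"].split(",")]
    if not any(n.version == lan_net.version and lan_net.subnet_of(n) for n in allowed):
        problems.append(f"AllowedIPs does not cover the LAN {lan}")
    return problems


print(check_wg_conf(WG0_CONF, "192.168.50.0/24", "192.168.15.0/24"))  # []
```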

The AllowedIPs setting defines which destination subnets are routed through WireGuard.
You can provide a fine-grained comma-separated list, as in the example above, or a wider subnet such as 192.168.0.0/16. Using 0.0.0.0/0 would instead route all of uptime-kuma's traffic through WireGuard, so public services would be checked from your on-premises connection rather than from the VPS.
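The routing behaviour described above can be modelled in a few lines. This is a simplified sketch: it treats AllowedIPs as plain routes into `wg0` and ignores details such as the policy routing wg-quick sets up when 0.0.0.0/0 is used; the function name is hypothetical.

```python
import ipaddress


def egress_for(destination: str, allowed_ips: list[str]) -> str:
    """Destinations inside any AllowedIPs subnet go through wg0; all others use the default route."""
    dst = ipaddress.ip_address(destination)
    for cidr in allowed_ips:
        net = ipaddress.ip_network(cidr)
        if dst.version == net.version and dst in net:
            return "wg0"
    return "default route (WAN)"


split = ["192.168.50.0/24", "192.168.15.0/24"]
print(egress_for("192.168.50.1", split))           # wg0
print(egress_for("104.18.114.97", split))          # default route (WAN)
print(egress_for("104.18.114.97", ["0.0.0.0/0"]))  # wg0
```

The last line is the reason to avoid 0.0.0.0/0 here: public checks would no longer leave from the VPS.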

Docker Compose

As mentioned before, the key here is to set uptime-kuma's network_mode to service:wireguard, so that its traffic is routed through the WireGuard container, which then decides whether it should leave through the tunnel or through the WAN interface.
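Because uptime-kuma shares the WireGuard container's network namespace, it cannot carry its own port mappings or network attachments; those belong on the `wireguard` service (Traefik reaches uptime-kuma on port 3001 through that shared namespace). The sketch below checks a Compose-like dictionary for these mistakes. It is written by hand to stay standard-library only (with PyYAML you would load the real file via `yaml.safe_load`), and `network_mode_problems` is a hypothetical helper.

```python
def network_mode_problems(services: dict[str, dict]) -> list[str]:
    """Flag services using network_mode: service:<name> that are misconfigured."""
    problems = []
    for name, svc in services.items():
        mode = svc.get("network_mode", "")
        if not mode.startswith("service:"):
            continue
        target = mode.split(":", 1)[1]
        if target not in services:
            problems.append(f"{name}: network_mode points at unknown service '{target}'")
        # A container sharing another's namespace cannot publish ports or join networks itself.
        for key in ("ports", "networks"):
            if key in svc:
                problems.append(f"{name}: '{key}' conflicts with network_mode; declare it on '{target}'")
    return problems


services = {
    "wireguard": {"image": "lscr.io/linuxserver/wireguard:latest"},
    "uptime-kuma": {"image": "louislam/uptime-kuma:latest", "network_mode": "service:wireguard"},
}
print(network_mode_problems(services))  # []
```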

Our Docker Compose file includes the following services:

  • Uptime Kuma
  • Traefik (reverse proxy with SSL)
  • WireGuard

This is also an opinionated config: each container's CPU and memory are capped with deploy.resources.limits.

version: "3.8"
services:
  wireguard:
    image: lscr.io/linuxserver/wireguard:latest
    container_name: wireguard
    cap_add:
      - NET_ADMIN
      - SYS_MODULE #optional
    environment:
      - PUID=1001
      - PGID=1001
      - TZ=Asia/Singapore
    volumes:
      - ./wg-config:/config
      - /lib/modules:/lib/modules #optional
    sysctls:
      - net.ipv4.conf.all.src_valid_mark=1
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 256m
  traefik:
    image: "traefik:v2.10"
    container_name: "traefik"
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.myresolver.acme.tlschallenge=true"
      - "--certificatesresolvers.myresolver.acme.email=you@example.com"
      - "--certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json"
    ports:
      - "443:443"
      - "8080:8080"
    volumes:
      - "./letsencrypt:/letsencrypt"
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 256m
  uptime-kuma:
    image: louislam/uptime-kuma:latest
    container_name: uptime-kuma
    volumes:
      - ./uptime-kuma/data:/app/data
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 256m
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.uptime-kuma.rule=Host(`monitoring.example.com`)"
      - "traefik.http.routers.uptime-kuma.entrypoints=websecure"
      - "traefik.http.routers.uptime-kuma.tls.certresolver=myresolver"
      - "traefik.http.services.uptime-kuma.loadbalancer.server.port=3001"
    network_mode: service:wireguard
    depends_on:
      wireguard:
        condition: service_started
      traefik:
        condition: service_started

Check Routing & Troubleshooting

We can check whether the LAN is reachable from the uptime-kuma container by pinging the on-premises firewall's LAN address (192.168.50.1 in this example). If you can reach the gateway but encounter issues reaching other services, something is off with your firewall rules.

$ docker exec -it uptime-kuma ping -c 4 192.168.50.1
PING 192.168.50.1 (192.168.50.1) 56(84) bytes of data.
64 bytes from 192.168.50.1: icmp_seq=1 ttl=64 time=3.63 ms
64 bytes from 192.168.50.1: icmp_seq=2 ttl=64 time=2.50 ms
64 bytes from 192.168.50.1: icmp_seq=3 ttl=64 time=2.49 ms
64 bytes from 192.168.50.1: icmp_seq=4 ttl=64 time=2.49 ms

--- 192.168.50.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 8ms
rtt min/avg/max/mdev = 2.491/2.779/3.632/0.496 ms
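If you want to script this check (for example in a cron job on the VPS), the loss figure can be read from ping's summary line. A minimal Python sketch, assuming the docker CLI is available on the host; both helper names are hypothetical.

```python
import re
import subprocess

# Summary line of iputils ping ("4 packets transmitted, 4 received, ...");
# BusyBox ping prints "4 packets received", hence the optional word.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def packet_loss(ping_output: str) -> float:
    """Return packet loss as a fraction between 0.0 and 1.0, parsed from ping's summary line."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = (int(g) for g in match.groups())
    if sent == 0:
        raise ValueError("ping sent no packets")
    return 1 - received / sent


def ping_from_container(target: str, container: str = "uptime-kuma") -> float:
    """Ping `target` four times from inside `container` and return the packet loss."""
    result = subprocess.run(
        ["docker", "exec", container, "ping", "-c", "4", target],
        capture_output=True,
        text=True,
    )  # no check=True: ping exits non-zero when no reply arrives, but still prints the summary
    return packet_loss(result.stdout)
```

For example, `ping_from_container("192.168.50.1")` should return `0.0` for the output shown above.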

Public websites are equally reachable from the uptime-kuma container:

$ docker exec -it uptime-kuma ping -c 4 icanhazip.com
PING icanhazip.com (104.18.114.97) 56(84) bytes of data.
64 bytes from 104.18.114.97 (104.18.114.97): icmp_seq=1 ttl=60 time=1.15 ms
64 bytes from 104.18.114.97 (104.18.114.97): icmp_seq=2 ttl=60 time=1.26 ms
64 bytes from 104.18.114.97 (104.18.114.97): icmp_seq=3 ttl=60 time=1.28 ms
64 bytes from 104.18.114.97 (104.18.114.97): icmp_seq=4 ttl=60 time=1.23 ms

We can also confirm that public-facing services are monitored through the VPS's own WAN interface. The IP address that comes back should be the VPS's public IP, NOT your on-premises gateway's public IP.

$ docker exec -it uptime-kuma curl ipv4.icanhazip.com
158.178.1.1
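The same comparison can be automated so that a routing change (such as widening AllowedIPs to 0.0.0.0/0) gets noticed. In this sketch the helper names are hypothetical, `158.178.1.1` is the example VPS address from above, and `203.0.113.10` is a placeholder for your on-premises public IP.

```python
import ipaddress
import subprocess


def container_egress_ip(container: str = "uptime-kuma") -> str:
    """Run the same icanhazip check as above inside `container` and return the reported IP."""
    result = subprocess.run(
        ["docker", "exec", container, "curl", "-s", "ipv4.icanhazip.com"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def classify_egress(observed: str, vps_public_ip: str, onprem_public_ip: str) -> str:
    """Compare the observed egress IP with the two public IPs it could plausibly be."""
    ip = ipaddress.ip_address(observed)
    if ip == ipaddress.ip_address(vps_public_ip):
        return "ok: WAN checks leave through the VPS"
    if ip == ipaddress.ip_address(onprem_public_ip):
        return "misrouted: WAN checks go through the tunnel"
    return "unexpected egress IP"
```

Usage on the VPS: `print(classify_egress(container_egress_ip(), "158.178.1.1", "203.0.113.10"))`.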

Security Improvements

Mitigate possible vulnerabilities in uptime-kuma or other programs running on the host:

  • Harden the VPS host
  • Restrict VLAN traversal from the VPN VLAN to ICMP only
  • Restrict uptime-kuma web UI access to your LAN or WireGuard logged-in clients
  • Sandbox containers with gVisor or Kata Containers
  • Pass the wg0.conf WireGuard config or private keys as Docker Swarm secrets to avoid leaving credentials in plain text on the VPS host
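Until those secrets are in place, a small audit can at least flag a key stored in plain text or a config readable by other users. The sketch below is hypothetical and standard-library only. As one possible fix, wg-quick's `PostUp` hook can load the key from a file, e.g. `PostUp = wg set %i private-key /run/secrets/wg_private_key` (Docker mounts secrets under `/run/secrets/`; the secret name is an assumption, and this has not been verified against the linuxserver image).

```python
import re
import stat
from pathlib import Path


def key_exposure_findings(conf_text: str, mode: int) -> list[str]:
    """Report an inline PrivateKey and any group/other permission bits on the config file."""
    findings = []
    if re.search(r"^\s*PrivateKey\s*=", conf_text, re.MULTILINE):
        findings.append("PrivateKey is stored inline in the config")
    if mode & 0o077:
        findings.append(f"config is accessible to group/others (mode {mode:o})")
    return findings


def audit(conf_path: Path) -> list[str]:
    """Run the checks against a real file, e.g. Path("./wg-config/wg_confs/wg0.conf")."""
    return key_exposure_findings(conf_path.read_text(), stat.S_IMODE(conf_path.stat().st_mode))
```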

These improvements will be the topic of a future article.

Enjoy!