Hybrid Infrastructure Uptime Monitoring With Uptime Kuma, Traefik, WireGuard, and Docker
Building health checks and monitoring into your DevOps infrastructure is essential for ensuring the reliability and availability of services. By continuously monitoring the performance, status, and uptime of services, you can quickly identify and resolve issues before they impact your end users, and avoid silent failures that can snowball later on.
In this short article, we will leverage Uptime Kuma, Traefik, WireGuard, and Docker to monitor a hybrid, multi-site, or multi-cloud infrastructure.
Assumptions & Goals
- You have a hybrid infrastructure or a multi-cloud setup
- You’d like to monitor services and/or host uptime to promptly catch failures as they happen
- You want to monitor both LAN and WAN resources
- You already have a WireGuard server set up on-premises
- The monitoring setup should be secure and cost-effective
Approach
- Cloud VPS or independent host: monitor your on-premises infrastructure from the cloud or from another site. This external monitoring point gives more accurate insight into the availability of your services from different locations and provides a failsafe for detecting outages that may affect the wider network.
- WireGuard VPN: configure secure access over WAN to your on-premises LAN to monitor resources sitting behind a firewall. A secure VPN like WireGuard establishes a protected connection between your monitoring system and the resources you want to monitor, so that your monitoring traffic remains encrypted and private.
- Traffic routing: WAN resources should still be monitored from publicly accessible WAN IP addresses. This will help you understand the availability and reliability of your external-facing services from various geographic locations and enable faster detection of any issues affecting your users’ experience.
- SSL reverse-proxy to securely access the monitoring webUI.
- Access controls, ACLs, and VLAN traversal: must be enforced by the on-premises firewall. This is out of the scope of this article. Apply least-privilege principles.
- Restrict access to designated services (ICMP) and ports only.
- Treat the monitoring host as a bastion and severely harden it.
- Ideally, use a host that’s not public facing.
Components
Software of choice:
- Uptime Kuma on Docker (VPS)
- Traefik reverse-proxy on Docker (VPS)
- WireGuard VPN on Docker (VPS)
- WireGuard running on your on-premises firewall (I use OPNsense)
The secret sauce:
- network_mode: service:wireguard to route uptime-kuma traffic through our WireGuard container
- AllowedIPs configured for LAN access only; this ensures that only traffic destined to private addresses goes through our WireGuard tunnel, and removes the need for complicated iptables rules
This setup is very lightweight: I’ve deployed it successfully to monitor 100+ resources using a simple free-tier 1 GB Oracle VPS.
In Practice
Network Topology
For the sake of the example configurations below, we will assume the following two subnets:
- LAN to monitor: 192.168.50.0/24
- VPN VLAN: 192.168.15.0/24
Example diagram
WireGuard Configuration
We assume the following WireGuard site-to-site configuration, stored under ./wg-config/wg0.conf:
[Interface]
PrivateKey = 123ABC
Address = 192.168.15.2/32
DNS = 192.168.15.1
[Peer]
PublicKey = 456DEF
AllowedIPs = 192.168.50.0/24, 192.168.15.0/24
Endpoint = example.com:51820
The AllowedIPs configuration defines the subnets that will route through WireGuard.
You could provide a fine-grained comma-separated list as in the example above, or a wider subnet such as 192.168.0.0/16. Using 0.0.0.0/0 will route all of your uptime-kuma traffic through WireGuard instead.
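To make the routing decision concrete, here is a minimal Python sketch of how AllowedIPs acts as a destination filter, using the example subnets from this article. It only models the decision: the helper name goes_through_tunnel is hypothetical, and the real routing is done by WireGuard and the routing table, not by this code.

```python
import ipaddress

# AllowedIPs from the example wg0.conf in this article.
ALLOWED_IPS = ["192.168.50.0/24", "192.168.15.0/24"]


def goes_through_tunnel(destination, allowed_ips=ALLOWED_IPS):
    """Return True if `destination` falls inside one of the AllowedIPs subnets.

    Hypothetical helper for illustration only: it mimics the destination
    match and does not talk to WireGuard.
    """
    address = ipaddress.ip_address(destination)
    return any(address in ipaddress.ip_network(net) for net in allowed_ips)


print(goes_through_tunnel("192.168.50.1"))                  # LAN gateway
print(goes_through_tunnel("104.18.114.97"))                 # public address
print(goes_through_tunnel("104.18.114.97", ["0.0.0.0/0"]))  # catch-all route
```

With the example list, the LAN gateway matches and the public address does not, so the latter leaves through the WAN interface. With 0.0.0.0/0, every IPv4 destination matches and all traffic would enter the tunnel.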
Docker Compose
As mentioned before, the key here is to define the uptime-kuma networking property as network_mode: service:wireguard so that its traffic is routed through that container, which then decides whether it should go out through the tunnel or the WAN interface.
Our Docker Compose file includes the following services:
- Uptime Kuma
- Traefik (reverse proxy with SSL)
- WireGuard
This is also an opinionated config where we limit resources available to our containers.
version: "3.8"
services:
  wireguard:
    image: lscr.io/linuxserver/wireguard:latest
    container_name: wireguard
    cap_add:
      - NET_ADMIN
      - SYS_MODULE #optional
    environment:
      - PUID=1001
      - PGID=1001
      - TZ=Asia/Singapore
    volumes:
      - ./wg-config:/config
      - /lib/modules:/lib/modules #optional
    sysctls:
      - net.ipv4.conf.all.src_valid_mark=1
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 256m
  traefik:
    image: "traefik:v2.10"
    container_name: "traefik"
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.myresolver.acme.tlschallenge=true"
      # The address was redacted in the source; set your own ACME contact email here.
      - "--certificatesresolvers.myresolver.acme.email=[email protected]"
      - "--certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json"
    ports:
      - "443:443"
      - "8080:8080"
    volumes:
      - "./letsencrypt:/letsencrypt"
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 256m
  uptime-kuma:
    image: louislam/uptime-kuma:latest
    container_name: uptime-kuma
    volumes:
      - ./uptime-kuma/data:/app/data
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 256m
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.uptime-kuma.rule=Host(`monitoring.example.com`)"
      - "traefik.http.routers.uptime-kuma.entrypoints=websecure"
      - "traefik.http.routers.uptime-kuma.tls.certresolver=myresolver"
      - "traefik.http.services.uptime-kuma.loadbalancer.server.port=3001"
    # Share the wireguard container's network stack so all traffic goes through it.
    network_mode: service:wireguard
    depends_on:
      wireguard:
        condition: service_started
      traefik:
        condition: service_started
Check Routing & Troubleshooting
We can check if the LAN is reachable from the uptime-kuma container by pinging our WireGuard gateway. If you can reach the gateway but encounter issues reaching other services, something’s up with your firewall rules.
> $ docker exec -it uptime-kuma ping -c 4 192.168.50.1
PING 192.168.50.1 (192.168.50.1) 56(84) bytes of data.
64 bytes from 192.168.50.1: icmp_seq=1 ttl=64 time=3.63 ms
64 bytes from 192.168.50.1: icmp_seq=2 ttl=64 time=2.50 ms
64 bytes from 192.168.50.1: icmp_seq=3 ttl=64 time=2.49 ms
64 bytes from 192.168.50.1: icmp_seq=4 ttl=64 time=2.49 ms
--- 192.168.50.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 8ms
rtt min/avg/max/mdev = 2.491/2.779/3.632/0.496 ms
Public websites are equally reachable from the uptime-kuma container:
> $ docker exec -it uptime-kuma ping -c 4 icanhazip.com
PING icanhazip.com (104.18.114.97) 56(84) bytes of data.
64 bytes from 104.18.114.97 (104.18.114.97): icmp_seq=1 ttl=60 time=1.15 ms
64 bytes from 104.18.114.97 (104.18.114.97): icmp_seq=2 ttl=60 time=1.26 ms
64 bytes from 104.18.114.97 (104.18.114.97): icmp_seq=3 ttl=60 time=1.28 ms
64 bytes from 104.18.114.97 (104.18.114.97): icmp_seq=4 ttl=60 time=1.23 ms
We can also confirm that public-facing services are monitored through the WAN interface. The IP address that comes back should belong to your WAN interface, and NOT to your on-premises gateway.
> $ docker exec -it uptime-kuma curl ipv4.icanhazip.com
158.178.1.1
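The same check can be scripted. Below is a minimal Python sketch, assuming you know the public IP of your on-premises gateway. ONPREM_WAN_IP (a documentation-range address) and the function name egress_path are placeholders for illustration, not values from this setup.

```python
import ipaddress

# Assumption: public IP of the on-premises gateway. This is a placeholder
# from the 203.0.113.0/24 documentation range; replace it with your own.
ONPREM_WAN_IP = "203.0.113.10"


def egress_path(reported_ip, onprem_wan_ip=ONPREM_WAN_IP):
    """Classify the address returned by a "what is my IP" service.

    Returns "tunnel" if it equals the on-premises gateway's public IP
    (public traffic is leaving through WireGuard), otherwise "wan".
    """
    reported = ipaddress.ip_address(reported_ip.strip())
    onprem = ipaddress.ip_address(onprem_wan_ip)
    return "tunnel" if reported == onprem else "wan"


# 158.178.1.1 is the address returned by the curl command in this article.
print(egress_path("158.178.1.1\n"))
```

A result of "tunnel" would suggest that AllowedIPs is wider than intended, for example 0.0.0.0/0.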
Security Improvements
Mitigate possible vulnerabilities in uptime-kuma or other programs running on the host:
- Harden the VPS host
- Restrict VLAN traversal from the VPN VLAN to ICMP only
- Restrict uptime-kuma web UI access to your LAN or WireGuard logged-in clients
- Sandbox containers with gVisor or Kata Containers
- Pass the wg0.conf WireGuard config or private keys as a Swarm Secret to avoid leaving credentials in plain text on the VPS host
These improvements will be the topic of a future article.
Enjoy!