Linux LXD Rsyncable Instance Backup

I am a huge fan of LXD and rely on it to run 99% of my self-hosted infrastructure (including this blog).

Once you are familiar with a few commands, it is the leanest, most straightforward virtualization solution I have used. I have run most of the popular alternatives over the years, and why I ended up sticking with LXD will certainly deserve an article of its own in the future.

Like all things - self-hosted or not - frequent backups are a must.
If you run LXD with the ZFS backend, you can already take automated and space-efficient snapshots. But what about cloud backups?

A few weeks back, I decided to set up a cron job that would copy all my instances to an S3-compatible object storage service. Something reliable and cost-efficient along the lines of Scaleway C14 Cold Storage, Google Cloud or even Amazon Glacier.

===

The right tools for the job

Exporting your instances to an archive format

The official documentation has a well-written section that details the range of options at your disposal to back up instances: https://linuxcontainers.org/lxd/docs/master/backup

In our use case, the lxc export command is especially interesting: it packs all the components of an instance (data, config files, etc.) into a single archive file, and importing them back into a new instance is as simple as lxc import.

This command will generate one archive per instance. You can then copy the archives to your cloud storage bucket.
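As a quick illustration (the instance name and paths below are just placeholders), exporting and re-importing looks like this:

# Export an instance to a single archive
lxc export my-instance /path/to/backup/my-instance-backup.tar.gz

# Later, re-create the instance from that archive on any LXD server
lxc import /path/to/backup/my-instance-backup.tar.gz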

We will also use these flags:

  • --instance-only: as I run the backup on a weekly basis, I do not feel the need to include snapshots
  • --compression: lets you choose the compression algorithm
  • -f to overwrite the archives from our previous backup
  • We will NOT use --optimized-storage to ensure that our backups could be easily restored to a server using a different type of storage backend

Optional: using pigz instead of gzip for multi-threaded compression

pigz is a parallel implementation of gzip for modern multi-processor, multi-core machines. In other words, where gzip would compress your instance relying on a single core, pigz will use as many as you have available (unless asked otherwise) and significantly speed up the whole process.
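If you want to try pigz on its own first, a quick test could look like this (package name as on Debian/Ubuntu, file name is a placeholder):

apt install pigz
# Compress a copy of a file using all available cores...
pigz -k some-large-file.tar
# ...or cap the number of threads with -p
pigz -k -p 2 some-large-file.tar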

Within lxc export, using pigz is simply a matter of passing it to the --compression flag.

Here's a comparison I ran on a server with an i3-4160 CPU (a dual-core CPU, so with less to gain from multi-threaded compression).

With gzip

> # time lxc export test-instance "/path/to/backup/test-backup.gz" --instance-only --compression 'gzip --rsyncable'
Backup exported successfully!
lxc export test-instance "/path/to/backup/test-backup.gz"     0.42s user 20.43s system 6% cpu 5:09.45 total

Now, the same instance with pigz

> # time lxc export test-instance "/path/to/backup/test-backup.gz" --instance-only --compression 'pigz --rsyncable'
Backup exported successfully!
lxc export test-instance "/path/to/backup/test-backup.gz"     0.41s user 22.75s system 10% cpu 3:50.82 total

pigz ends up being about 25% faster in this case.

--rsyncable compression flag

When I originally started this, you couldn't pass additional arguments to pigz (or any other compression software you'd want to use), and this is one example where the team actively developing LXD really shines! Barely a week later, a commit was merged and, as of LXD 4.3, you can pass any arguments you like to the compression algorithm.

In this case, we are going to use the --rsyncable flag, which allows Duplicity (which relies on the rsync algorithm) to do efficient delta transfers (transferring only the blocks that have changed) of your instance archives.
Without this flag, Duplicity would re-upload the entire archive every single time the backup runs, wasting a huge amount of bandwidth and storage space.
More details on why this is needed here: https://beeznest.wordpress.com/2005/02/03/rsyncable-gzip/
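If you want to convince yourself that --rsyncable actually helps, one rough way (paths and the batch-file trick are just a sketch) is to export the same instance twice and ask rsync how much it would really transfer between the two archives:

lxc export test-instance /tmp/run1.gz --instance-only --compression 'pigz --rsyncable'
# ...make a small change inside the instance, then export again
lxc export test-instance /tmp/run2.gz --instance-only --compression 'pigz --rsyncable'
# --no-whole-file forces the delta algorithm locally; the batch file
# approximates what rsync would send to turn run1.gz into run2.gz
rsync --no-whole-file --only-write-batch=/tmp/delta /tmp/run2.gz /tmp/run1.gz
ls -lh /tmp/delta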

Duplicity

In this case, we choose to go with Duplicity to back up the archives to the cloud storage bucket, for a number of reasons:

  • built-in encryption
  • space efficient versioning through the rsync algorithm
  • very large number of storage backends supported
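Since a backup is only as good as its restore, here is roughly what pulling everything back would look like (the bucket name, paths and instance name are placeholders):

# Fetch the latest backup set from the bucket (same passphrase as for the backup)
export PASSPHRASE="random-passphrase-you-shouldnt-lose"
duplicity restore gs://bucket /tmp/restored-backups

# Re-create an instance from its archive and start it
lxc import /tmp/restored-backups/test-instance-backup.gz
lxc start test-instance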

Putting it all together

For this to work, we need two separate sets of commands: one that exports all your instances to a temporary local directory, and one that runs Duplicity over that directory and uploads your archives to the cloud.

I have borrowed part of a script from Cyberciti - credit where credit is due.

The second aspect (Duplicity) is also well documented in various places.

My initial intention was to support French tech, and I spent days trying to make Scaleway work. Unfortunately, on >100GB datasets it is a nightmare, whereas Gcloud is flawless and gives me 800Mbps+ speeds to their Singapore datacenter! You read that right: on local fiber, the server actually can't keep up.

In this example, we are therefore going with Gcloud.

The FROM_DIR, TO_DIR and PASSPHRASE variables will need to be adjusted.
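One assumption worth stating: Duplicity also needs credentials for the gs:// backend. With the older boto-based backend this is typically done with HMAC "interoperability" keys generated in the Google Cloud console and exported before the script runs (the values below are obviously placeholders); newer backends may want a service-account JSON instead, so check your Duplicity manpage.

export GS_ACCESS_KEY_ID="your-interoperability-access-key"
export GS_SECRET_ACCESS_KEY="your-interoperability-secret-key"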

#!/bin/bash

# Script variables
SCRIPT_NAME=$(basename -- "$0")
LOCK_FILE="/tmp/$SCRIPT_NAME.lockfile"
FROM_DIR=/Backups-LXD
TO_DIR=gs://bucket
NOW=$(date +'%m-%d-%Y')

# The password you'll use to encrypt backups
export PASSPHRASE="random-passphrase-you-shouldnt-lose"

## Dump LXD server config ##
lxd init --dump > "$FROM_DIR/lxd.config.${NOW}"

## Dump all instances list ##
lxc list > "$FROM_DIR/lxd.instances.list.${NOW}"

## Make sure we know LXD version too ##
snap list lxd > "$FROM_DIR/lxd-version.${NOW}"

## Backup all Instances
for i in $(lxc list -c n --format csv)
do
     echo "Making backup of ${i} ..."
     lxc export "${i}" "$FROM_DIR/${i}-backup.gz" --instance-only --compression 'pigz --rsyncable -f'
done

## Duplicity to Gcloud
if [ ! -e "$LOCK_FILE" ]; then
        touch "$LOCK_FILE"
        # --no-compression: the archives were already compressed by pigz above
        duplicity \
        --progress --asynchronous-upload --no-compression --volsize 2048 \
        --archive-dir /root/.cache/duplicity/ \
        --log-file "/var/log/$SCRIPT_NAME.$(date +"%F_%T").log" \
        "$FROM_DIR" \
        "$TO_DIR"

        rm -f "$LOCK_FILE"

else
        echo "Duplicity is still running"
fi
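And to actually make this the weekly cron job mentioned at the start, an entry along these lines ties it together (the script path is an assumption, adjust it to wherever you saved the script):

# /etc/cron.d/lxd-backup - run the backup every Sunday at 03:00
0 3 * * 0 root /usr/local/bin/lxd-backup.sh >> /var/log/lxd-backup.cron.log 2>&1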