Enabling Data-at-Rest Encryption on XCP-ng Hypervisors With a Fully LUKS- and TPM-Encrypted SR

XCP-ng is a robust and full-featured alternative to VMware vSphere and has been my go-to hypervisor for over a year now. However, one crucial feature still missing when setting up XCP-ng hypervisors is data-at-rest encryption.

In many industries, data-at-rest encryption is a mandatory requirement to ensure compliance and minimize the risk of data leaks due to mishandling or errors. While vSphere offers Virtual Machine Encryption and vSAN Data-At-Rest Encryption, there’s no officially supported method to handle encryption in XCP-ng or Xen Orchestra out of the box.

Fortunately, since XCP-ng is based on CentOS 7 and Linux has excellent support for encryption, we can leverage existing software to add full disk encryption at the Storage Repository (SR) level. This blog post will guide you through implementing full SR encryption with TPM-backed LUKS.

Why Implement Data-at-Rest Encryption?

  • Compliance: Certain industries mandate that all data-at-rest be encrypted. Hosting VMs on unencrypted local storage would not be compliant without additional measures.
  • Data Security: Encrypting all data on a drive by default minimizes the risk of data leaks due to mishandling or errors, such as the case of Scaleway losing track of hypervisor drives and exposing customer data. Data-at-rest encryption addresses a specific threat model: even if a drive needs to be RMA’d or gets lost, the data remains secure.

The Solution: LUKS, Clevis, and TPM

The approach we’ll follow is implementing full SR, TPM-backed encryption using LUKS, Clevis, and systemd. Here are the key advantages of this solution:

  • Seamless: No password prompt on boot, but thanks to multiple key slots you can still fall back to a password as a backup if the TPM stops functioning or the drives need to be moved to a different system.
  • No Plaintext Passwords: Keys are held by the TPM, eliminating the risk of exposing plaintext passwords that hang around on drives.
  • Limited Points of Failure: No complex dependencies, reducing the risk of issues during boot or operation.
  • Cost-effective: TPM chips are inexpensive, typically around $20-$30.
  • Meets Threat Model: Data-at-rest in the SR is encrypted by default, mitigating the risk of data exposure if a drive needs to be RMA’d or gets lost. It does not safeguard against an attacker having physical access to the server, but not much does anyway.

TL;DR

  • Install clevis and clevis-luks
  • Create a LUKS volume and bind it to the TPM
  • Add a systemd service that will unlock the volume on boot
  • Create your SR on that volume

Setup and Parameter Decisions

Clevis Bind Parameters

When executing the clevis luks bind command, a number of parameters are available that affect the conditions that need to be met for the drive to unlock. The '{}' argument in that command holds this configuration.

We decided not to specify any parameters: even with none, the drive cannot be decrypted from another computer unless the attacker knows the backup password we’ll install in key slot 0. We consider this sufficient, as measures such as sealing the LUKS key against the UEFI settings bring no benefit under our threat model but carry a large risk of additional complications that could prevent the volume from unlocking. We also chose to use the default hash, key, and pcr_bank settings, as we consider them sufficiently secure.
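
For illustration only, here is a minimal sketch of how the configuration argument would change if you did decide to seal against PCRs. The helper name bind_cmd is hypothetical and only prints the command rather than running it. The sha256 bank and PCR 7 are assumptions picked for the example, not settings used in this setup.

```shell
# Hypothetical helper: print (do not run) a clevis bind command.
# $1 = block device, $2 = comma-separated PCR ids (empty = no sealing policy)
bind_cmd() {
    dev="$1"
    pcr_ids="$2"
    if [ -z "$pcr_ids" ]; then
        cfg='{}'
    else
        cfg="{\"pcr_bank\":\"sha256\",\"pcr_ids\":\"$pcr_ids\"}"
    fi
    printf "clevis luks bind -d %s tpm2 '%s'\n" "$dev" "$cfg"
}

bind_cmd /dev/sdb ""     # the form used in this post
bind_cmd /dev/sdb "7"    # example only: seal against PCR 7
```

The more PCRs you seal against, the more likely a firmware update or settings change is to stop the volume from unlocking automatically, which is the trade-off described above.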

systemd Unlock

The auto-unlock on boot is implemented through a simple custom systemd template. There are a few reasons for that:

  • We would have liked to leverage systemd-cryptenroll, introduced with systemd v248; however, XCP-ng, being based on CentOS 7, only ships v219.
  • We couldn’t make /etc/crypttab function reliably.
  • The same goes for dracut, which is better suited to early boot unlocking.
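
If you want to check which situation a given host is in, a rough sketch along these lines compares the reported systemd version against v248. The helper name has_cryptenroll is hypothetical, and the fallback string is an assumption used only when systemctl is not present.

```shell
# Hypothetical helper: succeed if the first line of "systemctl --version"
# reports a systemd release that includes systemd-cryptenroll (v248+).
has_cryptenroll() {
    # The second whitespace-separated field is the version, e.g. "systemd 219".
    version=$(printf '%s\n' "$1" | awk 'NR == 1 { print $2 }')
    # Drop any non-numeric suffix such as the ".11" in "249.11".
    version=${version%%[!0-9]*}
    [ -n "$version" ] && [ "$version" -ge 248 ]
}

if command -v systemctl >/dev/null 2>&1; then
    line=$(systemctl --version | head -n 1)
else
    line="systemd 219"   # assumed value, matching the version quoted above
fi

if has_cryptenroll "$line"; then
    echo "systemd-cryptenroll should be available"
else
    echo "systemd too old for systemd-cryptenroll; use the Clevis approach"
fi
```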

Step-by-Step Guide

1. Install Clevis

Before proceeding, ensure that your TPM chip is available to the operating system by checking /dev/tpm*.

$ ls -l /dev/tpm*
crw------- 1 root root  10,   224 Jun  2 11:10 /dev/tpm0
crw------- 1 root root 254, 65536 Jun  2 11:10 /dev/tpmrm0
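
The same check can be scripted, for example as a guard at the top of a provisioning script. This is a minimal sketch: the helper name has_tpm is hypothetical, and the optional directory argument exists only so the function can be tried against a scratch directory.

```shell
# Hypothetical helper: succeed if a TPM device node exists.
# $1 = directory to look in (defaults to /dev)
has_tpm() {
    dev_dir="${1:-/dev}"
    for node in "$dev_dir/tpm0" "$dev_dir/tpmrm0"; do
        if [ -e "$node" ]; then
            return 0
        fi
    done
    return 1
}

if has_tpm; then
    echo "TPM device node found"
else
    echo "No TPM device node found; check that the TPM is enabled in firmware" >&2
fi
```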

Then, install the clevis and clevis-luks packages from the CentOS 7 repository.

$ yum install clevis clevis-luks --enablerepo=base,updates
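
To confirm the tools used in the following steps ended up on the PATH, something like the sketch below can help. The function name check_tools is hypothetical, and the list of binaries is an assumption based on the commands used later in this post.

```shell
# Hypothetical helper: report whether each tool needed later is on the PATH.
check_tools() {
    for tool in clevis clevis-luks-bind clevis-luks-unlock cryptsetup; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

check_tools
```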

2. Create a LUKS Volume

List available block devices using lsblk.

$ lsblk
NAME                                                                                              MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sdb                                                                                                 8:16   0  256G  0 disk  
sr0                                                                                                11:0    1  577M  0 rom   
sda                                                                                                 8:0    0  128G  0 disk  
├─sda4                                                                                              8:4    0  512M  0 part  /boot/efi
├─sda2                                                                                              8:2    0   18G  0 part  
├─sda5                                                                                              8:5    0    4G  0 part  /var/log
├─sda3                                                                                              8:3    0 86.5G  0 part  
│ └─XSLocalEXT--ecd9c7bd--688d--a3ae--7828--93ef9f886e1e-ecd9c7bd--688d--a3ae--7828--93ef9f886e1e 253:2    0 86.5G  0 lvm   /run/sr-mount/ecd9c7bd-688d-a3ae-7828-93ef9f886e1e
├─sda1                                                                                              8:1    0   18G  0 part  /
└─sda6                                                                                              8:6    0    1G  0 part  [SWAP]

Then create a LUKS volume on the desired device (e.g., /dev/sdb):

$ cryptsetup luksFormat -s 512 -h sha512 -i 10000 /dev/sdb
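
As a reading aid, the sketch below spells out what each option in that command means. It only prints the command instead of running it, since luksFormat destroys whatever is on the device. The variable names are made up for the example.

```shell
# The same luksFormat options as above, spelled out as variables.
KEY_BITS=512      # -s: key size in bits
HASH=sha512       # -h: hash used for passphrase processing
ITER_MS=10000     # -i: time to spend on passphrase processing, in milliseconds

# aes-xts splits the key in two halves, so 512 bits corresponds to AES-256.
echo "AES key size: $((KEY_BITS / 2)) bits"

# Printed rather than executed, because luksFormat wipes the target device.
echo "cryptsetup luksFormat -s $KEY_BITS -h $HASH -i $ITER_MS /dev/sdb"
```

The long iteration time only affects unlocking with the passphrase, which makes brute-forcing the backup passphrase slower at the cost of a slower manual unlock.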

Bind the LUKS volume to the TPM using the clevis luks bind command. We’ll use the default settings, as they are sufficiently secure for our threat model:

$ clevis luks bind -d /dev/sdb tpm2 '{}'

Confirm that key slots 0 and 1 are both enabled using cryptsetup luksDump /dev/sdb. Slot 0 holds the passphrase set during luksFormat, and slot 1 holds the key added by Clevis.

$ cryptsetup luksDump /dev/sdb
LUKS header information for /dev/sdb

Version:       	1
Cipher name:   	aes
Cipher mode:   	xts-plain64
Hash spec:     	sha512
Payload offset:	4096
MK bits:       	512
MK digest:     	3f 3a a4 23 91 7d 2f a4 79 6a aa 03 a2 9f dc a0 ed e9 07 cb 
MK salt:       	83 75 a9 e2 d0 69 6e 06 86 b8 d1 fe 28 22 c1 f8 
               	c7 0a 6b e5 23 4f 5e 39 5e 0f 38 b9 de da bc 35 
MK iterations: 	447500
UUID:          	2b597213-af9f-484d-bcb5-5f1a2ed09910

Key Slot 0: ENABLED
	Iterations:         	3620927
	Salt:               	fb a7 7f 25 e2 e0 38 cd 87 63 98 dd 15 e0 01 78 
	                      	8c 98 de 0b ec ba f5 52 cf 4a 5d aa 7b eb 87 9c 
	Key material offset:	8
	AF stripes:            	4000
Key Slot 1: ENABLED
	Iterations:         	560136
	Salt:               	d3 d2 46 23 18 00 75 be 3d 9b 4a 90 11 32 ff 55 
	                      	fa 7e b6 55 3a 3d d4 0c 39 55 8e dd ba a0 9d 5f 
	Key material offset:	512
	AF stripes:            	4000
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED
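
If you would rather check this in a script than by eye, a small sketch such as the following counts the enabled slots. The helper name enabled_slots is hypothetical, and the pattern assumes the LUKS1 dump format shown above.

```shell
# Hypothetical helper: count enabled key slots in LUKS1 luksDump output (stdin).
enabled_slots() {
    grep -c '^Key Slot [0-7]: ENABLED'
}

# Example with a trimmed copy of the output above. On the host you would
# pipe the real thing: cryptsetup luksDump /dev/sdb | enabled_slots
enabled_slots <<'EOF'
Key Slot 0: ENABLED
Key Slot 1: ENABLED
Key Slot 2: DISABLED
EOF
```

For the trimmed sample this prints 2, which is the count expected after binding.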

3. Add a systemd Template to Unlock LUKS Volumes on Boot

Create a systemd template file /etc/systemd/system/clevis-luks-unlock@.service with the following content:

[Unit]
Description=Unlock LUKS device with UUID %i on boot
After=network.target
Before=local-fs.target

[Service]
Type=oneshot
ExecStart=/usr/bin/clevis-luks-unlock -d /dev/disk/by-uuid/%i -n luks-%i
RemainAfterExit=true

[Install]
WantedBy=multi-user.target

Identify the UUID of your LUKS volume using blkid /dev/sdb, then reload systemd and enable the service for your volume:

$ systemctl daemon-reload
$ systemctl enable clevis-luks-unlock@[luks_volume_uuid].service
$ systemctl start clevis-luks-unlock@[luks_volume_uuid].service

Confirm that the volume is properly mapped by checking /dev/mapper/.
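
The sketch below ties the names in this step together, assuming the mapping is called luks-<UUID> as in the SR creation command further down. The helper names unit_name and mapper_path are hypothetical.

```shell
# Hypothetical helpers deriving the names used in this step from a LUKS UUID.
unit_name()   { printf 'clevis-luks-unlock@%s.service\n' "$1"; }
mapper_path() { printf '/dev/mapper/luks-%s\n' "$1"; }

# On the host the UUID would come from: blkid -s UUID -o value /dev/sdb
uuid="2b597213-af9f-484d-bcb5-5f1a2ed09910"   # example value from the dump above

echo "unit:   $(unit_name "$uuid")"
if [ -e "$(mapper_path "$uuid")" ]; then
    echo "mapped: $(mapper_path "$uuid")"
else
    echo "not mapped yet: $(mapper_path "$uuid")"
fi
```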

4. Set up the Encrypted Storage Repository

Obtain the host-uuid of your XCP-ng host using xe host-list.

$ xe host-list
uuid ( RO)                : 452677ba-b5a5-9999-8d6a-999f0732f4bb
    name-label ( RW)      : xcpng-nested-1
    name-description ( RW): Default install
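
When scripting this step, the UUID can be pulled out of that output instead of copied by hand. This is a minimal sketch with a hypothetical helper name, first_host_uuid, that assumes the output layout shown above. On a single-host setup, xe host-list --minimal should also print the UUID on its own.

```shell
# Hypothetical helper: pull the first uuid out of "xe host-list" output (stdin).
first_host_uuid() {
    awk -F': ' '/^uuid/ { print $2; exit }'
}

# Example with the output above. On the host: xe host-list | first_host_uuid
first_host_uuid <<'EOF'
uuid ( RO)                : 452677ba-b5a5-9999-8d6a-999f0732f4bb
    name-label ( RW)      : xcpng-nested-1
    name-description ( RW): Default install
EOF
```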

Then create the SR using the appropriate command for your desired SR type (e.g., ext).

$ xe sr-create host-uuid=452677ba-b5a5-9999-8d6a-999f0732f4bb \
type=ext \
content-type=user \
name-label="local ext luks" \
device-config:device=/dev/mapper/luks-2b597213-af9f-484d-bcb5-5f1a2ed09910

Verify that the SR is registered correctly using xe sr-list.

$ xe sr-list 
uuid ( RO)                : d75cda98-bb7a-b3c7-dc5a-d2fd019c4032
          name-label ( RW): local ext luks
    name-description ( RW): 
                host ( RO): xcpng-nested-1
                type ( RO): ext
        content-type ( RO): user

Upon rebooting the system, the systemd service you created will automatically unlock the LUKS volume before the SR is mounted, eliminating the need for user input. If the TPM fails, the passphrase stored in key slot 0 still lets you unlock the volume manually and access the SR. If the volume is not unlocked, the SR will be unavailable, but the system will still boot.
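
If automatic unlock fails and you need to bring the SR back by hand, the recovery might look roughly like the sketch below. The helper name recovery_cmds is hypothetical and only prints the commands. The sketch also assumes the SR's PBD is left unplugged after a failed boot and needs to be plugged again, which may not match every setup.

```shell
# Hypothetical helper: print (do not run) manual recovery commands.
# $1 = LUKS block device, $2 = LUKS UUID, $3 = SR UUID
recovery_cmds() {
    # luksOpen prompts for the backup passphrase held in key slot 0.
    printf 'cryptsetup luksOpen %s luks-%s\n' "$1" "$2"
    # Re-attach the SR by plugging its PBD once the mapping exists.
    printf 'xe pbd-plug uuid=$(xe pbd-list sr-uuid=%s --minimal)\n' "$3"
}

recovery_cmds /dev/sdb 2b597213-af9f-484d-bcb5-5f1a2ed09910 d75cda98-bb7a-b3c7-dc5a-d2fd019c4032
```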

Remember that this is not an officially supported method, so proceed with caution and ensure that you understand the implications and potential risks. I hope this helps, and don’t hesitate to get in touch if you have any comments or tips to share.