Enabling Data-at-Rest Encryption on XCP-ng Hypervisors With a Fully LUKS- and TPM-Encrypted SR
Update, 2024/08/20: CentOS 7 is now EOL, which means you’ll need to point your sources at the vault repo in order to pull the necessary packages; see this article: Enabling Centos Sources on XCP-ng with CentOS 7 EOL
Note: I work for Forward Systems, a Vates Partner that helps clients deploy and manage infrastructure. If you need support on deployment projects, don’t hesitate to get in touch: https://fwdsystems.tech
XCP-ng is a robust and full-featured alternative to VMware vSphere and has been my go-to hypervisor for over a year now. However, one crucial feature still missing when setting up XCP-ng hypervisors is data-at-rest encryption.
In many industries, data-at-rest encryption is a mandatory requirement to ensure compliance and minimize the risk of data leaks due to mishandling or errors. While vSphere offers Virtual Machine Encryption and vSAN Data-At-Rest Encryption, there’s no officially supported method to handle encryption in XCP-ng or Xen Orchestra out of the box.
Fortunately, since XCP-ng runs on CentOS 7 and Linux has excellent support for encryption, we can leverage existing software to add full disk encryption at the Storage Repository (SR) level. This blog post will guide you through implementing full SR encryption with TPM-backed LUKS.
Why Implement Data-at-Rest Encryption?
- Compliance: Certain industries mandate that all data-at-rest be encrypted. Hosting VMs on unencrypted local storage would not be compliant without additional measures.
- Data Security: Encrypting all data on a drive by default minimizes the risk of data leaks due to mishandling or errors, such as the case of Scaleway losing track of hypervisor drives and exposing customer data. It also fits a specific threat model: even if a drive needs to be RMA’d or gets lost, the data remains secure.
The Solution: LUKS, Clevis, and TPM
The approach we’ll follow is implementing full SR, TPM-backed encryption using LUKS, Clevis, and systemd. Here are the key advantages of this solution:
- Seamless: No password prompt on boot, but thanks to multiple key slots you can still fall back to a password as a backup if the TPM stops functioning or the drive needs to be moved to a different system.
- No Plaintext Passwords: Keys are held by the TPM, eliminating the risk of exposing plaintext passwords that hang around on drives.
- Limited Points of Failure: No complex dependencies, reducing the risk of issues during boot or operation.
- Cost-effective: TPM chips are inexpensive, typically around $20-$30.
- Meets Threat Model: Data-at-rest in the SR is encrypted by default, mitigating the risk of data exposure if a drive needs to be RMA’d or gets lost. It does not safeguard against an attacker having physical access to the server, but not much does anyway.
TLDR;
- Install clevis and clevis-luks
- Create a LUKS volume and bind it to the TPM
- Add a systemd service that will unlock the volume on boot
- Create your SR on that volume
Setup and Parameter Decisions
Clevis
Bind Parameters
When executing the clevis luks bind command, a number of parameters are available - full list - and affect the conditions that need to be met for the drive to unlock. The '{}' argument in that command contains the configuration.
We decided not to specify any parameters: even with none, the drive cannot be decrypted from another computer unless the attacker knows the backup password we’ll install on keyslot 0. We consider this to be sufficient, as measures such as sealing the LUKS key against the UEFI settings don’t bring benefits considering our threat model, but carry a large risk of additional complications that could prevent the volume from unlocking. We also chose to use the default hashing, key, and pcr_bank settings, as we consider them sufficiently secure.
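For comparison, here is a minimal sketch of how the configuration argument would differ if you did want a stricter binding. It only prints the commands and never touches a disk. The helper name print_bind_cmd is hypothetical, and the pcr_ids value of 7 is purely an illustrative assumption, not a setting used in this guide.

```shell
#!/bin/sh
# Print (do not run) a clevis bind command for a device and a tpm2 config.
# print_bind_cmd is a hypothetical helper used only for illustration.
print_bind_cmd() {
    dev="$1"
    cfg="$2"
    # Fall back to the empty configuration used throughout this guide.
    [ -n "$cfg" ] || cfg='{}'
    printf "clevis luks bind -d %s tpm2 '%s'\n" "$dev" "$cfg"
}

# Default binding, as used in this guide.
print_bind_cmd /dev/sdb
# Assumed stricter variant: seal against PCR 7 in the sha256 bank.
print_bind_cmd /dev/sdb '{"pcr_bank":"sha256","pcr_ids":"7"}'
```

The second form ties unlocking to measured boot state, which is exactly the kind of extra failure mode we chose to avoid.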
systemd
Unlock
The auto-unlock on boot is implemented through a custom, simple systemd template.
There are a couple of reasons for that:
- We would have liked to leverage systemd-cryptenroll, introduced with systemd v248; however, XCP-ng, being based on CentOS 7, only carries v219.
- We couldn’t make /etc/crypttab function reliably.
- The same goes for dracut, which is better suited for early boot unlocking.
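If you want to check which situation your own host is in, the version comparison can be sketched as below. The function names are hypothetical; the only facts assumed are the v248 threshold mentioned above and that the first line of systemctl --version starts with "systemd" followed by the version number.

```shell
#!/bin/sh
# Succeed if the given systemd major version ships systemd-cryptenroll (v248+).
has_cryptenroll() {
    [ "$1" -ge 248 ]
}

# Extract the version number from "systemctl --version" (first line, 2nd field).
current_systemd_version() {
    systemctl --version | awk 'NR==1 {print $2}'
}

# Example with the version shipped by XCP-ng, per this article.
if has_cryptenroll 219; then
    echo "v219: systemd-cryptenroll available"
else
    echo "v219: no systemd-cryptenroll, use a custom unit"
fi
```

On a real host you would call has_cryptenroll "$(current_systemd_version)" instead of passing a literal.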
Step-by-Step Guide
1. Install Clevis
Before proceeding, ensure that your TPM chip is available to the operating system by checking /dev/tpm*.
$ ls -l /dev/tpm*
crw------- 1 root root 10, 224 Jun 2 11:10 /dev/tpm0
crw------- 1 root root 254, 65536 Jun 2 11:10 /dev/tpmrm0
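If you script your host setup, the same check can be turned into a guard. This is a rough sketch: tpm_present is a hypothetical helper, and it only tests that a tpm* node exists, not that the TPM actually works. The optional directory argument is there so the function can be tried without real hardware.

```shell
#!/bin/sh
# Succeed if at least one tpm* node exists under the given directory
# (defaults to /dev). Existence only; this does not probe the TPM itself.
tpm_present() {
    dir="${1:-/dev}"
    for node in "$dir"/tpm*; do
        # If the glob matched nothing, $node is the literal pattern and -e fails.
        [ -e "$node" ] && return 0
    done
    return 1
}

if tpm_present; then
    echo "TPM device node found"
else
    echo "no /dev/tpm* node, check firmware settings before continuing"
fi
```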
Then, install the clevis and clevis-luks packages from the CentOS 7 repository.
$ yum install clevis clevis-luks --enablerepo=base,updates
2. Create a LUKS Volume
List available block devices using lsblk.
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 256G 0 disk
sr0 11:0 1 577M 0 rom
sda 8:0 0 128G 0 disk
├─sda4 8:4 0 512M 0 part /boot/efi
├─sda2 8:2 0 18G 0 part
├─sda5 8:5 0 4G 0 part /var/log
├─sda3 8:3 0 86.5G 0 part
│ └─XSLocalEXT--ecd9c7bd--688d--a3ae--7828--93ef9f886e1e-ecd9c7bd--688d--a3ae--7828--93ef9f886e1e 253:2 0 86.5G 0 lvm /run/sr-mount/ecd9c7bd-688d-a3ae-7828-93ef9f886e1e
├─sda1 8:1 0 18G 0 part /
└─sda6 8:6 0 1G 0 part [SWAP]
Then create a LUKS volume on the desired device (e.g., /dev/sdb):
$ cryptsetup luksFormat -s 512 -h sha512 -i 10000 /dev/sdb
Bind the LUKS volume to the TPM using the clevis luks bind command. We’ll use the default settings, as they are sufficiently secure for our threat model:
$ clevis luks bind -d /dev/sdb tpm2 '{}'
Confirm that you now have two keys, one in slot 0 (your passphrase) and one in slot 1 (the TPM binding), using cryptsetup luksDump /dev/sdb.
$ cryptsetup luksDump /dev/sdb
LUKS header information for /dev/sdb
Version: 1
Cipher name: aes
Cipher mode: xts-plain64
Hash spec: sha512
Payload offset: 4096
MK bits: 512
MK digest: 3f 3a a4 23 91 7d 2f a4 79 6a aa 03 a2 9f dc a0 ed e9 07 cb
MK salt: 83 75 a9 e2 d0 69 6e 06 86 b8 d1 fe 28 22 c1 f8
c7 0a 6b e5 23 4f 5e 39 5e 0f 38 b9 de da bc 35
MK iterations: 447500
UUID: 2b597213-af9f-484d-bcb5-5f1a2ed09910
Key Slot 0: ENABLED
Iterations: 3620927
Salt: fb a7 7f 25 e2 e0 38 cd 87 63 98 dd 15 e0 01 78
8c 98 de 0b ec ba f5 52 cf 4a 5d aa 7b eb 87 9c
Key material offset: 8
AF stripes: 4000
Key Slot 1: ENABLED
Iterations: 560136
Salt: d3 d2 46 23 18 00 75 be 3d 9b 4a 90 11 32 ff 55
fa 7e b6 55 3a 3d d4 0c 39 55 8e dd ba a0 9d 5f
Key material offset: 512
AF stripes: 4000
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED
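Rather than reading the dump by eye, you can count the enabled slots. The sketch below assumes the LUKS1 output format shown above, where each slot appears as a "Key Slot N: ENABLED" line; the function name is hypothetical.

```shell
#!/bin/sh
# Count enabled key slots in LUKS1-style "cryptsetup luksDump" output on stdin.
count_enabled_slots() {
    grep -c '^Key Slot [0-7]: ENABLED'
}

# Usage on a real volume (requires root and the device from this guide):
#   cryptsetup luksDump /dev/sdb | count_enabled_slots
```

For the dump above this should report 2: the passphrase slot plus the slot added by clevis.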
3. Add a systemd Template to Unlock LUKS Volumes on Boot
Create a systemd template file /etc/systemd/system/clevis-luks-unlock@.service with the following content:
[Unit]
Description=Unlock LUKS device with UUID %I on boot
After=network.target
Before=local-fs.target
[Service]
Type=oneshot
ExecStart=/usr/bin/clevis-luks-unlock -d /dev/disk/by-uuid/%i %I-luks
RemainAfterExit=true
[Install]
WantedBy=multi-user.target
Identify the UUID of your LUKS volume using blkid /dev/sdb, then reload systemd and enable the service for your volume:
$ systemctl daemon-reload
$ systemctl enable clevis-luks-unlock@[luks_volume_uuid].service
$ systemctl start clevis-luks-unlock@[luks_volume_uuid].service
Confirm that the volume is properly mapped by checking /dev/mapper/.
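To avoid copy-paste mistakes with the UUID, the unit name can be derived from the device. This is a small sketch with a hypothetical helper name; it assumes the template is named clevis-luks-unlock@.service as above.

```shell
#!/bin/sh
# Build the systemd instance name for a LUKS UUID. The part between "@" and
# ".service" is what the template receives as its instance name.
unit_for_uuid() {
    printf 'clevis-luks-unlock@%s.service\n' "$1"
}

# Example with the UUID from the luksDump output in this guide.
unit_for_uuid 2b597213-af9f-484d-bcb5-5f1a2ed09910

# Usage on a real host (requires root):
#   uuid=$(blkid -s UUID -o value /dev/sdb)
#   systemctl enable "$(unit_for_uuid "$uuid")"
```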
4. Set up the Encrypted Storage Repository
Obtain the host-uuid of your XCP-ng host using xe host-list.
$ xe host-list
uuid ( RO) : 452677ba-b5a5-9999-8d6a-999f0732f4bb
name-label ( RW) : xcpng-nested-1
name-description ( RW): Default install
Then create the SR using the appropriate command for your desired filesystem type (e.g., ext4).
$ xe sr-create host-uuid=452677ba-b5a5-9999-8d6a-999f0732f4bb \
type=ext \
content-type=user \
name-label="local ext luks" \
device-config:device=/dev/mapper/luks-2b597213-af9f-484d-bcb5-5f1a2ed09910
Verify that the SR is registered correctly using xe sr-list.
$ xe sr-list
uuid ( RO) : d75cda98-bb7a-b3c7-dc5a-d2fd019c4032
name-label ( RW): local ext luks
name-description ( RW):
host ( RO): xcpng-nested-1
type ( RO): ext
content-type ( RO): user
Upon rebooting the system, the systemd service you created will automatically unlock the LUKS volume before the SR is mounted, eliminating the need for user input. However, because the first keyslot holds a regular passphrase, you’ll be prompted for the LUKS password if the TPM fails, allowing you to unlock the volume and access the SR. If you fail to provide the password, the SR will be unavailable, but the system will still boot.
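After a reboot, a quick way to confirm the unlock happened is to look for the mapping the SR was created on. The sketch below assumes the luks-<uuid> mapping name used in the sr-create command above; sr_backing_state is a hypothetical helper, and the optional directory argument only exists so it can be tried outside a real host.

```shell
#!/bin/sh
# Print "unlocked" if the luks-<uuid> mapping exists, "locked" otherwise.
# Checks for existence only; it does not verify that the SR is attached.
sr_backing_state() {
    uuid="$1"
    dir="${2:-/dev/mapper}"
    if [ -e "$dir/luks-$uuid" ]; then
        echo "unlocked"
    else
        echo "locked"
        return 1
    fi
}

# Usage on a real host:
#   sr_backing_state 2b597213-af9f-484d-bcb5-5f1a2ed09910
# If it reports "locked", the volume can be opened by hand with the backup
# passphrase, for example:
#   cryptsetup luksOpen /dev/disk/by-uuid/<uuid> luks-<uuid>
```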
Remember that this is not an officially supported method, so proceed with caution and ensure that you understand the implications and potential risks. I hope this helps, and don’t hesitate to get in touch if you have any comments or tips to share.