ZFS - zfs

ZFS combines physical volume management and a file system. A ZFS installation can span multiple storage devices and scales well: you can add disks to expand the available space in the storage pool immediately.

ZFS is a block-based file system that protects against data corruption by using checksums to verify, confirm and correct every operation. To run at a sufficient speed, this mechanism requires a powerful environment with a lot of RAM.

In addition, ZFS offers snapshots and replication, RAID management, copy-on-write clones, compression and other features.

To use ZFS, make sure you have zfsutils-linux installed on your machine.

Terminology

ZFS creates logical units based on physical storage devices. These logical units are called ZFS pools or zpools. Each zpool is then divided into a number of datasets. These datasets can be of different types:

  • A ZFS filesystem can be seen as a partition or a mounted file system.

  • A ZFS volume represents a block device.

  • A ZFS snapshot captures a specific state of either a ZFS filesystem or a ZFS volume. ZFS snapshots are read-only.

  • A ZFS clone is a writable copy of a ZFS snapshot.

zfs driver in LXD

The zfs driver in LXD uses ZFS filesystems and ZFS volumes for images and custom storage volumes, and ZFS snapshots and clones to create instances from images and for instance and custom volume snapshots. By default, LXD enables compression when creating a ZFS pool.

LXD assumes that it has full control over the ZFS pool and dataset. Therefore, you should never maintain any datasets or file system entities that are not owned by LXD in a ZFS pool or dataset, because LXD might delete them.

Due to the way copy-on-write works in ZFS, parent ZFS filesystems can’t be removed until all children are gone. As a result, LXD automatically renames any objects that are removed but still referenced. Such objects are kept at a random deleted/ path until all references are gone and the object can safely be removed. Note that this method might have ramifications for restoring snapshots. See Limitations below.

LXD automatically enables trimming support on all newly created pools on ZFS 0.8 or later. This increases the lifetime of SSDs by allowing better block re-use by the controller, and it also frees space on the root file system when using a loop-backed ZFS pool. If you are running a ZFS version earlier than 0.8 and want to enable trimming, upgrade to at least version 0.8. Then use the following commands to make sure that trimming is automatically enabled for the ZFS pool in the future and to trim all currently unused space:

zpool upgrade ZPOOL-NAME
zpool set autotrim=on ZPOOL-NAME
zpool trim ZPOOL-NAME
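
To check the result afterwards, you can read the autotrim property back from the pool. The following is a minimal sketch; the helper name autotrim_state is hypothetical, and it assumes that zpool is available in your PATH.

```shell
# Print the current value of the autotrim property for the given pool.
# -H omits the header line, -o value prints only the property value.
autotrim_state() {
    zpool get -H -o value autotrim "$1"
}
```

For example, autotrim_state ZPOOL-NAME should print on once the commands above have been run.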

Limitations

The zfs driver has the following limitations:

Restoring from older snapshots

ZFS doesn’t support restoring from snapshots other than the latest one. You can, however, create new instances from older snapshots. This method makes it possible to confirm whether a specific snapshot contains what you need. After determining the correct snapshot, you can remove the newer snapshots so that the snapshot you need is the latest one and you can restore it.

Alternatively, you can configure LXD to automatically discard the newer snapshots during restore. To do so, set the zfs.remove_snapshots configuration for the volume (or the corresponding volume.zfs.remove_snapshots configuration on the storage pool for all volumes in the pool).

Note, however, that if zfs.clone_copy is set to true, instance copies use ZFS snapshots too. In that case, you cannot restore an instance to a snapshot taken before the last copy without having to also delete all its descendants. If this is not an option, you can copy the wanted snapshot into a new instance and then delete the old instance. You will, however, lose any other snapshots the instance might have had.
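
As an illustration, the zfs.remove_snapshots configuration mentioned above could be set as follows. This is a sketch only: the function names are hypothetical, and the pool and volume names are passed in as arguments.

```shell
# Discard newer snapshots automatically when restoring this one volume.
# $1 = storage pool, $2 = volume
remove_snapshots_for_volume() {
    lxc storage volume set "$1" "$2" zfs.remove_snapshots true
}

# Make the same behavior the default for all volumes in a pool.
# $1 = storage pool
remove_snapshots_for_pool() {
    lxc storage set "$1" volume.zfs.remove_snapshots true
}
```

For example: remove_snapshots_for_volume my-pool my-volume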

Observing I/O quotas

I/O quotas are unlikely to affect ZFS filesystems very much. That’s because ZFS is a port of a Solaris module (using SPL) and not a native Linux file system using the Linux VFS API, which is where I/O limits are applied.

Feature support in ZFS

Some features, like the use of idmaps or delegation of a ZFS dataset, require ZFS 2.2 or higher and are therefore not widely available yet.

Quotas

ZFS provides two different quota properties: quota and refquota. quota restricts the total size of a dataset, including its snapshots and clones. refquota restricts only the size of the data in the dataset, not its snapshots and clones.

By default, LXD uses the quota property when you set up a size/quota for your storage volume. If you want to use the refquota property instead, set the zfs.use_refquota configuration for the volume (or the corresponding volume.zfs.use_refquota configuration on the storage pool for all volumes in the pool).

You can also set the zfs.reserve_space (or volume.zfs.reserve_space) configuration to use ZFS reservation or refreservation along with quota or refquota.
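
The quota options above can be combined when creating a volume. The following sketch assumes a new custom volume; the function name is hypothetical, and the pool name, volume name and size are arguments that you supply.

```shell
# Create a custom volume whose size limit applies only to its own data
# (refquota) and whose space is reserved up front (refreservation).
# $1 = storage pool, $2 = volume, $3 = size (for example 10GiB)
create_volume_with_refquota() {
    lxc storage volume create "$1" "$2" \
        zfs.use_refquota=true zfs.reserve_space=true size="$3"
}
```

For example: create_volume_with_refquota my-pool my-volume 10GiB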

Configuration options

The following configuration options are available for storage pools that use the zfs driver and for storage volumes in these pools.

Storage pool configuration

size

Size of the storage pool (for loop-based pools)

Key: size
Type: string
Default: auto (20% of free disk space, >= 5 GiB and <= 30 GiB)
Scope: local

When creating loop-based pools, specify the size in bytes (suffixes are supported). You can increase the size to grow the storage pool.

The default (auto) creates a storage pool that uses 20% of the free disk space, with a minimum of 5 GiB and a maximum of 30 GiB.
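
The auto rule can be worked through with a little arithmetic. The following sketch only illustrates the rule as described here and is not LXD's actual implementation; the function name is hypothetical, all values are in bytes, and rounding down is an assumption.

```shell
GiB=$((1024 * 1024 * 1024))

# Print the "auto" pool size in bytes for a given amount of free space.
# $1 = free disk space in bytes
default_pool_size() (
    free_bytes=$1
    min=$((5 * GiB))
    max=$((30 * GiB))
    size=$((free_bytes / 5))    # 20% of the free space, rounded down
    if [ "$size" -lt "$min" ]; then
        size=$min               # never smaller than 5 GiB
    elif [ "$size" -gt "$max" ]; then
        size=$max               # never larger than 30 GiB
    fi
    echo "$size"
)
```

With 100 GiB free, the result is 20 GiB. With 10 GiB free, the minimum of 5 GiB applies, and with 500 GiB free, the maximum of 30 GiB applies.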

source

Path to an existing block device, loop file, or ZFS dataset/pool

Key: source
Type: string
Scope: local

source.wipe

Whether to wipe the block device before creating the pool

Key: source.wipe
Type: bool
Default: false
Scope: local

Set this option to true to wipe the block device specified in source prior to creating the storage pool.
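
To show how size, source and source.wipe fit together, here is a sketch of three ways to create a pool. The function names are hypothetical, and the pool names, sizes, devices and datasets are arguments that you supply.

```shell
# Loop-backed pool with an explicit size instead of the auto default.
# $1 = pool name, $2 = size (for example 30GiB)
create_loop_pool() {
    lxc storage create "$1" zfs size="$2"
}

# Pool on a block device that is wiped first.
# WARNING: this destroys all data on the device.
# $1 = pool name, $2 = block device path
create_pool_on_wiped_device() {
    lxc storage create "$1" zfs source="$2" source.wipe=true
}

# Pool that uses an existing ZFS pool or dataset.
# $1 = pool name, $2 = existing zpool or dataset
create_pool_on_dataset() {
    lxc storage create "$1" zfs source="$2"
}
```

For example: create_loop_pool my-pool 30GiB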

zfs.clone_copy

Whether to use ZFS lightweight clones

Key: zfs.clone_copy
Type: string
Default: true
Scope: global

Set this option to true or false to enable or disable using ZFS lightweight clones rather than full dataset copies. Set the option to rebase to copy based on the initial image.

zfs.export

Disable zpool export while an unmount is being performed

Key: zfs.export
Type: bool
Default: true
Scope: global

zfs.pool_name

Name of the zpool

Key: zfs.pool_name
Type: string
Default: name of the pool
Scope: local

Tip

In addition to these configurations, you can also set default values for the storage volume configurations. See Configure default values for storage volumes.

Storage volume configuration

block.filesystem

File system of the storage volume

Key: block.filesystem
Type: string
Default: same as volume.block.filesystem
Condition: block-based volume with content type filesystem (zfs.block_mode enabled)
Scope: global

Valid options are: btrfs, ext4, xfs. If not set, ext4 is assumed.

block.mount_options

Mount options for block-backed file system volumes

Key: block.mount_options
Type: string
Default: same as volume.block.mount_options
Condition: block-based volume with content type filesystem (zfs.block_mode enabled)
Scope: global

security.shared

Enable volume sharing

Key: security.shared
Type: bool
Default: same as volume.security.shared or false
Condition: custom block volume
Scope: global

Enabling this option allows sharing the volume across multiple instances despite the possibility of data loss.

security.shifted

Enable ID shifting overlay

Key: security.shifted
Type: bool
Default: same as volume.security.shifted or false
Condition: custom volume
Scope: global

Enabling this option allows attaching the volume to multiple isolated instances.

security.unmapped

Disable ID mapping for the volume

Key: security.unmapped
Type: bool
Default: same as volume.security.unmapped or false
Condition: custom volume
Scope: global

size

Size/quota of the storage volume

Key: size
Type: string
Default: same as volume.size
Condition: appropriate driver
Scope: local

snapshots.expiry

When snapshots are to be deleted

Key: snapshots.expiry
Type: string
Default: same as volume.snapshots.expiry
Condition: custom volume
Scope: global

Specify an expression like 1M 2H 3d 4w 5m 6y.

snapshots.pattern

Template for the snapshot name

Key: snapshots.pattern
Type: string
Default: same as volume.snapshots.pattern or snap%d
Condition: custom volume
Scope: global

You can specify a naming template that is used for scheduled snapshots and unnamed snapshots.

The snapshots.pattern option takes a Pongo2 template string to format the snapshot name.

To add a time stamp to the snapshot name, use the Pongo2 context variable creation_date. Make sure to format the date in your template string to avoid forbidden characters in the snapshot name. For example, set snapshots.pattern to {{ creation_date|date:'2006-01-02_15-04-05' }} to name the snapshots after their time of creation, down to the precision of a second.

Another way to avoid name collisions is to use the placeholder %d in the pattern. For the first snapshot, the placeholder is replaced with 0. For subsequent snapshots, the existing snapshot names are taken into account to find the highest number at the placeholder’s position. This number is then incremented by one for the new name.
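
The %d numbering described above can be sketched in a few lines of shell. This illustrates the rule only and is not LXD's implementation; the function name is hypothetical, and the sketch assumes that existing numbers have no leading zeros.

```shell
# Print the next snapshot name for a pattern that contains %d.
# $1 = pattern (for example snap%d), remaining arguments = existing names
next_snapshot_name() (
    pattern=$1
    shift
    prefix=${pattern%%"%d"*}    # text before the placeholder
    suffix=${pattern#*"%d"}     # text after the placeholder
    next=0                      # the first snapshot gets 0
    for name in "$@"; do
        # Skip names that do not fit the pattern.
        case $name in
            "$prefix"*"$suffix") ;;
            *) continue ;;
        esac
        num=${name#"$prefix"}
        num=${num%"$suffix"}
        # Skip names whose placeholder position is not a plain number.
        case $num in
            '' | *[!0-9]*) continue ;;
        esac
        # Keep the highest number seen so far, plus one.
        if [ "$num" -ge "$next" ]; then
            next=$((num + 1))
        fi
    done
    printf '%s%s%s\n' "$prefix" "$next" "$suffix"
)
```

For example, next_snapshot_name 'snap%d' snap0 snap1 snap5 prints snap6, and with no existing snapshots the result is snap0.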

snapshots.schedule

Schedule for automatic volume snapshots

Key: snapshots.schedule
Type: string
Default: same as volume.snapshots.schedule
Condition: custom volume
Scope: global

Specify either a cron expression (<minute> <hour> <dom> <month> <dow>), a comma-separated list of schedule aliases (@hourly, @daily, @midnight, @weekly, @monthly, @annually, @yearly), or leave empty to disable automatic snapshots (the default).
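
As an illustration, a schedule and an expiry could be set together on a custom volume. This is a sketch with a hypothetical function name; the pool, volume, schedule and expiry values are arguments that you supply.

```shell
# Take automatic snapshots of a custom volume and expire them later.
# $1 = storage pool, $2 = volume
# $3 = schedule (cron expression or alias), $4 = expiry expression
schedule_volume_snapshots() (
    pool=$1
    volume=$2
    lxc storage volume set "$pool" "$volume" snapshots.schedule "$3"
    lxc storage volume set "$pool" "$volume" snapshots.expiry "$4"
)
```

For example, schedule_volume_snapshots my-pool my-volume "0 6 * * *" 2w takes a snapshot every day at 06:00 and sets each snapshot to expire after two weeks. Using @daily instead of the cron expression takes one snapshot per day.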

volatile.uuid

The volume’s UUID

Key: volatile.uuid
Type: string
Default: random UUID
Scope: global

zfs.block_mode

Whether to use a formatted zvol rather than a dataset

Key: zfs.block_mode
Type: bool
Default: same as volume.zfs.block_mode
Scope: global

zfs.block_mode can be set only for custom storage volumes. To enable ZFS block mode for all storage volumes in the pool, including instance volumes, use volume.zfs.block_mode.
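
The two ways of enabling block mode could look as follows. This is a sketch with hypothetical function names; the pool name, volume name and file system are arguments that you supply.

```shell
# Create a custom volume as a formatted zvol.
# $1 = storage pool, $2 = volume, $3 = file system (btrfs, ext4 or xfs)
create_block_mode_volume() {
    lxc storage volume create "$1" "$2" \
        zfs.block_mode=true block.filesystem="$3"
}

# Enable block mode for all new volumes in the pool, including
# instance volumes.
# $1 = storage pool
enable_block_mode_for_pool() {
    lxc storage set "$1" volume.zfs.block_mode true
}
```

For example: create_block_mode_volume my-pool my-volume xfs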

zfs.blocksize

Size of the ZFS block

Key: zfs.blocksize
Type: string
Default: same as volume.zfs.blocksize
Scope: global

The size must be between 512 bytes and 16 MiB and must be a power of 2. For a block volume, a maximum value of 128 KiB will be used even if a higher value is set.

Depending on the value of zfs.block_mode, the specified size is used to set either volblocksize or recordsize in ZFS.
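
The size rules above can be expressed as a small check. This sketch is not LXD's implementation: the function name is hypothetical, the size is assumed to be given as a plain number of bytes, and the 128 KiB limit is assumed to apply when block mode is enabled.

```shell
# Validate a block size and print the value that would take effect.
# $1 = size in bytes, $2 = "true" if the volume is a block volume
effective_blocksize() (
    size=$1
    block_mode=$2
    min=512
    max=$((16 * 1024 * 1024))    # 16 MiB
    cap=$((128 * 1024))          # 128 KiB limit for block volumes
    if [ "$size" -lt "$min" ] || [ "$size" -gt "$max" ]; then
        echo "size must be between 512 bytes and 16 MiB" >&2
        return 1
    fi
    # A power of 2 has exactly one bit set, so size & (size - 1) is 0.
    if [ $((size & (size - 1))) -ne 0 ]; then
        echo "size must be a power of 2" >&2
        return 1
    fi
    if [ "$block_mode" = true ] && [ "$size" -gt "$cap" ]; then
        size=$cap
    fi
    echo "$size"
)
```

For example, effective_blocksize 1048576 true prints 131072, while effective_blocksize 1000 false fails because 1000 is not a power of 2.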

zfs.delegate

Whether to delegate the ZFS dataset

Key: zfs.delegate
Type: bool
Default: same as volume.zfs.delegate
Condition: ZFS 2.2 or higher
Scope: global

This option controls whether to delegate the ZFS dataset and anything underneath it to the container or containers that use it. When used in conjunction with security.nesting, this allows using the zfs command in the container.
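
A possible way to combine the two settings is sketched below. The function name is hypothetical, the pool, volume and container names are arguments that you supply, and the sketch assumes that the volume is attached to the container separately.

```shell
# Delegate a custom volume's dataset and allow nesting in the container
# that uses it, so that the zfs command can be used inside.
# $1 = storage pool, $2 = volume, $3 = container
delegate_volume() (
    pool=$1
    volume=$2
    container=$3
    lxc storage volume set "$pool" "$volume" zfs.delegate true
    lxc config set "$container" security.nesting true
)
```

For example: delegate_volume my-pool my-volume my-container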

zfs.remove_snapshots

Remove snapshots as needed

Key: zfs.remove_snapshots
Type: bool
Default: same as volume.zfs.remove_snapshots or false
Scope: global

zfs.reserve_space

Use reservation/refreservation along with quota/refquota

Key: zfs.reserve_space
Type: bool
Default: same as volume.zfs.reserve_space or false
Scope: global

zfs.use_refquota

Use refquota instead of quota for space

Key: zfs.use_refquota
Type: bool
Default: same as volume.zfs.use_refquota or false
Scope: global

Storage bucket configuration

To enable storage buckets for local storage pool drivers and allow applications to access the buckets via the S3 protocol, you must configure the core.storage_buckets_address server setting.

size

Size/quota of the storage bucket

Key: size
Type: string
Default: same as volume.size
Condition: appropriate driver
Scope: local