Eric Radman : a Journal

ZFS Quickstart

ZFS is not only a full-featured file system, it also handles volume management, RAID, and network access.

Rocky/Alma Linux Install

dnf install https://zfsonlinux.org/epel/zfs-release-2-2.el9.noarch.rpm
dnf config-manager --disable zfs
dnf config-manager --enable zfs-kmod
dnf install zfs
echo zfs > /etc/modules-load.d/zfs.conf
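After the packages are installed, the module can be loaded and checked immediately without a reboot (a quick sanity check; `zfs version` reports both the userland tools and the kernel module):

```shell
modprobe zfs   # load the kernel module now, without rebooting
zfs version    # confirm the userland tools match the loaded module
```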

See also OpenZFS RHEL install.

FreeBSD

# /etc/rc.conf
zfs_enable="YES"

ZFS does not require a partition table, but initializing a disk with a GUID partition table avoids spurious warnings and makes it clear what kind of file system is on a device

geom disk list                 # List block devices
gpart destroy -F nda1          # Delete partition data
gpart create -s gpt nda1       # New GUID partition table
gpart add -t freebsd-zfs nda1  # Create and label partition

Create a zpool and a new volume on the first partition

zpool create -O compression=lz4 zpool2 /dev/nda1p1
zfs create -o mountpoint=/ci zpool2/ci
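After the two commands above, the new dataset should be mounted at /ci; the following read-only commands confirm pool health, space usage, and that lz4 compression was inherited from the pool's root dataset:

```shell
zpool status zpool2            # pool health and member devices
zfs list -r zpool2             # datasets, space used, and mountpoints
zfs get compression zpool2/ci  # should show lz4, inherited from zpool2
```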

Automated Snapshot Management

To automate snapshot retention, take a new snapshot periodically and prune all but the N most recent

#!/bin/sh -e

today=$(date +"%Y-%m-%d")

for fs in zpool2/ci; do
   zfs snapshot "$fs@$today"

   # keep the 29 newest snapshots, destroy the rest
   for snap in $(zfs list -t snapshot -H -o name "$fs" | sort -r | tail +30); do
      zfs destroy "$snap"
   done
done

Run daily

15  20  *  *  * /usr/local/bin/zfs-snap.sh
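The retention rule in the script above can be checked without touching a pool by feeding the selection pipeline a synthetic list of snapshot names (the dates here are fabricated; `tail -n +30` is the portable spelling of `tail +30`):

```shell
# generate 35 fake snapshot names, then show which ones would be destroyed:
# reverse-sorting puts the newest first, and tail -n +30 emits everything
# from position 30 on, i.e. all but the 29 newest
for i in $(seq -w 1 35); do echo "zpool2/ci@2024-01-$i"; done |
   sort -r | tail -n +30
```

Only the six oldest names survive the pipeline, which is exactly the set the script passes to zfs destroy.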

NFS Export

While any local mount can be added to /etc/exports, ZFS sharing allows mount points to be exported automatically by setting the sharenfs property on each volume

$ doas zfs set sharenfs='-network 192.168.2.0/24' zpool2/ci
$ zfs get sharenfs zpool2/ci
NAME       PROPERTY  VALUE                    SOURCE
zpool2/ci  sharenfs  -network 192.168.2.0/24  local

To unshare a pool or volume

$ doas zfs set sharenfs=off zpool2

On FreeBSD, rpcbind must also be enabled. Optionally, allow clients to connect without the resvport mount option.

# /etc/rc.conf
rpcbind_enable="YES"
nfs_reserved_port_only="NO"
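On the client side the export can be mounted like any other NFS share. A sketch, where nfshost stands in for the server's name:

```shell
showmount -e nfshost            # list the exports the server offers

# Linux client
mount -t nfs nfshost:/ci /mnt

# FreeBSD client
mount_nfs nfshost:/ci /mnt
```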

Virtual Machines

When running KVM or bhyve, ZFS can provide a block device that can be attached to a virtual machine directly. This is referred to as a zvol.

zfs create -sV 100G -o volmode=dev zpool2/vm/mykube2
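With volmode=dev the zvol shows up as an ordinary block device under /dev/zvol (note the parent dataset, zpool2/vm here, must already exist):

```shell
# the device node created by volmode=dev
ls -l /dev/zvol/zpool2/vm/mykube2
```

This path can then be handed to the hypervisor, for example as a virtio-blk device for a bhyve guest or a virtio disk for KVM.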

Encryption

For manual unlock

zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase zroot/home
zfs set setuid=off zroot/home
zfs set devices=off zroot/home
zfs set mountpoint=/home zroot/home

Then in rc.local

zfs load-key -r zroot/home
zfs mount zroot/home
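For unattended boots, a raw key file can be used instead of a passphrase. A sketch, assuming the key file itself lives on protected storage (path and dataset name are illustrative):

```shell
# generate a 32-byte raw key; keyformat=raw requires exactly 32 bytes
dd if=/dev/random of=/root/home.key bs=32 count=1
chmod 600 /root/home.key

zfs create -o encryption=on -o keyformat=raw \
   -o keylocation=file:///root/home.key zroot/home
```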

Import all Pools after Reinstall

zpool import -a
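Running zpool import with no arguments first scans attached devices and lists importable pools without actually importing them; -f may be needed if a pool was not cleanly exported before the reinstall:

```shell
zpool import            # scan devices and list importable pools
zpool import -f zpool2  # force-import a pool that was not exported
```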

Formulas

Create RAID-1 pool

zpool create zpool0 /dev/nda0p1
zpool attach zpool0 /dev/nda0p1 /dev/nda1p1
zpool status

Add spare

zpool add zpool0 spare /dev/nda2p1

Replace disk

zpool replace zpool0 /dev/nda1p1 /dev/nda2p1
zpool detach zpool0 /dev/nda1p1
zpool add zpool0 spare /dev/nda1p1

Create RAID-Z pool

zpool create zpool0 raidz /dev/nda0p1 /dev/nda1p1 /dev/nda2p1

Tuning for PostgreSQL

The OpenZFS Workload Tuning page indicates that full_page_writes can be disabled, since there is no need to guard against torn pages on ZFS. Disabling this parameter will likely lead to corruption if the database is replicated to a non-ZFS volume.
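The corresponding postgresql.conf fragment is a single line, and per the caveat above is only safe when every replica also stores its data directory on ZFS:

```
# postgresql.conf
full_page_writes = off
```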

Configuring Oracle Solaris ZFS for an Oracle Database (September 2020) provides a detailed guide that could be translated for PostgreSQL

         recordsize  logbias     primarycache             compression
Tables   32K         latency     all (data and metadata)  LZ4
Redo     128K        latency     Do not use               off (default)
Index    32K         throughput  all (data and metadata)  off (default)
Undo     1 MB        throughput  all (data and metadata)  off (default)
Temp     128K        latency     Do not use               off (default)
Archive  1 MB        throughput  Do not use               LZ4
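One possible translation into ZFS commands for PostgreSQL separates the data directory and the WAL into their own datasets. The dataset names and the 8K/128K record sizes below are assumptions based on PostgreSQL's page size and WAL segment layout, not recommendations from the Oracle paper:

```shell
# hypothetical layout: heap/index data in one dataset, WAL in another
zfs create -o recordsize=8K -o logbias=latency zpool2/pgdata
zfs create -o recordsize=128K -o logbias=latency \
   -o primarycache=metadata zpool2/pgwal
```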

From this we can also see a problem: these options are very difficult to validate, and this complexity can easily make it impossible to obtain a consensus view within a team.