Eric Radman : a Journal

Bhyve and iPXE

In software or systems engineering, the ability to spin up virtual machines is a valuable capability. Virtual machines provide a means of testing software or configuration on multiple platforms and in multiple configurations.

In production, a high-performance and reliable hypervisor allows services to be provisioned and updated one at a time. This capability avoids the added risk and complexity of fork-lift upgrades.

Bhyve is somewhat similar to QEMU-KVM, but without the emulation required to boot SeaBIOS. Native UEFI support combined with iPXE provides a very flexible means of bootstrapping VMs.

Boot Image

iPXE is a very capable UEFI application with very obscure documentation. In particular, it is challenging to determine which configuration file iPXE tries to load by default; the Internet is full of incorrect answers. strings(1) resolved this question:

$ strings /usr/local/share/ipxe/snp.efi-x86_64 | egrep '.ipxe$'
#!ipxe
autoexec.ipxe

Bhyve does not have built-in PXE boot support, but we can build our own boot image, which is attached as an NVMe device when the VM is started. The image carries this autoexec.ipxe:

#!ipxe
dhcp && goto netboot || goto dhcperror

:dhcperror
prompt --key s --timeout 10000 DHCP failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot

:netboot
chain tftp://${next-server}/chainload/${hostname}.ipxe ||
prompt --key s --timeout 10000 Chainloading failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot
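iPXE and the shell share the ${...} syntax, so if this file is generated from a script the heredoc delimiter should be quoted. Otherwise sh(1) expands ${next-server} itself (as the parameter next with the default "server") and ${hostname} from the build host's environment. A minimal sketch, assuming the image build script runs in the same directory; the function name write_autoexec is hypothetical:

```sh
#!/bin/sh
# write_autoexec FILE: write the iPXE boot script shown above to FILE.
# The quoted 'EOF' delimiter stops sh from expanding ${next-server}
# and ${hostname}, so iPXE receives them literally.
write_autoexec() {
    cat > "$1" <<'EOF'
#!ipxe
dhcp && goto netboot || goto dhcperror

:dhcperror
prompt --key s --timeout 10000 DHCP failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot

:netboot
chain tftp://${next-server}/chainload/${hostname}.ipxe ||
prompt --key s --timeout 10000 Chainloading failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot
EOF
}

write_autoexec autoexec.ipxe
```

With an unquoted <<EOF the chain line would come out as tftp://server/chainload/.ipxe on a typical build host.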

The trick in this configuration is that iPXE fetches further configuration from the TFTP server named by the next-server DHCP option, choosing a per-host script by the host-name DHCP option.
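As an illustration, assuming the DHCP server is ISC dhcpd (the choice of server is not stated here), a per-VM host declaration can supply both options. The helper name dhcpd_host, the fixed address, and the TFTP server address are hypothetical; the MAC address is the one the startup script below assigns to report1.

```sh
#!/bin/sh
# dhcpd_host NAME MAC IP NEXT_SERVER
# Print an ISC dhcpd host declaration giving a VM a fixed address,
# the TFTP server that iPXE reads as ${next-server}, and the
# host-name option that iPXE reads as ${hostname}.
dhcpd_host() {
    cat <<EOF
host $1 {
    hardware ethernet $2;
    fixed-address $3;
    next-server $4;
    option host-name "$1";
}
EOF
}

# hypothetical addresses on the 192.168.2.0/24 network
dhcpd_host report1 00:0c:29:f9:6d:4e 192.168.2.20 192.168.2.5
```

With these values, the autoexec.ipxe above would request tftp://192.168.2.5/chainload/report1.ipxe.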

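#!/bin/sh
# stop on the first failure, e.g. so a failed mount does not
# leave cp writing into the host's own /mnt
set -e

dst=uefi-ipxe-chainload.img

# 4 MB backing file with a GPT holding one EFI system partition
truncate -s 4M "$dst"
mdconfig -a -t vnode -u 99 -f "$dst"
gpart create -s gpt /dev/md99
gpart add -t efi /dev/md99
newfs_msdos -F12 /dev/md99p1
mount -t msdosfs -o longnames /dev/md99p1 /mnt

# iPXE at the fallback loader path UEFI firmware searches on x86_64,
# and the boot script at the root of the partition
mkdir -p /mnt/EFI/Boot
cp /usr/local/share/ipxe/snp.efi-x86_64 /mnt/EFI/Boot/BootX64.efi
cp autoexec.ipxe /mnt/autoexec.ipxe

umount /mnt
mdconfig -du md99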
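Before unmounting, it can be worth confirming that both files landed on the partition. A small sketch; the function name check_esp is hypothetical, and it only checks the two paths the build script above copies to /mnt:

```sh
#!/bin/sh
# check_esp DIR: succeed only if DIR (the mounted EFI partition)
# holds a non-empty iPXE loader and a non-empty autoexec.ipxe.
check_esp() {
    for f in EFI/Boot/BootX64.efi autoexec.ipxe; do
        if [ ! -s "$1/$f" ]; then
            echo "missing or empty: $1/$f" >&2
            return 1
        fi
    done
}

# usage, just before "umount /mnt":  check_esp /mnt || exit 1
```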

Process Status

In Bhyve, each virtual machine is a process, and each virtual CPU is a thread, which may be listed using top -H or procstat(1):

# procstat -t 4470
  PID    TID COMM                TDNAME              CPU  PRI STATE   WCHAN
 4470 100259 bhyve               mevent               -1  120 sleep   kqread
 4470 100510 bhyve               blk-4:0-0            -1  120 sleep   uwait
 4470 100511 bhyve               blk-4:0-1            -1  120 sleep   uwait
 4470 100512 bhyve               blk-4:0-2            -1  120 sleep   uwait
 4470 100513 bhyve               blk-4:0-3            -1  120 sleep   uwait
 4470 100514 bhyve               blk-4:0-4            -1  120 sleep   uwait
 4470 100515 bhyve               blk-4:0-5            -1  120 sleep   uwait
 4470 100516 bhyve               blk-4:0-6            -1  120 sleep   uwait
 4470 100517 bhyve               blk-4:0-7            -1  120 sleep   uwait
 4470 100518 bhyve               vtnet-5:0 tx         -1  120 sleep   uwait
 4470 100519 bhyve               rfb                  -1  126 sleep   accept
 4470 100520 bhyve               vcpu 0               -1  128 sleep   vmidle
 4470 100521 bhyve               vcpu 1                1  124 run     -
 4470 100522 bhyve               vcpu 2                9  127 run     -
 4470 100523 bhyve               vcpu 3               -1  134 sleep   vmidle
 4470 100524 bhyve               vcpu 4                3  125 run     -
 4470 100525 bhyve               vcpu 5                4  136 run     -
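Because vCPU threads are named "vcpu N", this output is easy to summarize with awk. A minimal sketch with the hypothetical name vcpu_summary; it assumes the column layout shown above, where the thread name splits into two fields and STATE is therefore field 8:

```sh
#!/bin/sh
# vcpu_summary: read `procstat -t PID` output on stdin and report how
# many vCPU threads the bhyve process has and how many are running.
# "vcpu N" occupies fields 4 and 5, so CPU, PRI and STATE follow
# in fields 6, 7 and 8.
vcpu_summary() {
    awk '$4 == "vcpu" { n++; if ($8 == "run") r++ }
         END { printf "vcpus=%d running=%d\n", n, r }'
}

# usage on the host:  procstat -t 4470 | vcpu_summary
```

For the listing above this prints vcpus=6 running=4.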

VLAN Tagging and Bridging

On FreeBSD, 802.1Q tagging is configured in rc.conf(5) by listing the VLAN IDs for an interface:

ifconfig_ix0="up"
vlans_ix0="80 81 82"

ifconfig_ix0_80="inet 192.168.0.6/24"
ifconfig_ix0_81="inet 192.168.1.6/24"
ifconfig_ix0_82="inet 192.168.2.6/24"
defaultrouter="192.168.0.7"
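The variable names follow a fixed pattern (vlans_IFACE, then ifconfig_IFACE_VLAN), so the block can be generated for other hosts. A sketch under stated assumptions: the function name vlan_rcconf is hypothetical, every VLAN is assumed to be a /24, and the host uses the same last octet on each one:

```sh
#!/bin/sh
# vlan_rcconf IFACE HOST VLAN:NET ...
# Print rc.conf(5) lines that tag each VLAN on IFACE and assign the
# address NET.HOST/24 on it, e.g. 82:192.168.2 -> ifconfig_ix0_82.
vlan_rcconf() {
    iface=$1 host=$2
    shift 2
    vlans=
    for pair in "$@"; do
        vlans="$vlans${vlans:+ }${pair%%:*}"
    done
    printf 'ifconfig_%s="up"\n' "$iface"
    printf 'vlans_%s="%s"\n' "$iface" "$vlans"
    for pair in "$@"; do
        printf 'ifconfig_%s_%s="inet %s.%s/24"\n' \
            "$iface" "${pair%%:*}" "${pair#*:}" "$host"
    done
}

vlan_rcconf ix0 6 80:192.168.0 81:192.168.1 82:192.168.2
```

The call above reproduces the interface lines of the configuration shown; defaultrouter is left to be set by hand.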

Now that the VLAN interfaces are defined, create a bridge with VLAN 82 as its first member:

cloned_interfaces="bridge2"
ifconfig_bridge2="addm ix0.82 up"
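The startup script adds each VM's tap interface to bridge2 with ifconfig addm. To check membership first, the "member:" lines printed by ifconfig for a bridge can be parsed. A sketch with the hypothetical name bridge_has, assuming each member appears as "member: IFNAME flags=..." in that output:

```sh
#!/bin/sh
# bridge_has IFACE: read `ifconfig bridgeN` output on stdin and
# succeed if IFACE is listed as a member of the bridge.
bridge_has() {
    awk -v m="$1" '$1 == "member:" && $2 == m { found = 1 }
                   END { exit !found }'
}

# usage on the host:
#   ifconfig bridge2 | bridge_has tap2 || ifconfig bridge2 addm tap2
```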

A Complete Startup Script

There are a number of Bhyve frameworks, but my approach has been to write a shell script that manages each VM's lifecycle using bhyve(8):

#!/bin/ksh

# report failures with their line number instead of failing silently
trap 'printf "%s: exit code %s on line %s\n" "$0" "$?" "$LINENO"' ERR

if [ $# -lt 1 ]; then
    echo "usage: $0 name [-i]"
    exit 1
fi

name=$1
opt=${2:-X}
install=
# host address on 192.168.2.0/24; the VNC console listens here
myip=$(ifconfig | awk '/inet 192\.168\.2\./ { print $2 }')
# VNC password for the fbuf device, expected in the environment
: "${mypw:?set mypw to the VNC password}"

[ -f "/vm/$name.img" ] || [ "$opt" = -i ] || {
    echo "$name not found, add -i to initialize"
    exit 1
}

umask 026

case $name in
    install)  # OpenBSD with two disks
        vp=5901
        tap=tap1
        if [ "$opt" = -i ]; then
            # attach the installer and create (or resize) the disks
            install="-s 3,ahci-cd,/iso/miniroot74.img"
            bhyvectl --destroy --vm="$name"
            truncate -s 20G "/vm/$name.img"
            truncate -s 80G "/vm/$name-data.img"
        fi
        ifconfig "$tap" create
        ifconfig bridge2 addm "$tap"
        # $install is unquoted on purpose: empty unless -i was given
        bhyve -c 1 -m 512M -w -H \
              -s 0,hostbridge \
              $install \
              -s 4,virtio-blk,/vm/$name.img \
              -s 5,virtio-blk,/vm/$name-data.img \
              -s 6,virtio-net,$tap,mac=00:0c:29:a5:19:b9 \
              -s 29,fbuf,tcp=$myip:$vp,w=800,h=600,password="$mypw" \
              -s 30,xhci,tablet \
              -s 31,lpc \
              -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI_CODE.fd \
              "$name" && rc=0 || rc=$?
        ifconfig "$tap" destroy
        # status 0 means the guest rebooted: restart it without -i
        [ $rc -eq 0 ] && exec "$0" "$name"
        ;;
    report1)  # Ubuntu or Rocky Linux with one disk
        vp=5902
        tap=tap2
        if [ "$opt" = -i ]; then
            # attach the iPXE chainload image and create the disk
            install="-s 1,nvme,/root/uefi-ipxe-chainload.img"
            bhyvectl --destroy --vm="$name"
            truncate -s 50G "/vm/$name.img"
        fi
        ifconfig "$tap" create
        ifconfig bridge2 addm "$tap"
        bhyve -c 4 -m 8G -w -H \
              -s 0,hostbridge \
              $install \
              -s 4,virtio-blk,/vm/$name.img \
              -s 5,virtio-net,$tap,mac=00:0c:29:f9:6d:4e \
              -s 29,fbuf,tcp=$myip:$vp,w=800,h=600,password="$mypw" \
              -s 30,xhci,tablet \
              -s 31,lpc \
              -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI_CODE.fd \
              "$name" && rc=0 || rc=$?
        ifconfig "$tap" destroy
        # status 0 means the guest rebooted: restart it without -i
        [ $rc -eq 0 ] && exec "$0" "$name"
        ;;
    *)
        echo "$name: unknown VM"
        exit 1
        ;;
esac

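This script can be refactored to be less repetitious, but it serves as a starting point. Key features: passing -i creates the disk images and attaches the installation media (the OpenBSD miniroot, or the iPXE chainload image for Linux); each VM gets its own tap interface on bridge2 and its own VNC port; and because bhyve exits with status 0 when the guest reboots, the script re-executes itself without -i, which detaches the installation media after the first reboot.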

OpenBSD's miniroot is surprisingly easy to use for an automated network install, since install.sub reads the install server and filename from /var/db/dhcpleased/$_if.
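The startup script restarts a VM only when bhyve exits with status 0, which bhyve(8) documents as a guest reboot; any other status ends the script. When a guest stops unexpectedly, translating the status helps. A small sketch with the hypothetical name bhyve_exit_reason, using the exit codes listed in bhyve(8):

```sh
#!/bin/sh
# bhyve_exit_reason STATUS: describe a bhyve exit status
# (0-4 as listed in the EXIT STATUS section of bhyve(8)).
bhyve_exit_reason() {
    case $1 in
        0) echo "rebooted" ;;
        1) echo "powered off" ;;
        2) echo "halted" ;;
        3) echo "triple fault" ;;
        4) echo "exited due to an error" ;;
        *) echo "unknown status $1" ;;
    esac
}

# usage after bhyve returns:  echo "$name: $(bhyve_exit_reason $rc)"
```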

Poor Man's Service Discovery

ARP/NDP will find a VM's address even when it is brought up on another server, but the VNC port is bound to the host running that bhyve process. One solution is to probe each bhyve host in turn with nc(1):

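#!/bin/sh
name=$1

# VNC port for each VM, matching the startup script
case $name in
    install)
        vp=5901
        ;;
    report1)
        vp=5902
        ;;
    *)
        echo "$name: unknown VM" >&2
        exit 1
        ;;
esac

# the first host accepting connections on the port is running the VM
for host in 192.168.2.5 192.168.2.6; do
    nc -w 1 -z "$host" "$vp" && exec vncviewer "$host:$vp"
done
echo "$name not running"
exit 1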

Set VNC_PASSWORD in the environment to avoid a password prompt.
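Both scripts carry their own copy of the VM-to-port mapping. One way to keep them in sync, sketched here with a hypothetical table file /vm/vnc-ports holding one "name port" pair per line and a hypothetical function vnc_port, is a lookup that either script can use:

```sh
#!/bin/sh
# vnc_port NAME [FILE]: print the VNC port assigned to NAME in FILE
# (default: the hypothetical /vm/vnc-ports, one "name port" per line).
# Exits non-zero and prints nothing if NAME is not listed.
vnc_port() {
    awk -v n="$1" '$1 == n { print $2; found = 1; exit }
                   END { exit !found }' "${2:-/vm/vnc-ports}"
}

# usage:  vp=$(vnc_port "$name") || { echo "$name: no VNC port"; exit 1; }
```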

References