Eric Radman : a Journal

Bhyve and iPXE

In software or systems engineering, the ability to spin up virtual machines is a valuable capability. Virtual machines provide a means of testing software or configuration on multiple platforms, and in multiple configurations.

In production, a high-performance and reliable hypervisor allows services to be provisioned and updated one at a time. This capability avoids the added risk and complexity of fork-lift upgrades.

Boot Image

iPXE is a very capable UEFI application with very obscure documentation. Specifically, it is challenging to determine which script iPXE tries to load by default, and the Internet is full of incorrect answers. strings(1) resolved this question:

$ strings /usr/local/share/ipxe/snp.efi-x86_64 | egrep '.ipxe$'
#!ipxe
autoexec.ipxe

Bhyve does not have built-in PXE boot support, but we can build our own boot image, which is attached as an NVMe device when the VM is started. The image carries an autoexec.ipxe script:

#!ipxe
dhcp && goto netboot || goto dhcperror

:dhcperror
prompt --key s --timeout 10000 DHCP failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot

:netboot
chain tftp://${next-server}/chainload/${hostname}.ipxe ||
prompt --key s --timeout 10000 Chainloading failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot

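The trick in this configuration is that iPXE fetches further configuration based on the DHCP next-server and host-name options: ${next-server} names the TFTP server, and ${hostname} selects the per-host script, so a VM leased the name report1 loads chainload/report1.ipxe. The following script builds a 4 MB GPT disk image whose EFI system partition holds iPXE as EFI/Boot/BootX64.efi, next to autoexec.ipxe: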
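On the TFTP server, each VM then needs a script at chainload/<hostname>.ipxe. A minimal sketch of a helper that writes one, assuming the guest boots a kernel and initrd fetched over HTTP. The function name write_chainload, the TFTP root argument and the URLs are hypothetical, and any kernel command line a particular distribution needs is left out:

```shell
#!/bin/sh
# Hypothetical helper: write chainload/HOST.ipxe under a TFTP root so that
# "chain tftp://${next-server}/chainload/${hostname}.ipxe" can fetch it.
# Usage: write_chainload ROOT HOST KERNEL_URL INITRD_URL
write_chainload() {
    root=$1 host=$2 kernel=$3 initrd=$4
    mkdir -p "$root/chainload" || return 1
    # The first line must be the "#!ipxe" magic for iPXE to run the file;
    # kernel, initrd and boot are standard iPXE commands.
    cat > "$root/chainload/$host.ipxe" <<EOF
#!ipxe
kernel $kernel
initrd $initrd
boot
EOF
}
```

For example, write_chainload /tftpboot report1 http://192.168.2.6/vmlinuz http://192.168.2.6/initrd.img, where /tftpboot stands for whatever directory tftpd(8) serves and the URLs are placeholders.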

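#!/bin/sh
# Build a small GPT disk image with an EFI system partition holding iPXE
set -e

dst=uefi-ipxe-chainload.img

truncate -s 4M "$dst"
mdconfig -a -t vnode -u 99 -f "$dst"    # attach the image as /dev/md99
gpart create -s gpt /dev/md99
gpart add -t efi /dev/md99
newfs_msdos -F12 /dev/md99p1             # FAT12 is sufficient for 4 MB
mount -t msdosfs -o longnames /dev/md99p1 /mnt

# EFI/Boot/BootX64.efi is the default UEFI boot path on x86_64
mkdir -p /mnt/EFI/Boot
cp /usr/local/share/ipxe/snp.efi-x86_64 /mnt/EFI/Boot/BootX64.efi
cp autoexec.ipxe /mnt/autoexec.ipxe

umount /mnt
mdconfig -du md99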
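iPXE only executes a file as a script when its first line is the #!ipxe magic, the same string strings(1) turned up earlier. A cheap preflight check before building the image catches a missing or mistyped header; check_ipxe_script is a hypothetical name, not part of iPXE:

```shell
#!/bin/sh
# Succeed only if FILE is readable and its first line is the iPXE magic.
check_ipxe_script() {
    if [ ! -r "$1" ]; then
        echo "$1: not readable" >&2
        return 1
    fi
    first=$(head -n 1 "$1")
    if [ "$first" != '#!ipxe' ]; then
        echo "$1: missing ipxe magic line" >&2
        return 1
    fi
}
```

Calling check_ipxe_script autoexec.ipxe || exit 1 before the truncate step would stop the build before an unbootable image is written.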

Process Status

In Bhyve, each virtual machine is a process. Each virtual CPU runs as a thread, as do the device emulation workers, and these may be listed using top -H or procstat(1):

# procstat -t 4470
  PID    TID COMM                TDNAME              CPU  PRI STATE   WCHAN
 4470 100259 bhyve               mevent               -1  120 sleep   kqread
 4470 100510 bhyve               blk-4:0-0            -1  120 sleep   uwait
 4470 100511 bhyve               blk-4:0-1            -1  120 sleep   uwait
 4470 100512 bhyve               blk-4:0-2            -1  120 sleep   uwait
 4470 100513 bhyve               blk-4:0-3            -1  120 sleep   uwait
 4470 100514 bhyve               blk-4:0-4            -1  120 sleep   uwait
 4470 100515 bhyve               blk-4:0-5            -1  120 sleep   uwait
 4470 100516 bhyve               blk-4:0-6            -1  120 sleep   uwait
 4470 100517 bhyve               blk-4:0-7            -1  120 sleep   uwait
 4470 100518 bhyve               vtnet-5:0 tx         -1  120 sleep   uwait
 4470 100519 bhyve               rfb                  -1  126 sleep   accept
 4470 100520 bhyve               vcpu 0               -1  128 sleep   vmidle
 4470 100521 bhyve               vcpu 1                1  124 run     -
 4470 100522 bhyve               vcpu 2                9  127 run     -
 4470 100523 bhyve               vcpu 3               -1  134 sleep   vmidle
 4470 100524 bhyve               vcpu 4                3  125 run     -
 4470 100525 bhyve               vcpu 5                4  136 run     -
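With this many threads per VM, a one-line summary can be handy. A sketch that reads procstat -t output on stdin and counts vCPU threads, assuming the column layout shown above (the two-word TDNAME "vcpu N" shifts STATE to field 8); vcpu_summary is a hypothetical name:

```shell
#!/bin/sh
# Count vcpu threads and how many are running, from "procstat -t PID"
# output on stdin. Field 3 is COMM, field 4 is the first word of TDNAME.
vcpu_summary() {
    awk '$3 == "bhyve" && $4 == "vcpu" {
        total++
        if ($8 == "run") running++
    }
    END { printf "%d vcpus, %d running\n", total, running }'
}
```

Run as procstat -t 4470 | vcpu_summary against the snapshot above, it would print "6 vcpus, 4 running".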

VLAN Tagging and Bridging

On FreeBSD, 802.1Q tagging is configured in rc.conf(5) by defining a list of VLAN IDs for a parent interface:

ifconfig_ix0="up"
vlans_ix0="80 81 82"

ifconfig_ix0_80="inet 192.168.0.6/24"
ifconfig_ix0_81="inet 192.168.1.6/24"
ifconfig_ix0_82="inet 192.168.2.6/24"
defaultrouter="192.168.0.7"
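Each entry in vlans_ix0 creates an interface named ix0.<vlan>, and because a dot cannot appear in a shell variable name, its address is set with ifconfig_ix0_<vlan>. With many VLANs these lines can be generated; a sketch using a hypothetical vlan_rcconf function that takes VLAN:ADDRESS pairs (IPv4 only, since it emits inet):

```shell
#!/bin/sh
# Print rc.conf(5) lines for PARENT and a list of VLAN:ADDRESS pairs.
# Usage: vlan_rcconf PARENT VLAN:ADDR [VLAN:ADDR ...]
vlan_rcconf() {
    parent=$1
    shift
    ids=""
    for spec in "$@"; do
        ids="$ids${ids:+ }${spec%%:*}"       # collect VLAN IDs
    done
    printf 'ifconfig_%s="up"\n' "$parent"
    printf 'vlans_%s="%s"\n' "$parent" "$ids"
    for spec in "$@"; do
        # interface ix0.80 is configured by the variable ifconfig_ix0_80
        printf 'ifconfig_%s_%s="inet %s"\n' "$parent" "${spec%%:*}" "${spec#*:}"
    done
}
```

vlan_rcconf ix0 80:192.168.0.6/24 81:192.168.1.6/24 82:192.168.2.6/24 reproduces the interface lines above; defaultrouter is still set by hand.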

Now that the VLAN interfaces are defined, create a bridge with VLAN 82 (ix0.82) as its first member:

cloned_interfaces="bridge2"
ifconfig_bridge2="addm ix0.82 up"

A Complete Startup Script

There are a number of Bhyve frameworks, but my approach has been to write a shell script that manages the lifetime of each VM using bhyve(8) directly:

#!/bin/ksh

trap 'printf "$0: exit code $? on line $LINENO\n"' ERR

if [ $# -lt 1 ]; then
    echo "usage: $0 name [-i]"
    exit 1
fi

name=$1
opt=${2:-X}
# host address on VLAN 82, where the VNC consoles listen
myip=$(ifconfig | awk '/inet 192\.168\.2\./ { print $2 }')
# VNC console password, expected in the environment
mypw=${mypw:?set mypw to the VNC console password}

[ -f /vm/$name.img ] || [ "$opt" = '-i' ] || {
    echo "$name not found, add -i to initialize"
    exit 1
}

umask 026

case $name in
    install)  # OpenBSD with two disks
        vp=5901
        tap=tap1
        if [ "$opt" = '-i' ]; then
            install="-s 3,ahci-cd,/iso/miniroot74.img"
            bhyvectl --destroy --vm=$name
            truncate -s 20G /vm/$name.img
            truncate -s 80G /vm/$name-data.img
        fi
        ifconfig $tap create
        ifconfig bridge2 addm $tap
        rc=0
        bhyve -c 1 -m 512M -w -H \
              -s 0,hostbridge \
              $install \
              -s 4,virtio-blk,/vm/$name.img \
              -s 5,virtio-blk,/vm/$name-data.img \
              -s 6,virtio-net,$tap,mac=00:0c:29:a5:19:b9 \
              -s 29,fbuf,tcp=$myip:$vp,w=800,h=600,password=$mypw \
              -s 30,xhci,tablet \
              -s 31,lpc \
              -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI_CODE.fd \
              $name || rc=$?
        # remove the tap before restarting so "ifconfig create" succeeds
        ifconfig $tap destroy
        # exit status 0 means the guest rebooted: restart without -i
        [ $rc -eq 0 ] && exec $0 $name
        ;;
    report1)  # Ubuntu or Rocky Linux with one disk
        vp=5902
        tap=tap2
        if [ "$opt" = '-i' ]; then
            install="-s 1,nvme,/root/uefi-ipxe-chainload.img"
            bhyvectl --destroy --vm=$name
            truncate -s 50G /vm/$name.img
        fi
        ifconfig $tap create
        ifconfig bridge2 addm $tap
        rc=0
        bhyve -c 4 -m 8G -w -H \
              -s 0,hostbridge \
              $install \
              -s 4,virtio-blk,/vm/$name.img \
              -s 5,virtio-net,$tap,mac=00:0c:29:f9:6d:4e \
              -s 29,fbuf,tcp=$myip:$vp,w=800,h=600,password=$mypw \
              -s 30,xhci,tablet \
              -s 31,lpc \
              -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI_CODE.fd \
              $name || rc=$?
        ifconfig $tap destroy
        [ $rc -eq 0 ] && exec $0 $name
        ;;
esac
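The restart logic in the script above depends on bhyve's exit status, which bhyve(8) documents as 0 for a reboot, 1 for power off, 2 for halt, 3 for a triple fault and 4 for an exit due to an error. A small helper, hypothetical and not part of the original script, can turn the status into a log message:

```shell
#!/bin/sh
# Map a bhyve(8) exit status to the reason listed in its man page.
bhyve_exit_reason() {
    case $1 in
        0) echo "rebooted" ;;
        1) echo "powered off" ;;
        2) echo "halted" ;;
        3) echo "triple fault" ;;
        4) echo "exited due to an error" ;;
        *) echo "unknown status $1" ;;
    esac
}
```

After capturing the status in rc, a line such as echo "$name: $(bhyve_exit_reason $rc)" records why each VM stopped, which helps distinguish a clean shutdown from a crash.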

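This script can be refactored to be less repetitious, but it serves as a starting point. Key features: passing -i destroys any stale VM instance, creates sparse disk images with truncate(1) and attaches the install media; each VM gets its own tap interface on bridge2 and a VNC console on the host's VLAN 82 address; and after a guest reboot the script re-executes itself without -i, so the install media is detached after the first reboot.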

OpenBSD's miniroot is easy to use for an automated network install since install.sub reads the install server and filename from /var/db/dhcpleased/$_if.
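One way to make the startup script less repetitious is a case table of per-VM values feeding a single launcher. A sketch with a hypothetical run_vm function: it covers only the bhyve invocation (install media, tap setup and the restart loop are left to the caller), numbers disk slots from 4 with the NIC in the next slot as the original script does, expects myip and mypw to be set, and honors a BHYVE variable so that BHYVE=echo gives a dry run:

```shell
#!/bin/sh
# Single launcher driven by a case table (a sketch, not the original script).
run_vm() {
    name=$1
    case $name in
        install) cpus=1 mem=512M vp=5901 tap=tap1 mac=00:0c:29:a5:19:b9
                 disks="/vm/install.img /vm/install-data.img" ;;
        report1) cpus=4 mem=8G vp=5902 tap=tap2 mac=00:0c:29:f9:6d:4e
                 disks="/vm/report1.img" ;;
        *) echo "unknown VM: $name" >&2; return 1 ;;
    esac
    # Reuse the positional parameters as an argument list (POSIX sh has
    # no arrays): one virtio-blk slot per disk, starting at slot 4.
    set --
    slot=4
    for d in $disks; do
        set -- "$@" -s "$slot,virtio-blk,$d"
        slot=$((slot + 1))
    done
    # The NIC takes the next free slot; the remaining slots are shared.
    ${BHYVE:-bhyve} -c "$cpus" -m "$mem" -w -H \
        -s 0,hostbridge \
        "$@" \
        -s "$slot,virtio-net,$tap,mac=$mac" \
        -s "29,fbuf,tcp=${myip:?}:$vp,w=800,h=600,password=${mypw:?}" \
        -s 30,xhci,tablet \
        -s 31,lpc \
        -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI_CODE.fd \
        "$name"
}
```

Adding a VM then means adding one case entry rather than copying a whole block, and the dry run makes it easy to compare the generated command line with a known-good one.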

Poor Man's Service Discovery

ARP and NDP will discover a VM that is brought up on another server, but the VNC port listens on the host running bhyve. One way to find the VNC console is to test each bhyve host in sequence with nc(1):

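#!/bin/sh
name=$1

case $name in
    install)
        vp=5901
        ;;
    report1)
        vp=5902
        ;;
    *)
        echo "usage: $0 install|report1"
        exit 1
        ;;
esac

# the first bhyve host accepting a connection on the VNC port wins
for host in 192.168.2.5 192.168.2.6; do
    nc -w 1 -z $host $vp && exec vncviewer $host:$vp
done
echo "$name not running"
exit 1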

Set VNC_PASSWORD in the environment to avoid a password prompt from vncviewer.
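Both scripts hard-code the name-to-port mapping, so adding a VM means editing two files. One option is to keep the mapping in a single table of "name port" lines that both scripts read; the vnc_port function and the table file are hypothetical:

```shell
#!/bin/sh
# Print the VNC port for NAME from TABLE ("name port" per line).
# Returns non-zero when NAME is not listed.
vnc_port() {
    awk -v n="$1" '$1 == n { print $2; found = 1; exit }
                   END { exit !found }' "$2"
}
```

The console script could then use vp=$(vnc_port "$name" /vm/ports) || exit 1, where /vm/ports is an illustrative path.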

References