Bhyve and iPXE
In software or systems engineering, the ability to spin up virtual machines is a valuable capability. Virtual machines provide a means of testing software or configuration on multiple platforms, and in multiple configurations.
In production, a high-performance and reliable hypervisor allows services to be provisioned and updated one at a time. This capability avoids the added risk and complexity of fork-lift upgrades.
Bhyve is somewhat similar to QEMU-KVM, without the emulation required to boot SeaBIOS. Native UEFI support plus iPXE provides a very flexible means of bootstrapping VMs.
Boot Image
iPXE is a very capable UEFI application with very obscure documentation. Specifically, it is challenging to determine what configuration file iPXE tries to find by default; the Internet is full of incorrect answers. strings(1) resolved this question!
$ strings /usr/local/share/ipxe/snp.efi-x86_64 | egrep '.ipxe$'
#!ipxe
autoexec.ipxe
Bhyve does not have built-in PXE boot support, but we can build our own boot image, which will be added as an nvme device when the VM is started:
#!ipxe
dhcp && goto netboot || goto dhcperror

:dhcperror
prompt --key s --timeout 10000 DHCP failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot

:netboot
chain tftp://${next-server}/chainload/${hostname}.ipxe || prompt --key s --timeout 10000 Chainloading failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot
The trick in this configuration is that iPXE will fetch further configuration based on the next-server and host-name DHCP options.
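For illustration, a host entry in ISC dhcpd(8) could supply both options. The addresses and hostname here are assumptions; only the MAC address comes from the startup script later in this article:

host report1 {
    hardware ethernet 00:0c:29:f9:6d:4e;
    fixed-address 192.168.2.12;
    option host-name "report1";
    next-server 192.168.2.5;
}

The chainloaded per-host script can then do whatever iPXE supports; a minimal sketch that boots a hypothetical Linux installer over TFTP (the kernel and initrd paths are placeholders):

#!ipxe
kernel tftp://${next-server}/linux/vmlinuz ip=dhcp
initrd tftp://${next-server}/linux/initrd.img
boot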
#!/bin/sh
dst=uefi-ipxe-chainload.img

# Create a 4M image file and attach it as a memory disk
truncate -s 4M $dst
mdconfig -a -t vnode -u 99 -f $dst

# GPT with a single EFI system partition, formatted FAT12
gpart create -s gpt /dev/md99
gpart add -t efi /dev/md99
newfs_msdos -F12 /dev/md99p1

# Install iPXE as the default UEFI boot program, plus its configuration
mount -t msdosfs -o longnames /dev/md99p1 /mnt
mkdir -p /mnt/EFI/Boot
cp /usr/local/share/ipxe/snp.efi-x86_64 /mnt/EFI/Boot/BootX64.efi
cp autoexec.ipxe /mnt/autoexec.ipxe
umount /mnt
mdconfig -du md99
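A quick sanity check is to re-attach the finished image and list its contents (same md unit as above):

mdconfig -a -t vnode -u 99 -f uefi-ipxe-chainload.img
mount -t msdosfs /dev/md99p1 /mnt && find /mnt
umount /mnt
mdconfig -du md99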
Process Status
In Bhyve, each virtual machine is a process, and each virtual CPU is a thread, which may be listed using top -H or procstat(1):
# procstat -t 4470
 PID    TID COMM     TDNAME        CPU PRI STATE  WCHAN
4470 100259 bhyve    mevent         -1 120 sleep  kqread
4470 100510 bhyve    blk-4:0-0      -1 120 sleep  uwait
4470 100511 bhyve    blk-4:0-1      -1 120 sleep  uwait
4470 100512 bhyve    blk-4:0-2      -1 120 sleep  uwait
4470 100513 bhyve    blk-4:0-3      -1 120 sleep  uwait
4470 100514 bhyve    blk-4:0-4      -1 120 sleep  uwait
4470 100515 bhyve    blk-4:0-5      -1 120 sleep  uwait
4470 100516 bhyve    blk-4:0-6      -1 120 sleep  uwait
4470 100517 bhyve    blk-4:0-7      -1 120 sleep  uwait
4470 100518 bhyve    vtnet-5:0 tx   -1 120 sleep  uwait
4470 100519 bhyve    rfb            -1 126 sleep  accept
4470 100520 bhyve    vcpu 0         -1 128 sleep  vmidle
4470 100521 bhyve    vcpu 1          1 124 run    -
4470 100522 bhyve    vcpu 2          9 127 run    -
4470 100523 bhyve    vcpu 3         -1 134 sleep  vmidle
4470 100524 bhyve    vcpu 4          3 125 run    -
4470 100525 bhyve    vcpu 5          4 136 run    -
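Since the vCPU thread names are predictable, a small loop can summarize them for every running VM; a sketch using pgrep(1):

#!/bin/sh
# show only the vCPU threads of each bhyve process
for pid in $(pgrep -x bhyve); do
    procstat -t $pid | grep vcpu
done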
VLAN Tagging and Bridging
On FreeBSD, 802.1Q tagging is configured in rc.conf by defining a list of VLAN numbers for an interface:
ifconfig_ix0="up"
vlans_ix0="80 81 82"
ifconfig_ix0_80="inet 192.168.0.6/24"
ifconfig_ix0_81="inet 192.168.1.6/24"
ifconfig_ix0_82="inet 192.168.2.6/24"
defaultrouter="192.168.0.7"
Now that VLAN interfaces are defined, create a bridge with VLAN 82 as its first member:
cloned_interfaces="bridge2"
ifconfig_bridge2="addm ix0.82 up"
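The same configuration can be tried at runtime without touching rc.conf; ifconfig(8) infers the VLAN parent and tag from the dotted interface name:

ifconfig ix0.82 create
ifconfig bridge2 create
ifconfig bridge2 addm ix0.82 up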
A Complete Startup Script
There are a number of Bhyve frameworks, but my approach has been to construct a shell script that manages the VM lifecycle using bhyve(8) directly:
#!/bin/ksh
trap 'printf "$0: exit code $? on line $LINENO\n"' ERR

if [ $# -lt 1 ]; then
    echo "$0 name [-i]"
    exit 1
fi

name=$1
opt=${2:-X}
myip=$(ifconfig | awk '/inet 192.168.2./ { print $2 }')

[ -f /vm/$name.img ] || [ $opt == '-i' ] || {
    echo "$name not found, add -i to initialize"
    exit 1
}

umask 026

case $name in
install)   # OpenBSD with two disks
    vp=5901 if=tap1
    if [ $opt == '-i' ]; then
        install="-s 3,ahci-cd,/iso/miniroot74.img"
        bhyvectl --destroy --vm=$name
        truncate -s 20G /vm/$name.img
        truncate -s 80G /vm/$name-data.img
    fi
    ifconfig $if create
    ifconfig bridge2 addm $if
    bhyve -c 1 -m 512M -w -H \
        -s 0,hostbridge \
        $install \
        -s 4,virtio-blk,/vm/$name.img \
        -s 5,virtio-blk,/vm/$name-data.img \
        -s 6,virtio-net,$if,mac=00:0c:29:a5:19:b9 \
        -s 29,fbuf,tcp=$myip:$vp,w=800,h=600,password=$mypw \
        -s 30,xhci,tablet \
        -s 31,lpc \
        -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI_CODE.fd \
        $name && exec $0 $1
    ifconfig $if destroy
    ;;
report1)   # Ubuntu or Rocky Linux with one disk
    vp=5902 if=tap2
    if [ $opt == '-i' ]; then
        install="-s 1,nvme,/root/uefi-ipxe-chainload.img"
        bhyvectl --destroy --vm=$name
        truncate -s 50G /vm/$name.img
    fi
    ifconfig $if create
    ifconfig bridge2 addm $if
    bhyve -c 4 -m 8G -w -H \
        -s 0,hostbridge \
        $install \
        -s 4,virtio-blk,/vm/$name.img \
        -s 5,virtio-net,$if,mac=00:0c:29:f9:6d:4e \
        -s 29,fbuf,tcp=$myip:$vp,w=800,h=600,password=$mypw \
        -s 30,xhci,tablet \
        -s 31,lpc \
        -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI_CODE.fd \
        $name && exec $0 $1
    ifconfig $if destroy
    ;;
esac
This script can be refactored to be less repetitious, but it will serve as a starting point. Key features:
- If -i is specified, use the iPXE image as the first disk
- Assign a VNC port for installation or debugging
- Automatically restart using exec $0 if the VM exits without an error code
- Add/remove tapN to the network bridge servicing VMs
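Assuming the script is saved as /vm/vm.sh (the path and name are arbitrary), a first boot that initializes the disks and a normal restart look like:

# first run: create disk images and boot from the iPXE chainload image
/vm/vm.sh report1 -i

# later runs: boot from the installed disk
/vm/vm.sh report1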
OpenBSD's miniroot is deceptively easy to use for an automated network install, since install.sub reads the install server and filename from /var/db/dhcpleased/$_if.
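The response file the auto-installer fetches is a plain list of question/answer pairs. A minimal sketch with placeholder values; the exact question strings are documented in autoinstall(8):

System hostname = install
Password for root = <encrypted password hash>
Location of sets = http
HTTP Server = 192.168.2.5
Set name(s) = -game* -x*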
Poor Man's Service Discovery
ARP/NDP will discover a VM that is brought up on another server, but the VNC port is attached to a particular host. One solution is to test each bhyve host in sequence with nc(1):
#!/bin/sh
name=$1

case $name in
install) vp=5901 ;;
report1) vp=5902 ;;
esac

for host in 192.168.2.5 192.168.2.6; do
    nc -w 1 -z $host $vp && exec vncviewer $host:$vp
done

echo "$name not running"
exit 1
Set VNC_PASSWORD to avoid a prompt.
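For example, assuming the discovery script above is saved as vnc.sh and the password matches the one passed to the fbuf device:

$ VNC_PASSWORD=secret ./vnc.sh report1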