Bhyve and iPXE
In software or systems engineering, the ability to spin up virtual machines is a valuable capability. Virtual machines provide a means of testing software or configuration on multiple platforms, and in multiple configurations.
In production, a high-performance and reliable hypervisor allows services to
be provisioned and updated one at a time. This capability avoids the added
risk and complexity of fork-lift upgrades.
Boot Image
iPXE is a very capable UEFI application with very obscure documentation. Specifically, it is challenging to determine which configuration file iPXE tries to find by default; the Internet is full of incorrect answers. strings(1) resolved this question!
    $ strings /usr/local/share/ipxe/snp.efi-x86_64 | egrep '.ipxe$'
    #!ipxe
    autoexec.ipxe
Bhyve does not have built-in PXE boot support, but we can build our own
boot image, which will be added as an nvme device when the VM is started:
    #!ipxe

    dhcp && goto netboot || goto dhcperror

    :dhcperror
    prompt --key s --timeout 10000 DHCP failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot

    :netboot
    chain tftp://${next-server}/chainload/${hostname}.ipxe ||
    prompt --key s --timeout 10000 Chainloading failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot
The trick in this configuration is that iPXE will fetch further configuration based on the next-server and host-name DHCP options.
    #!/bin/sh

    dst=uefi-ipxe-chainload.img

    truncate -s 4M $dst
    mdconfig -a -t vnode -u 99 -f $dst
    gpart create -s gpt /dev/md99
    gpart add -t efi /dev/md99
    newfs_msdos -F12 /dev/md99p1
    mount -t msdosfs -o longnames /dev/md99p1 /mnt
    mkdir -p /mnt/EFI/Boot
    cp /usr/local/share/ipxe/snp.efi-x86_64 /mnt/EFI/Boot/BootX64.efi
    cp autoexec.ipxe /mnt/autoexec.ipxe
    umount /mnt
    mdconfig -du md99
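On the TFTP server side, the boot image's chain command expects a file named chainload/<hostname>.ipxe. A minimal sketch of generating one follows. The function name write_chainload, the TFTP root, and the kernel and initrd URLs are all hypothetical, and the kernel/initrd/boot sequence is only one plausible payload, not necessarily what the author serves.

```shell
#!/bin/sh
# Sketch: write a per-host iPXE script where the boot image's `chain`
# command will request it. Paths and URLs are assumptions, not taken
# from the original setup.

# usage: write_chainload TFTP_ROOT HOSTNAME KERNEL_URL INITRD_URL
write_chainload() {
    root=$1 host=$2 kernel=$3 initrd=$4

    mkdir -p "$root/chainload" || return 1

    # The file name must match ${hostname} as supplied by DHCP.
    cat > "$root/chainload/$host.ipxe" <<EOF
#!ipxe
kernel $kernel
initrd $initrd
boot
EOF
}
```

For example, `write_chainload /tftpboot report1 http://example.invalid/vmlinuz http://example.invalid/initrd.img` would create /tftpboot/chainload/report1.ipxe, where /tftpboot and both URLs are placeholders.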
Process Status
In Bhyve, each virtual machine is a process, and each virtual CPU is a thread
which may be listed using top -H or procstat(1):
    # procstat -t 4470
      PID    TID COMM    TDNAME         CPU  PRI STATE   WCHAN
     4470 100259 bhyve   mevent          -1  120 sleep   kqread
     4470 100510 bhyve   blk-4:0-0       -1  120 sleep   uwait
     4470 100511 bhyve   blk-4:0-1       -1  120 sleep   uwait
     4470 100512 bhyve   blk-4:0-2       -1  120 sleep   uwait
     4470 100513 bhyve   blk-4:0-3       -1  120 sleep   uwait
     4470 100514 bhyve   blk-4:0-4       -1  120 sleep   uwait
     4470 100515 bhyve   blk-4:0-5       -1  120 sleep   uwait
     4470 100516 bhyve   blk-4:0-6       -1  120 sleep   uwait
     4470 100517 bhyve   blk-4:0-7       -1  120 sleep   uwait
     4470 100518 bhyve   vtnet-5:0 tx    -1  120 sleep   uwait
     4470 100519 bhyve   rfb             -1  126 sleep   accept
     4470 100520 bhyve   vcpu 0          -1  128 sleep   vmidle
     4470 100521 bhyve   vcpu 1           1  124 run     -
     4470 100522 bhyve   vcpu 2           9  127 run     -
     4470 100523 bhyve   vcpu 3          -1  134 sleep   vmidle
     4470 100524 bhyve   vcpu 4           3  125 run     -
     4470 100525 bhyve   vcpu 5           4  136 run     -
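Because the thread names are stable, this listing is easy to summarize. The sketch below counts how many vCPU threads are currently running. The function name vcpu_states is hypothetical, and the parsing assumes the column layout shown above, where the thread name "vcpu N" splits into two whitespace-separated fields.

```shell
#!/bin/sh
# Sketch: summarize vCPU thread states from `procstat -t PID` output.
# Assumes the layout above: "vcpu N" occupies fields 4 and 5, which
# pushes STATE out to field 8.

# usage: procstat -t PID | vcpu_states
# prints: "<running> <total>"
vcpu_states() {
    awk '$3 == "bhyve" && $4 == "vcpu" {
             total++
             if ($8 == "run") running++
         }
         END { printf "%d %d\n", running, total }'
}
```

Run against the listing above with `procstat -t 4470 | vcpu_states`, this would print `4 6`: four of the six vCPUs are on a host CPU and two are idle.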
VLAN Tagging and Bridging
On FreeBSD, 802.1Q tagging is configured in rc.conf by defining a list of VLAN numbers for an interface:
    ifconfig_ix0="up"
    vlans_ix0="80 81 82"
    ifconfig_ix0_80="inet 192.168.0.6/24"
    ifconfig_ix0_81="inet 192.168.1.6/24"
    ifconfig_ix0_82="inet 192.168.2.6/24"
    defaultrouter="192.168.0.7"
Now that the VLAN interfaces are defined, create a bridge with VLAN 82 as its first member:
    cloned_interfaces="bridge2"
    ifconfig_bridge2="addm ix0.82 up"
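The rc.conf settings take effect at boot. To try the same layout on a running system, roughly equivalent ifconfig commands can be issued by hand. The sketch below is an approximation rather than what the rc scripts execute verbatim. The function name bridge_vlan and the IFCONFIG override, which allows a dry run, are hypothetical.

```shell
#!/bin/sh
# Sketch: create a VLAN interface and bridge it at runtime.
# Set IFCONFIG=echo to print the commands instead of running them.
IFCONFIG=${IFCONFIG:-ifconfig}

# usage: bridge_vlan PARENT VLAN BRIDGE
bridge_vlan() {
    parent=$1 vlan=$2 bridge=$3

    $IFCONFIG "$parent" up &&
    # Tagged child interface, named PARENT.VLAN as in rc.conf
    $IFCONFIG "$parent.$vlan" create vlan "$vlan" vlandev "$parent" &&
    $IFCONFIG "$bridge" create &&
    $IFCONFIG "$bridge" addm "$parent.$vlan" up
}
```

`IFCONFIG=echo; bridge_vlan ix0 82 bridge2` prints the four commands without touching the network. Address assignment is left out because the bridge itself carries no address in the configuration above.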
A Complete Startup Script
There are a number of Bhyve frameworks, but my approach has been to write a shell script that manages each VM's lifetime using bhyve(8):
    #!/bin/ksh

    trap 'printf "$0: exit code $? on line $LINENO\n"' ERR

    if [ $# -lt 1 ]; then
        echo "$0 name [-i]"
        exit 1
    fi

    name=$1
    opt=${2:-X}
    myip=$(ifconfig | awk '/inet 192.168.2./ { print $2 }')

    [ -f /vm/$name.img ] || [ $opt == '-i' ] || {
        echo "$name not found, add -i to initialize"
        exit 1
    }

    umask 026

    case $name in
    install)
        # OpenBSD with two disks
        vp=5901
        if=tap1
        if [ $opt == '-i' ]; then
            install="-s 3,ahci-cd,/iso/miniroot74.img"
            bhyvectl --destroy --vm=$name
            truncate -s 20G /vm/$name.img
            truncate -s 80G /vm/$name-data.img
        fi
        ifconfig $if create
        ifconfig bridge2 addm $if
        bhyve -c 1 -m 512M -w -H \
            -s 0,hostbridge \
            $install \
            -s 4,virtio-blk,/vm/$name.img \
            -s 5,virtio-blk,/vm/$name-data.img \
            -s 6,virtio-net,$if,mac=00:0c:29:a5:19:b9 \
            -s 29,fbuf,tcp=$myip:$vp,w=800,h=600,password=$mypw \
            -s 30,xhci,tablet \
            -s 31,lpc \
            -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI_CODE.fd \
            $name && exec $0 $1
        ifconfig $if destroy
        ;;
    report1)
        # Ubuntu or Rocky Linux with one disk
        vp=5902
        if=tap2
        if [ $opt == '-i' ]; then
            install="-s 1,nvme,/root/uefi-ipxe-chainload.img"
            bhyvectl --destroy --vm=$name
            truncate -s 50G /vm/$name.img
        fi
        ifconfig $if create
        ifconfig bridge2 addm $if
        bhyve -c 4 -m 8G -w -H \
            -s 0,hostbridge \
            $install \
            -s 4,virtio-blk,/vm/$name.img \
            -s 5,virtio-net,$if,mac=00:0c:29:f9:6d:4e \
            -s 29,fbuf,tcp=$myip:$vp,w=800,h=600,password=$mypw \
            -s 30,xhci,tablet \
            -s 31,lpc \
            -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI_CODE.fd \
            $name && exec $0 $1
        ifconfig $if destroy
        ;;
    esac
This script can be refactored to be less repetitious, but will serve as a starting point. Key features:
- If -i is specified, use the iPXE image as the first disk
- Assign a VNC port for installation or debugging
- Automatically restart using exec $0 if the VM exits without an error code
- Add/remove tapN to the network bridge servicing VMs
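One possible first step toward that refactoring is to move the per-VM settings into a single lookup function, so the bhyve invocation only has to be written once. This is a sketch, not the author's code. The function name vm_settings and the variables cpus and mem are hypothetical, while the values and the vp and if variables are copied from the script above.

```shell
#!/bin/sh
# Sketch: one table of per-VM settings shared by every VM.
# Values mirror the startup script; cpus and mem are new names that
# would replace the literal -c and -m arguments.

# usage: vm_settings NAME
# sets:  vp (VNC port), if (tap interface), cpus, mem
vm_settings() {
    case $1 in
    install) vp=5901 if=tap1 cpus=1 mem=512M ;;
    report1) vp=5902 if=tap2 cpus=4 mem=8G ;;
    *)       echo "unknown VM: $1" >&2; return 1 ;;
    esac
}
```

After `vm_settings "$name" || exit 1`, the shared part of the script could call `bhyve -c $cpus -m $mem ...` for any VM. The same function could also supply the VNC port to any helper script that needs it, keeping the name-to-port mapping in one place.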
OpenBSD's miniroot is easy to use for an automated network install since
install.sub reads the install server and filename from /var/db/dhcpleased/$_if.
Poor Man's Service Discovery
ARP/NDP will discover a VM that is brought up on another server, but the VNC port listens on the machine running bhyve. One way to find the VNC console is to probe each bhyve host in sequence with nc(1):
    #!/bin/sh

    name=$1

    case $name in
    install) vp=5901 ;;
    report1) vp=5902 ;;
    esac

    for host in 192.168.2.5 192.168.2.6; do
        nc -w 1 -z $host $vp && exec vncviewer $host:$vp
    done

    echo "$name not running"
    exit 1
Set VNC_PASSWORD to avoid a prompt.