Bhyve, Linux Containers and MTU
Containers running in a virtual machine can suffer from connectivity problems. Something in the network stack appears to require an extra 50 bytes, even when VXLAN is not used for container networking. Why is this happening?
alpine1$ curl -O http://192.168.0.2/iso/Rocky-8.10-x86_64-minimal.iso
bhyve2# tcpdump -i ix0.82 | grep -i frag
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ix0.82, link-type EN10MB (Ethernet), snapshot length 262144 bytes
11:11:03.773139 IP mykube3 > apu4d2: ICMP mykube3 unreachable - need to frag (mtu 1500), length 556
11:11:03.773786 IP mykube3 > apu4d2: ICMP mykube3 unreachable - need to frag (mtu 1500), length 556
11:11:03.774148 IP mykube3 > apu4d2: ICMP mykube3 unreachable - need to frag (mtu 1500), length 556
On most operating systems, the MTU includes the protocol overhead, but not the Ethernet overhead.
payload   ip   icmp
 1472  +  20  +  8  = 1500
bhyve1# ping -D -s 1472 10.245.0.11
PING 10.245.0.11 (10.245.0.11): 1472 data bytes
1480 bytes from 10.245.0.11: icmp_seq=0 ttl=62 time=0.282 ms
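As an aside, the 50-byte figure matches the usual VXLAN encapsulation overhead (outer Ethernet, outer IPv4, UDP and VXLAN headers), which is presumably why 1550 turns up as a common overlay MTU even though no VXLAN is in use here:

outer-eth   outer-ip   udp   vxlan
    14    +    20   +   8  +   8   = 50

On the Linux side, the equivalent don't-fragment test is ping -M do -s 1472 10.245.0.11.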
Network Topology
This illustrates a network where the endpoints (Linux VM ↔ Linux container) are able to accept an extra 50 bytes of overhead:
platform   mtu   host - if               mode
---------  ----  ----------------------  ------
junos      4021  ex2300 - xe-0/1/1       802.1q
freebsd    4000  bhyve2 - ix82           802.1q
           4000  bhyve2 - ix0.82
           4000  bhyve2 - bridge2
           4000  bhyve2 - tap15
linux      1500  mykube3 - enp0s5
           1550  mykube3 - cni0
           1550  mykube3 - veth69626ca5
container  1550  registry - eth0
platform   mtu   host - if               mode
---------  ----  ----------------------  ------
junos      4021  ex2300 - xe-0/1/0       802.1q
freebsd    4000  bhyve1 - ix82           802.1q
           4000  bhyve1 - ix0.82
           4000  bhyve1 - bridge2
           4000  bhyve2 - tap9
linux      1550  sfreport1 - enp0s5
platform   mtu   host - if               mode
---------  ----  ----------------------  ------
junos      4021  ex2300 - ge-0/0/7       802.1q
openbsd    1500  apu4d2 - em1            802.1q
           1500  apu4d2 - vlan0
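To double-check the values above on each hop, the interface MTU can be read directly; the interface names below are taken from the tables, so adjust them for your own topology:

ex2300> show interfaces xe-0/1/1 | match MTU
bhyve2# ifconfig ix0.82 | grep mtu
mykube3$ ip link show enp0s5
registry$ cat /sys/class/net/eth0/mtu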
Linux Hosts
For NetworkManager (Red Hat)
nmcli con mod enp0s5 ethernet.mtu 1550
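The change is only written to the connection profile; reactivating the connection applies it (assuming, as above, that the profile is named after the interface):

nmcli con up enp0s5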
For Netplan (Ubuntu)
netplan set "network.ethernets.enp0s5.mtu=1550"
netplan apply
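netplan set simply writes the value into a YAML file under /etc/netplan/ (the file name varies by installation); the equivalent hand-edited configuration looks roughly like this:

network:
  version: 2
  ethernets:
    enp0s5:
      mtu: 1550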
Note that OSPF will refuse to form an adjacency across a link whose interface MTUs do not match, so raise the MTU consistently on both ends of routed links.
Containers
For Kubernetes with the native bridge CNI plugin, adjust
/etc/cni/net.d/100-crio-bridge.conflist
{ "cniVersion": "1.0.0", "name": "crio", "plugins": [ { "type": "bridge", "bridge": "cni0", "mtu": 1550, "isGateway": true, "hairpinMode": true, "ipam": { "type": "host-local", "routes": [ { "dst": "0.0.0.0/0" }, { "dst": "::/0" } ], "ranges": [ [{ "subnet": "${ip4_net}/24" }], [{ "subnet": "${ip6_net}/64" }] ] } } ] }
For Docker, adjust
/etc/docker/daemon.json
{ "bip": "${ip4_base}/24", "fixed-cidr": "${ip4_net}/24", "ip-masq": false, "mtu": 1550, }