Eric Radman : a Journal

Bhyve, Linux Containers and MTU

Containers running in a virtual machine can suffer from connectivity problems. Something in the network stack appears to require an extra 50 bytes of MTU, even when VXLAN is not used for container networking. Why is this happening?

alpine1$ curl -O http://192.168.0.2/iso/Rocky-8.10-x86_64-minimal.iso
bhyve2# tcpdump -i ix0.82 | grep -i frag
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ix0.82, link-type EN10MB (Ethernet), snapshot length 262144 bytes
11:11:03.773139 IP mykube3 > apu4d2: ICMP mykube3 unreachable - need to frag (mtu 1500), length 556
11:11:03.773786 IP mykube3 > apu4d2: ICMP mykube3 unreachable - need to frag (mtu 1500), length 556
11:11:03.774148 IP mykube3 > apu4d2: ICMP mykube3 unreachable - need to frag (mtu 1500), length 556

On most operating systems, the MTU includes the IP and higher-layer protocol overhead, but not the Ethernet overhead.

payload  ip  icmp
   1472  20     8 = 1500

bhyve1# ping -D -s 1472 10.245.0.11
PING 10.245.0.11 (10.245.0.11): 1472 data bytes
1480 bytes from 10.245.0.11: icmp_seq=0 ttl=62 time=0.282 ms
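
The arithmetic above can be wrapped in a small helper to find the right ping size for any MTU. This is a minimal sketch: the function name icmp_payload is made up for illustration, and it assumes a 20-byte IPv4 header with no options.

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet.
# Assumes a 20-byte IPv4 header (no options) plus the 8-byte ICMP header.
icmp_payload() {
    mtu=$1
    echo $((mtu - 20 - 8))
}

icmp_payload 1500   # prints 1472
icmp_payload 1550   # prints 1522
```

On FreeBSD the result can be passed to ping -D -s as shown above. With iputils ping on Linux, the equivalent don't-fragment option is -M do. The reply line reports the payload plus the 8-byte ICMP header, which is why a 1472-byte ping shows 1480 bytes.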

Network Topology

This illustrates a network where the endpoints (Linux VM ↔ Linux container) are able to accept an extra 50 bytes of overhead:

 platform  mtu  host-if              mode
 --------- ---  -------------------  ------
 junos     4021 ex2300 - xe-0/1/1    802.1q
 freebsd   4000 bhyve2 - ix82        802.1q
           4000 bhyve2 - ix0.82
           4000 bhyve2 - bridge2
           4000 bhyve2 - tap15
 linux     1500   mykube3 - enp0s5
           1550   mykube3 - cni0
           1550   mykube3 - veth69626ca5
 container 1550     registry - eth0

 platform  mtu  host-if              mode
 --------- ---  -------------------  ------
 junos     4021 ex2300 - xe-0/1/0    802.1q
 freebsd   4000 bhyve1 - ix82        802.1q
           4000 bhyve1 - ix0.82
           4000 bhyve1 - bridge2
           4000 bhyve2 - tap9
 linux     1550   sfreport1 - enp0s5

 platform  mtu  host-if              mode
 --------- ---  -------------------  ------
 junos     4021 ex2300 - ge-0/0/7    802.1q
 openbsd   1500 apu4d2 - em1         802.1q
           1500 apu4d2 - vlan0
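
The largest packet that can cross a path without fragmentation is bounded by the smallest MTU of any hop. A minimal sketch of that rule follows. The helper name min_mtu is hypothetical, and the hop list in the usage note is only one reading of the tables above, not a traced path.

```shell
# Print the smallest MTU among the arguments: the largest packet that
# can cross every listed hop without being fragmented.
min_mtu() {
    min=$1
    shift
    for mtu in "$@"; do
        if [ "$mtu" -lt "$min" ]; then
            min=$mtu
        fi
    done
    echo "$min"
}
```

For example, assuming traffic from sfreport1 to apu4d2 crosses enp0s5, tap9, bridge2, ix0.82, the switch and em1, then min_mtu 1550 4000 4000 4000 4021 1500 prints 1500.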

Linux Hosts

For NetworkManager (Red Hat)

nmcli con mod enp0s5 mtu 1550

For Netplan (Ubuntu)

netplan set "network.ethernets.enp0s5.mtu=1550"
netplan apply
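
After either change it is worth confirming that the kernel actually picked up the new value. The sketch below reads the MTU from sysfs. The function names and the SYSFS_NET override are hypothetical, and the override exists only so the check can be exercised without a real interface.

```shell
# Print the current MTU of a Linux interface by reading sysfs.
# SYSFS_NET is a hypothetical override, used only for testing.
iface_mtu() {
    cat "${SYSFS_NET:-/sys/class/net}/$1/mtu"
}

# Succeed only when the interface reports the expected MTU.
check_mtu() {
    actual=$(iface_mtu "$1") || return 2
    if [ "$actual" -eq "$2" ]; then
        echo "$1: mtu $actual ok"
    else
        echo "$1: mtu $actual, expected $2" >&2
        return 1
    fi
}
```

On the hosts described here this would be run as check_mtu enp0s5 1550. The same check applies to cni0 once the container bridge has been reconfigured.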

OSPF compares interface MTUs while forming an adjacency. Neighbors with mismatched MTUs typically stall in ExStart and never exchange routes.

Containers

For Kubernetes, with the native bridge networking, adjust /etc/cni/net.d/100-crio-bridge.conflist

{
  "cniVersion": "1.0.0",
  "name": "crio",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "mtu": 1550,
      "isGateway": true,
      "hairpinMode": true,
      "ipam": {
        "type": "host-local",
        "routes": [
            { "dst": "0.0.0.0/0" },
            { "dst": "::/0" }
        ],
        "ranges": [
            [{ "subnet": "${ip4_net}/24" }],
            [{ "subnet": "${ip6_net}/64" }]
        ]
      }
    }
  ]
}

For Docker, adjust /etc/docker/daemon.json

{
  "bip": "${ip4_base}/24",
  "fixed-cidr": "${ip4_net}/24",
  "ip-masq": false,
  "mtu": 1550
}
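
Both files must be strict JSON, and the Docker daemon will not start if daemon.json fails to parse (a trailing comma after the last key is enough). A minimal pre-flight check, assuming python3 is installed, is sketched below. The helper name json_ok is made up for illustration.

```shell
# Return success if the file parses as strict JSON.
# Relies on python3's json.tool module; python3 is assumed to be present.
json_ok() {
    python3 -m json.tool "$1" >/dev/null 2>&1
}
```

On a systemd host this could gate the restart: json_ok /etc/docker/daemon.json && systemctl restart docker. The same check works for /etc/cni/net.d/100-crio-bridge.conflist.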