Recently a client had a problem with some VMs on its infrastructure. More specifically, some packets were being dropped or not transmitted correctly to certain virtual servers.
The vendor's (Cisco) investigation showed that the problem was caused by a hard limit on the number of MAC addresses the switch can hold. Reaching the limit of 70,000 MAC addresses appeared to be what triggered the problem. But how were so many addresses created on a VMware infrastructure?
Two new chassis with Flex nodes had been added to the infrastructure. Each chassis contains around 20 servers, all of which were added to the virtual Distributed Switch (vDS). The vDS carried around 210 VLANs. Every time a new chassis became active on the switch, around 3000 new MAC addresses were created. This is on the order of the number of hosts multiplied by the number of VLANs: 20 x 210 = 4200.
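The back-of-the-envelope estimate above can be written as a small Python sketch. It assumes the simple "one test MAC address per host per VLAN" approximation used in this post; the real count can differ (around 3000 per chassis was observed here), and the function and constant names are purely illustrative.

```python
def estimated_health_check_macs(hosts: int, vlans: int) -> int:
    """Rough estimate, assuming one test MAC address per host per VLAN."""
    return hosts * vlans


# Figures taken from this post; the names are illustrative only.
HOSTS_PER_CHASSIS = 20
VLANS_ON_VDS = 210
SWITCH_MAC_LIMIT = 70_000

per_chassis = estimated_health_check_macs(HOSTS_PER_CHASSIS, VLANS_ON_VDS)
two_chassis = 2 * per_chassis

print(per_chassis)                               # 4200
print(two_chassis)                               # 8400
print(f"{two_chassis / SWITCH_MAC_LIMIT:.0%}")   # 12%
```

Under this approximation, the two new chassis alone could account for roughly 12% of the switch's MAC address table, on top of whatever the existing hosts were already generating.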
The problem was caused by the Distributed Switch health check mechanism. As VMware states:
vSphere Distributed Switch health check helps you identify and troubleshoot configuration problems with the vSphere Distributed Switch (VDS), and mismatched configurations between the VDS and your environment’s physical network. By default, health check is turned off. You can enable health check to identify and resolve network problems you might be experiencing. Depending on the options that you select, vSphere Distributed Switch health check can generate a significant number of MAC addresses for testing teaming policy, MTU size, and VLAN configuration. These MAC addresses result in extra network traffic, which can affect network performance.
Use health check to troubleshoot network problems, and then disable it after you identify and resolve the problem. After you disable vSphere Distributed Switch health check, the generated MAC addresses age out of your physical network environment according to your network policy. For more information, see Knowledge Base article KB 2034795.
In short, enable the virtual Distributed Switch health check only while troubleshooting and keep it disabled in production environments, as it may cause problems for your VMs.