Following an issue with the storage used by our Hyper-V cluster, one node in our five-node cluster became partially unresponsive. The virtual machines (VMs) running on that node were automatically moved to other cluster nodes, and service resumed within a couple of minutes. At first everything appeared fine, but within a few minutes our monitoring system started to report connectivity issues with the VMs that had failed over.
I RDP’d onto one of the affected VMs, but the connection kept dropping, so I connected to its console through System Center Virtual Machine Manager (SCVMM). From the console I could not ping any server on the physical network. When I checked the event log on one of the Hyper-V hosts, I saw the following error:
Port 'BF392932-9AE4-453A-8E13-26671BB556D9' was prevented from using MAC address '00-14-22-18-7F-DC' because it is pinned to port 'SCVMM-C26227E3-D6AB-4818-B8BF-4CCF923C'.
The error message implied that another VM was using the MAC address of the VM with connectivity issues. Because the VM had a dynamic MAC address managed by SCVMM, I knew that couldn’t be the case. I decided to reboot the unresponsive cluster node. After waiting 30 minutes for the node to shut down, I killed the power via its DRAC. As soon as the node lost power, the MAC address errors stopped appearing in the event log and all the VMs resumed normal connectivity. I believe the unresponsive node was holding some kind of lock on the MAC addresses of the VMs that had been running on it when it stopped responding. Killing the power to the node released those locks, allowing connectivity to resume.
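If you need to check several hosts for this error, a small parser can pull out the three identifiers the event reports: the port that was blocked, the MAC address, and the port it is pinned to. The sketch below is in Python and only parses message text you have already exported; it does not read the Windows event log itself. It assumes the wording matches the event quoted above exactly (other Hyper-V versions may phrase it differently), and the function name `parse_pinned_mac` is hypothetical. It accepts straight or curly quotes because exported event text often contains both.

```python
import re

# Quote characters seen around identifiers in exported event text:
# straight quote, left/right curly quotes, and a stray prime.
QUOTE = "['\u2018\u2019\u2032]"
NOT_QUOTE = "[^'\u2018\u2019\u2032]+"

# Assumed message shape, copied from the event quoted in this post.
PINNED_MAC = re.compile(
    rf"Port {QUOTE}(?P<port>{NOT_QUOTE}){QUOTE} was prevented from using "
    rf"MAC address {QUOTE}(?P<mac>{NOT_QUOTE}){QUOTE} because it is pinned "
    rf"to port {QUOTE}(?P<pinned_to>{NOT_QUOTE}){QUOTE}"
)


def parse_pinned_mac(message):
    """Return the blocked port, MAC and pinning port, or None if no match."""
    match = PINNED_MAC.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    event = (
        "Port 'BF392932-9AE4-453A-8E13-26671BB556D9' was prevented from "
        "using MAC address '00-14-22-18-7F-DC' because it is pinned to "
        "port 'SCVMM-C26227E3-D6AB-4818-B8BF-4CCF923C'."
    )
    print(parse_pinned_mac(event))
```

Grouping the parsed results by `pinned_to` across hosts would show whether a single port is holding on to many MAC addresses, which is the pattern you would expect if one node still has them locked.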