Strange Issues With Microsoft Clustering and ESXi

Datetime: 2017-04-20 05:45:33

I have some legacy applications that require Microsoft Clustering, and they run on ESXi 6.0. Running Microsoft Clustering on top of VMware does not give you many benefits. Things like HA and moving workloads across nodes are already available through virtualization. What clustering does do is create more places for things to break and cause downtime. Really, the only benefit I see from clustering in a virtualized environment is the ability to restart a server for system updates.

RDMs are required for Microsoft Clustering. An RDM (Raw Device Mapping) gives the VM control of the LUN as if it were directly connected to it. To set this up, you need to add a second SCSI controller and set its bus sharing to physical mode. Each shared disk must then use the same SCSI controller settings on every VM in the cluster. The downside is that you lose features such as snapshots and vMotion. When using RDMs in physical mode, you should treat those VMs as if they were physical hosts.
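To make the "same SCSI controller settings on every VM" rule concrete, here is a minimal sketch of a consistency check. It works on a hand-written inventory, not on anything pulled from vCenter. The node names, LUN labels, and the `find_inconsistent_disks` helper are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def find_inconsistent_disks(cluster):
    """Return {lun: {vm: (controller, unit)}} for LUNs whose SCSI address differs between VMs."""
    placements = defaultdict(dict)
    for vm_name, disks in cluster.items():
        for lun, address in disks.items():
            placements[lun][vm_name] = address
    # A shared disk is consistent only if every VM attaches it at the same address.
    return {
        lun: by_vm
        for lun, by_vm in placements.items()
        if len(set(by_vm.values())) > 1
    }


# Hypothetical inventory: node name -> {LUN label: (SCSI controller, unit number)}
cluster = {
    "node-a": {"quorum": (1, 0), "data": (1, 1)},
    "node-b": {"quorum": (1, 0), "data": (1, 2)},  # "data" sits on a different unit here
}

mismatches = find_inconsistent_disks(cluster)
print(mismatches)
```

With this sample data only the "data" disk is reported, because node-b attaches it at a different unit number than node-a.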

The problem occurred when one of the clustered nodes was rebooted. The node never came back online, and on the console it looked like the Windows OS was gone. I powered off the VM and removed the mapped RDMs. When I powered the VM back on, Windows booted up fine. I found that very strange, so I powered it off again and added the drives back. That is when I got the error “invalid device backing”. A VMware KB article references the issue and basically says it is caused by inconsistent LUNs. The only problem was that I did have consistent LUNs.

I put in a ticket with GSS, and first-level support was not able to help. They had to bring in a storage expert, who quickly found the issue: the LUN ID had changed. I am not sure how that occurred, but it was not anything I could change. When you add the drives to a VM, its config records a mapping from the VM to the LUN. When the LUN ID changed, the mapping did not. The only fix was to remove the RDMs from all VMs in that cluster and then add them back.
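The failure is easier to see as a toy model: each VM keeps the LUN ID it recorded when the RDM was added, and nothing updates that record when the storage side changes. The sketch below is not how ESXi actually stores the mapping. The device names, LUN IDs, and the `find_stale_mappings` helper are made up for illustration.

```python
def find_stale_mappings(recorded, current):
    """List (vm, device, recorded_lun_id, current_lun_id) where the two no longer agree."""
    stale = []
    for vm_name, disks in recorded.items():
        for device, lun_id in disks.items():
            now = current.get(device)  # None if the device is no longer presented
            if now != lun_id:
                stale.append((vm_name, device, lun_id, now))
    return stale


# Hypothetical data: what each VM recorded when the RDMs were added...
recorded = {
    "node-a": {"device-1": 4, "device-2": 5},
    "node-b": {"device-1": 4, "device-2": 5},
}
# ...and what the storage presents now, after one LUN ID changed.
current = {"device-1": 4, "device-2": 7}

stale = find_stale_mappings(recorded, current)
for vm_name, device, old_id, new_id in stale:
    print(f"{vm_name}: {device} recorded LUN {old_id}, storage now reports {new_id}")
```

Every VM in the cluster shows up with the same stale entry, which matches why the RDMs had to be removed and re-added on all of them and not only on the node that was rebooted.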







