6. High Availability - Non-Responsive Host
A host is deemed non-responsive when the Red Hat Enterprise Virtualization Manager cannot communicate with the Red Hat Enterprise Virtualization agent on the host. This can be due either to a networking issue or to a failure on the host side (for example, a kernel panic or a power failure) that stops all communication with the host.
When a host is non-responsive, it is fenced so that its virtual machines can be restarted on other hosts in the cluster without causing "split brain". Split brain is a situation in which the Manager has lost contact with the host while its virtual machines may still be partially running, so that starting them elsewhere could leave two instances using the same storage. This section simulates that scenario by disconnecting the host's management network while its storage connection remains functional.
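The ordering rule described above (fence first, restart second) can be modeled as a small shell function. This is a hypothetical sketch, not the Manager's implementation. The name handle_non_responsive_host and the fence command passed to it are illustrative. A real fence command would be a power management agent acting on the host.

```shell
# Hypothetical model of the fence-before-restart rule; not RHEV Manager code.
# Usage: handle_non_responsive_host HOST FENCE_CMD
#   FENCE_CMD is any command that takes the host name and exits 0 only if the
#   host was reliably powered off or rebooted.
handle_non_responsive_host() {
    host=$1
    fence_cmd=$2
    if "$fence_cmd" "$host"; then
        # The host can no longer be running its VMs, so restarting them
        # elsewhere cannot produce two copies writing to the same storage.
        echo "$host fenced: restart its highly available VMs on another host"
        return 0
    fi
    # Fencing failed: the VMs may still be running on the host, so they are
    # left alone rather than risking split brain.
    echo "$host not fenced: do not restart its VMs elsewhere"
    return 1
}
```

In this lab, the host's configured power management plays the role of the fence command.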
At this stage, the Pacific host is non-operational because its storage connection was cut in the previous section.
Restart it for the next demonstration. On the Tree pane, select the Pacific host. Click the Power Management button and select Restart. Because the host is fenced (rebooted through its power management device), the storage and eth1 networks come up again at boot, and the host can run as normal.
When the host's status changes to Up, migrate several virtual machines onto it. This example uses RHEL6RioGrande, RHEL6Thames (both highly available) and RHEL6Erie. Because you disabled the cluster policy at the beginning of this lab, these virtual machines do not migrate automatically when the host comes back up. You must therefore migrate them to the Pacific host manually.
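If you prefer to script the migrations instead of using the Administration Portal, the Manager's REST API exposes a migrate action on virtual machines. The sketch below makes several assumptions:

- The API follows the RHEV 3.x style, rooted at a URL such as https://manager.example.com/api.
- You already know the virtual machine and host IDs.
- The function name migrate_vm and the variables RHEVM_USER, RHEVM_PASSWORD and RHEVM_CA are hypothetical.

Confirm the request format against the REST API guide for your Manager version before relying on it.

```shell
# Sketch: migrate one VM to a given host through the Manager's REST API.
# Assumes an API root such as https://manager.example.com/api and the
# hypothetical variables RHEVM_USER (e.g. admin@internal), RHEVM_PASSWORD and
# RHEVM_CA (path to the Manager's CA certificate).
# Usage: migrate_vm API_ROOT VM_ID HOST_ID
# Set DRY_RUN=1 to print the request instead of sending it.
migrate_vm() {
    api=$1
    vm_id=$2
    host_id=$3
    url="$api/vms/$vm_id/migrate"
    body="<action><host id=\"$host_id\"/></action>"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'POST %s\n%s\n' "$url" "$body"
        return 0
    fi
    curl --silent --show-error --fail \
        --cacert "$RHEVM_CA" \
        --user "$RHEVM_USER:$RHEVM_PASSWORD" \
        --header 'Content-Type: application/xml' \
        --request POST --data "$body" \
        "$url"
}
```

Run it once per virtual machine (RHEL6RioGrande, RHEL6Thames and RHEL6Erie in this example), passing the ID of the Pacific host each time.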
To demonstrate high availability when the host connection is disrupted
On the Tree pane, click Hosts. On the Hosts tab, select the Pacific host, and click the Network Interfaces subtab on the details pane. Note the name of the physical interface that carries the rhevm network. In this example it is eth0.
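You can also confirm the interface name from the host itself. The usual arrangement is for rhevm to be a Linux bridge, in which case the kernel lists the bridge's ports under /sys/class/net/rhevm/brif/. The helper below is a hypothetical convenience wrapper. Its optional second argument exists only so it can be tried against a mock directory.

```shell
# Print the interfaces enslaved to a Linux bridge, one per line.
# Usage: bridge_ports BRIDGE [SYSFS_NET_DIR]
#   SYSFS_NET_DIR defaults to /sys/class/net; override it only for testing.
bridge_ports() {
    bridge=$1
    netdir=${2:-/sys/class/net}
    # Every bridge device has a brif/ directory with one entry per port.
    if [ ! -d "$netdir/$bridge/brif" ]; then
        echo "bridge_ports: $bridge is not a bridge under $netdir" >&2
        return 1
    fi
    ls "$netdir/$bridge/brif"
}
```

In this lab, running bridge_ports rhevm on the Pacific host should print eth0.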
As before, connect to the Pacific host via SSH. Disable the management network by running:
# ifdown rhevm
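If your SSH session itself runs over the management network, it will drop when rhevm goes down. To inspect the host afterward, use the host console or a session over another network. The hypothetical helper below reads the kernel's operational state of an interface from sysfs. It reports "absent" if the device no longer exists, because bringing a bridge down may also remove the device, depending on the network scripts in use.

```shell
# Report an interface's operational state as the kernel sees it.
# Usage: link_state IFACE [SYSFS_NET_DIR]
#   Prints the contents of operstate (for example "up" or "down"),
#   or "absent" if the device does not exist.
link_state() {
    iface=$1
    netdir=${2:-/sys/class/net}
    if [ -r "$netdir/$iface/operstate" ]; then
        cat "$netdir/$iface/operstate"
    else
        echo absent
    fi
}
```

After ifdown rhevm, link_state rhevm should print down or absent, while the interface used by the storage network stays up.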
You have now shut down the network connecting the Pacific host to the Red Hat Enterprise Virtualization Manager. When the Manager next tries to contact the host and gets no response, it marks the host as non-responsive and triggers the automatic fencing operation.
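The detection step can be pictured as a polling loop that gives up after a number of consecutive missed responses. This is a simplified model only. The check command, the watch_host name and the miss limit are assumptions. The Manager's real timeouts and retry counts are configuration values not covered here.

```shell
# Simplified model of non-responsive host detection; not RHEV Manager code.
# Usage: watch_host CHECK_CMD HOST MAX_MISSES [INTERVAL_SECONDS]
#   CHECK_CMD takes the host name and exits 0 if the host answered.
watch_host() {
    check=$1
    host=$2
    max=$3
    interval=${4:-0}
    misses=0
    while [ "$misses" -lt "$max" ]; do
        if "$check" "$host"; then
            echo "$host responsive"
            return 0
        fi
        misses=$((misses + 1))
        # Wait before the next poll (skipped when the interval is 0).
        if [ "$interval" -gt 0 ]; then
            sleep "$interval"
        fi
    done
    echo "$host non-responsive after $misses missed polls: fence it"
    return 1
}
```

A single successful poll resets the picture. Only an unbroken run of misses leads to fencing, which keeps a brief network glitch from triggering a reboot.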
From the Tree pane, click VMs to display the Virtual Machines tab. Once the Pacific host has been fenced, the highly available virtual machines, RHEL6RioGrande and RHEL6Thames, are restarted on the Atlantic host. RHEL6Erie, by contrast, is not restarted because it was not configured to be highly available.
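The outcome above follows a simple rule: only virtual machines with high availability enabled are restarted automatically. The hypothetical function below applies that rule to a list of "NAME HA_FLAG" lines, an input format invented for this illustration. It shows which machines are restarted automatically and which wait for an administrator.

```shell
# Hypothetical illustration of the HA restart rule; not RHEV Manager code.
# Reads lines of the form "VM_NAME yes|no" on stdin, where the second field
# says whether high availability is enabled for that VM.
plan_restarts() {
    while read -r vm ha; do
        # Skip blank lines.
        if [ -z "$vm" ]; then
            continue
        fi
        if [ "$ha" = yes ]; then
            echo "restart automatically: $vm"
        else
            echo "restart manually: $vm"
        fi
    done
}
```

With the three virtual machines from this example, it reports RHEL6RioGrande and RHEL6Thames for automatic restart and RHEL6Erie for manual restart, matching the Virtual Machines tab.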
Finally, click Hosts on the Tree pane to examine the status of the hosts. After a short period, the Pacific host is rebooted, provided that power management was configured successfully on it.
You have just run a demonstration in which a non-responsive host was automatically fenced and rebooted. You simulated a non-persistent network failure (ifdown does not change the host's saved network configuration), so the host recovers from the fault after it reboots. While it restarts, the highly available virtual machines originally running on it are restarted on another available host in the cluster. Virtual machines that are not highly available must be restarted manually.