Linux-HA and Pacemaker have long supported managing Xen DomUs as resources; this allows the cluster to start, stop, monitor, and migrate the guests, providing high availability through failover for arbitrary virtualized services, even including some monitoring hooks into the guest (see XenResource).
Running a cluster within the virtual guests is, however, desirable as well: for example, to gain access to clustered filesystems such as OCFS2 within the DomUs.
(This is straightforward, but requires very clear terminology to avoid confusion. "DomU cluster" refers to the cluster running within the virtual machines; "Dom0 cluster" refers to the cluster running at the physical layer.)
However, running a Linux-HA or Pacemaker cluster within the DomU faces some special challenges.
While this has been supported for testing for a long time, greatly reducing the hardware requirements for non-production work, production settings additionally require working fencing (STONITH) within the DomU cluster. It is the combination of these points which causes some non-obvious issues: if the host of a guest becomes unreachable, the DomU cluster can no longer fence successfully, and this split-brain scenario needs escalation to the Dom0 layer.
The solution is to integrate the two layers of clusters, in particular with regard to STONITH.
If you use the Dom0 cluster to stop and start the DomUs, the DomU nodes will cleanly sign out of the DomU cluster and not trigger a fencing operation.
Only this integration delivers the maximum reliability.
Special requirement: The resource id must match the hostname (uname) of the DomU within the DomU cluster!
You can set the meta_attribute allow-migrate as you prefer.
For fastest recovery, set shutdown_timeout to 0 on the Xen resource. This forces an immediate destroy instead of a graceful shutdown; since stopping the DomU here is an error escalation, this is likely what you want.
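Putting these points together, a minimal sketch of such a resource in crm shell syntax might look as follows (the resource id domu1, the config file path, and the timings are placeholders; this assumes the ocf:heartbeat:Xen agent):

```
# The resource id "domu1" must match the guest's uname in the DomU cluster.
primitive domu1 ocf:heartbeat:Xen \
    params xmfile="/etc/xen/domu1.cfg" shutdown_timeout="0" \
    op monitor interval="30s" \
    meta allow-migrate="true"
```

Note that shutdown_timeout is a parameter of the Xen agent, while allow-migrate is a resource meta attribute.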
Ensure that the DomU nodes are spread over several physical nodes, otherwise you will have no real redundancy.
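One way to enforce this spreading in the Dom0 cluster is a negative colocation constraint (a sketch; the resource ids domu1 and domu2 are placeholders for two DomUs of the same DomU cluster):

```
# Never place the two DomU cluster nodes on the same physical host.
colocation domus-apart -inf: domu2 domu1
```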
Use the external/xen0-ha STONITH plugin:
Set the dom0_cluster_ip to the IP address configured in the Dom0 cluster.
Set the hostlist to all nodes within the cluster. Again, these hostnames/unames must match the ids of the XenResource objects configured in Dom0!
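As a sketch, the plugin configuration in crm shell syntax could look like this (the IP address and hostnames are placeholders; adjust them to your Dom0 cluster IP and your DomU node names):

```
# STONITH for the DomU cluster, escalating to the Dom0 cluster.
primitive st-domu stonith:external/xen0-ha \
    params dom0_cluster_ip="192.168.1.100" \
           hostlist="domu1 domu2"
clone fencing-domu st-domu
```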
Make sure the clusters do not use the same port numbers and/or mcast address. Otherwise, your logs will be flooded with authentication errors; or worse, if autojoin is enabled, the two layers will join into one big cluster, which will fail utterly and completely.
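Assuming heartbeat as the cluster messaging layer (the interface, addresses, and ports below are placeholders), the separation could look like this; the same principle applies to corosync's mcastaddr/mcastport settings:

```
# Dom0 cluster, /etc/ha.d/ha.cf:
#   mcast <dev> <group> <port> <ttl> <loop>
mcast eth0 239.0.0.41 694 1 0

# DomU cluster, /etc/ha.d/ha.cf: different group and port
mcast eth0 239.0.0.42 695 1 0
```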
Several DomU clusters sharing the same Dom0 cluster are not a problem per se. However, if one of the guests becomes fatally stuck and no longer responds to xm destroy, the escalation fences the entire physical node, so the other guests on it will also be affected and moved elsewhere. Since such a hang indicates a bug in the hypervisor, fencing the whole host is arguably beneficial, but it means the clusters are not completely independent.
In case of doubt, please ask on the linux-ha list (or the Linux support vendor of your choice) whether your configuration is fine. The intricacies are not always obvious.
Document if, and how, mixed P/V environments are supportable.