
Split-Site R2 configurations for Business Continuity

It is common for businesses to configure backup sites as part of their business continuity plans. If one site goes down, another site can then take over its workload.

When you configure a Heartbeat[1] cluster this way, with some nodes at one site and some at another, we call it a split-site[2] or stretch configuration.

Contents

  1. Split-Site R2 configurations for Business Continuity
    1. The Problem
    2. Different Classes of Split-Site Clusters
    3. Quorum and Tie-Breaker Architecture in the R2 CCM
      1. Quorum Plugin
      2. Tiebreaker Plugin
      3. Possible Alternative Quorum Plugins
      4. Possible Alternative Tiebreaker Plugins
    4. Possible Generalizations of quorum and tiebreaker plugin architecture
  2. Proposed Initial Split-Site Implementation
  3. Optional Steps to possibly follow the Initial Implementation

The Problem

When a cluster is in a single location, it is relatively easy to create highly reliable communications between its nodes. Combined with fencing[3] techniques like STONITH[4], this makes it straightforward to ensure that a SplitBrain[5] condition is nearly impossible.

However, in a split-site configuration, it is virtually impossible to guarantee reliable communications between the cluster nodes, and fencing[3] techniques often are unusable in such a situation - because they also rely on reliable communications.

So in a split-site configuration, one is left with the uncomfortable situation that SplitBrain[5] conditions can routinely arise, and there is no fencing[3] technique available to render them harmless. Unless properly compensated for by other methods, BadThingsWillHappen[6] to a split-site cluster.

One of the key issues to be considered in implementing SplitSite clusters is the replication of state over distance. This is an interesting and difficult problem, but is outside the scope of the Heartbeat[1] discussion.

Replication will be handled by other software or hardware components. For example, DRBD[7] or HADR[8] could be used to replicate data in software, or IBM's PPRC[9] (Peer-to-Peer Remote Copy) product could replicate it at the disk hardware level. What Heartbeat[1] will do is manage these replication services.

Different Classes of Split-Site Clusters

There are several variants of this problem, each of which raises its own issues.

  1. 2-node split-site clusters
  2. n-node split-site clusters with servers split evenly across two sites
  3. n-node split-site clusters with servers split unevenly across two sites
  4. Split-site clusters with an odd number of sites
  5. Split-site clusters with an even number of sites (at least four)

We would like to handle at least cases 1 and 2 above. Handling case 3 well would be a bonus, but is not essential (case 3 requires changes to the structure of the hostcache file[10]).

Quorum and Tie-Breaker Architecture in the R2 CCM

In the CCM, the quorum[11] process is broken up into two different pieces, the quorum method and the tie-breaker method. Both are designed as plugins, so that a variety of different methods can be designed, and put together into different solutions for different kinds of configurations.

Quorum Plugin

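It is the job of the quorum plugin to decide whether a subcluster has quorum. When invoked, the quorum plugin can return any of the following answers:

  1. HAVEQUORUM: this subcluster has quorum
  2. NOQUORUM: this subcluster does not have quorum
  3. TIEQUORUM: the vote is tied, and the tiebreaker plugin must decide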

As of this writing (2.0.5), we have implemented only one type of quorum plugin, which uses the classic majority vote scheme. When a subcluster has an absolute majority (more than INT(n/2) of the n nodes), the plugin returns HAVEQUORUM. When the subcluster has exactly half of the nodes in the cluster, it returns TIEQUORUM. When the subcluster has fewer than n/2 nodes, it returns NOQUORUM.
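The majority rule above fits in a few lines. This is a minimal sketch, assuming a plugin only needs the size of its own subcluster and the total number of nodes; `majority_quorum` and the string constants are hypothetical names, not the CCM's real plugin interface.

```python
# Hypothetical string constants mirroring the answers named in the text.
HAVEQUORUM, NOQUORUM, TIEQUORUM = "HAVEQUORUM", "NOQUORUM", "TIEQUORUM"


def majority_quorum(subcluster_size: int, cluster_size: int) -> str:
    """Classic majority vote over an n-node cluster (sketch, not the CCM code)."""
    if subcluster_size > cluster_size // 2:
        # Absolute majority: more than INT(n/2) nodes.
        return HAVEQUORUM
    if 2 * subcluster_size == cluster_size:
        # Exactly half of the nodes: only possible when n is even.
        return TIEQUORUM
    return NOQUORUM


# Each half of an evenly split four-node cluster sees a tie.
print(majority_quorum(2, 4))  # prints TIEQUORUM
```

Integer arithmetic sidesteps any rounding question about n/2; with an odd number of nodes a tie cannot occur at all.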

Tiebreaker Plugin

When the quorum plugin returns TIEQUORUM, the tiebreaker[12] plugin is called. It is the job of this plugin to break the tie in such a way that it is virtually impossible for both subclusters to think they have won it.

There is only one tiebreaker plugin currently available: twonode. The twonode tiebreaker breaks the tie if it is called in a two-node cluster, and does not break the tie if called in a larger cluster. This is consistent with the behavior of the R1 cluster manager.
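A sketch of the twonode behavior as described above, assuming that "not breaking the tie" is reported as NOQUORUM; the function name and signature are hypothetical. It makes one consequence visible: when a two-node cluster splits, each node is a tied one-node subcluster and each breaks the tie in its own favor, which is presumably why this plugin depends on fencing such as STONITH[4] and is unsuitable for split sites.

```python
HAVEQUORUM, NOQUORUM = "HAVEQUORUM", "NOQUORUM"


def twonode_tiebreaker(subcluster_size: int, cluster_size: int) -> str:
    """Break a tie only in a two-node cluster (hypothetical signature).

    Assumed to be called only after the quorum plugin reported TIEQUORUM.
    """
    if cluster_size == 2 and subcluster_size == 1:
        return HAVEQUORUM
    # In any larger cluster the tie is left unbroken.
    return NOQUORUM


# Each node of a split two-node cluster evaluates the tie independently.
for node in ("node1", "node2"):
    print(node, twonode_tiebreaker(1, 2))  # both get HAVEQUORUM
# The halves of a split four-node cluster do not get quorum this way.
print(twonode_tiebreaker(2, 4))  # NOQUORUM
```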

Possible Alternative Quorum Plugins

Many different types of quorum plugins are possible. The only constraint is that the combination of quorum plugin and tiebreaker plugin must be highly unlikely to grant quorum to both sides at once. In the absence of fencing[3] mechanisms, it is also necessary to guarantee that there is sufficient time for resources to be stopped before quorum is moved from one subcluster to another.
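The first constraint can be checked mechanically for small clusters: enumerate every way of cutting the cluster in two and ask whether both halves would be granted quorum. This is a brute-force sketch under the assumption that a combined quorum-plus-tiebreaker decision can be modeled as a function of (subcluster, whole cluster); it only covers two-way splits and says nothing about the timing requirement. All names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Iterator, Set, Tuple

HAVEQUORUM, NOQUORUM = "HAVEQUORUM", "NOQUORUM"
Decision = Callable[[Set[str], Set[str]], str]


def split_brain_partitions(
    nodes: Set[str], decide: Decision
) -> Iterator[Tuple[Set[str], Set[str]]]:
    """Yield each two-way partition in which BOTH sides would get quorum."""
    ordered = sorted(nodes)
    n = len(ordered)
    for k in range(1, n // 2 + 1):
        for side in combinations(ordered, k):
            a = set(side)
            # For an even split, count each partition once (the side holding ordered[0]).
            if 2 * k == n and ordered[0] not in a:
                continue
            b = nodes - a
            if decide(a, nodes) == HAVEQUORUM and decide(b, nodes) == HAVEQUORUM:
                yield a, b


def majority_then_twonode(sub: Set[str], everyone: Set[str]) -> str:
    """Majority vote, with ties broken by the twonode rule described above."""
    s, n = len(sub), len(everyone)
    if 2 * s > n:
        return HAVEQUORUM
    if 2 * s == n and n == 2:
        return HAVEQUORUM  # twonode breaks the tie for both nodes
    return NOQUORUM


print(list(split_brain_partitions({"a", "b"}, majority_then_twonode)))  # one offending split
print(list(split_brain_partitions({"a", "b", "c", "d"}, majority_then_twonode)))  # none
```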

Here are a few possible types of quorum plugins that come to mind:

  1. Return HAVEQUORUM when a human being says you have quorum, and NOQUORUM otherwise.
  2. Return HAVEQUORUM when a subcluster has an absolute majority of nodes in an absolute majority of sites (needs hostcache file[10] changes).
  3. Return HAVEQUORUM when a subcluster has an absolute majority of a score based on a weighting of servers, so that not every server has the same weight (needs hostcache file[10] changes).
  4. Return HAVEQUORUM when a subcluster has an absolute majority of a score based on a weighted set of sites, so that not every site has the same weight. Receiving the vote for a site requires an absolute majority of the weighted votes from that site (needs hostcache file[10] changes).
  5. ALWAYS return TIEQUORUM (deferring completely to the tiebreaker plugin).
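Alternative 4 can be sketched as a two-stage vote: each site's vote goes to the subcluster holding an absolute majority of that site's weighted nodes, then the site votes are tallied. In the text the site and weight information would live in the hostcache file; the dictionaries here are a hypothetical stand-in, and returning TIEQUORUM on an exact split of site votes is an assumption. With every weight set to 1, this behaves like alternative 2.

```python
from typing import Dict, Set

HAVEQUORUM, NOQUORUM, TIEQUORUM = "HAVEQUORUM", "NOQUORUM", "TIEQUORUM"


def weighted_site_quorum(
    sites: Dict[str, Set[str]],     # site name -> nodes at that site (hypothetical layout)
    site_weight: Dict[str, int],    # site name -> votes that site is worth
    node_weight: Dict[str, int],    # node name -> weight of the node within its site
    members: Set[str],              # nodes in this subcluster
) -> str:
    """Weighted-site quorum; unit weights give majority-of-nodes-in-majority-of-sites."""
    won = 0
    for site, site_nodes in sites.items():
        total = sum(node_weight[n] for n in site_nodes)
        ours = sum(node_weight[n] for n in site_nodes & members)
        # The site's vote needs an absolute majority of that site's weighted votes.
        if 2 * ours > total:
            won += site_weight[site]
    all_votes = sum(site_weight.values())
    if 2 * won > all_votes:
        return HAVEQUORUM
    if 2 * won == all_votes:
        return TIEQUORUM  # assumption: an exact split is handed to the tiebreaker
    return NOQUORUM


sites = {"east": {"e1", "e2"}, "west": {"w1", "w2"}, "arbiter": {"q1"}}
ones = {n: 1 for nodes in sites.values() for n in nodes}
site_ones = {s: 1 for s in sites}
print(weighted_site_quorum(sites, site_ones, ones, {"e1", "e2", "q1"}))  # HAVEQUORUM
print(weighted_site_quorum(sites, site_ones, ones, {"w1", "w2"}))        # NOQUORUM
```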

Note: human intervention may be an attractive alternative to any of these methods. The R2 CRM basically has this built in, because you can tell a subcluster to ignore quorum, which has the same effect as saying "you have quorum unconditionally".

Possible Alternative Tiebreaker Plugins

  1. Return HAVEQUORUM when all (or a clear majority) of a set of designated ping nodes are accessible to the subcluster (not well-suited for split-site arrangements, but a good adjunct to a reliable STONITH method).
  2. Perform a disk reserve operation (not typically suitable for split-site clusters).
  3. Connect to a tiebreaker server[13] which guarantees that it will never break the tie (grant quorum) for more than one subcluster at a time. This method is somewhat similar to the disk reserve operation, but is software-based, and is well-suited to a split-site configuration.
  4. Human tiebreaker method. Note that this is not precisely simulated by the "ignore" option on quorum in the CRM, because in this case the human is only consulted if the quorum plugin returns TIEQUORUM.
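Alternative 1 can be sketched as a reachability count over the designated ping nodes. "Clear majority" is assumed here to mean a strict majority, and the function and its parameters are hypothetical.

```python
from typing import Set

HAVEQUORUM, NOQUORUM = "HAVEQUORUM", "NOQUORUM"


def ping_tiebreaker(ping_nodes: Set[str], reachable: Set[str], require_all: bool = False) -> str:
    """Break the tie if enough designated ping nodes answer this subcluster."""
    if not ping_nodes:
        return NOQUORUM  # nothing to judge by, so do not break the tie
    up = len(ping_nodes & reachable)
    if require_all:
        ok = up == len(ping_nodes)
    else:
        ok = 2 * up > len(ping_nodes)  # assumed meaning of "clear majority"
    return HAVEQUORUM if ok else NOQUORUM


pings = {"gw1", "gw2", "gw3"}
print(ping_tiebreaker(pings, {"gw1", "gw2"}))                    # HAVEQUORUM
print(ping_tiebreaker(pings, {"gw1", "gw2"}, require_all=True))  # NOQUORUM
```

The check looks only at what this subcluster can reach. If the link between two sites fails while both sides can still reach the same ping nodes, both sides pass, which illustrates why this method is poorly suited to split sites.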

Note that with this arrangement, it is possible to lose or gain quorum without any change in membership. This is not yet supported by the R2 CCM (but it needs to be).
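One way a tiebreaker server (alternative 3) could both guarantee a single winner and give the losing side time to stop its resources is to grant the tie as a time-limited, renewable lease. This is an assumption about how such a server might behave, not a description of the TiebreakerServer[13] design; the class, the lease length, the safety margin and the injectable clock are all hypothetical. It also illustrates the point just made: if the holder stops renewing, its lease runs out and quorum moves, with no change in membership.

```python
import time
from typing import Callable, Optional

HAVEQUORUM, NOQUORUM = "HAVEQUORUM", "NOQUORUM"


class LeaseTiebreaker:
    """Grant the tie to at most one subcluster at a time, as a renewable lease."""

    def __init__(self, lease_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.holder: Optional[str] = None
        self.expires = 0.0

    def request(self, subcluster: str) -> str:
        now = self.clock()
        if self.holder in (None, subcluster) or now >= self.expires:
            # Free, expired, or a renewal by the current holder.
            self.holder = subcluster
            self.expires = now + self.lease_seconds
            return HAVEQUORUM
        return NOQUORUM


def holder_still_has_quorum(sent_at: float, now: float, lease_seconds: float, margin: float) -> bool:
    """Client side: count the lease from when the request was SENT, minus a margin
    meant to cover stopping resources (and clock skew), so the holder gives up
    quorum before the server can grant it to another subcluster."""
    return now < sent_at + lease_seconds - margin


now = [0.0]
server = LeaseTiebreaker(lease_seconds=30.0, clock=lambda: now[0])
print(server.request("siteA"))  # HAVEQUORUM
print(server.request("siteB"))  # NOQUORUM: siteA holds the lease
now[0] = 31.0                   # siteA never renewed (e.g. it lost its link to the server)
print(server.request("siteB"))  # HAVEQUORUM
```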

Possible Generalizations of quorum and tiebreaker plugin architecture

It might be more general to revert to only one kind of plugin, the quorum plugin, and then configure an ordered list of m plugins. The result of a particular plugin would only be taken into account when all previous plugins (if any) had returned TIEQUORUM.

In the end, if one of the plugins returned HAVEQUORUM when all previous plugins had returned TIEQUORUM, then quorum would be granted.

Another (maybe simpler) way of saying this is: the first plugin in the list that does not return TIEQUORUM decides whether quorum is granted.
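The ordered-plugin idea can be written as a short loop. In this sketch a plugin is modeled as a zero-argument callable returning one of the three answers, which is an assumption made for brevity. The text does not say what happens when every plugin returns a tie; treating that as NOQUORUM is also an assumption (the conservative choice).

```python
from typing import Callable, Iterable

HAVEQUORUM, NOQUORUM, TIEQUORUM = "HAVEQUORUM", "NOQUORUM", "TIEQUORUM"
QuorumPlugin = Callable[[], str]


def chained_quorum(plugins: Iterable[QuorumPlugin]) -> str:
    """Consult plugins in order; the first answer other than TIEQUORUM decides."""
    for plugin in plugins:
        answer = plugin()
        if answer != TIEQUORUM:
            return answer  # later plugins are never consulted
    # Assumption: a tie that no plugin could break means no quorum.
    return NOQUORUM


def majority_says_tie() -> str:
    return TIEQUORUM  # e.g. an evenly split cluster


def server_grants() -> str:
    return HAVEQUORUM  # e.g. a tiebreaker server granted the tie


print(chained_quorum([majority_says_tie, server_grants]))  # HAVEQUORUM
print(chained_quorum([majority_says_tie]))                 # NOQUORUM
```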

Note that many combinations of plugins make little or no sense.

(don't know if this is clear, or a really good idea ;-))

zhenh: In my view, it may work like this:

Suppose we have a local quorum plugin (majority), a local tiebreaker (twonode), and a global quorum plugin (3rd quorum).

  1. If majority returns NOQUORUM, the subcluster loses quorum.
  2. If majority returns HAVEQUORUM or TIEQUORUM, the twonode tiebreaker is called.
  3. If HAVEQUORUM is passed to the twonode tiebreaker, twonode returns HAVEQUORUM.
  4. If TIEQUORUM is passed to the twonode tiebreaker, twonode returns HAVEQUORUM or NOQUORUM.
  5. If twonode returns HAVEQUORUM, the global quorum plugin (3rd quorum) is called.
  6. If the 3rd quorum returns HAVEQUORUM and no plugins remain, the subcluster gets quorum.

    (Twonode should never be used in a split-site situation. In fact, if we add site designations[15] to nodes in the cluster, it should automatically disable itself).
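zhenh's steps can be traced in code. This sketch assumes each local plugin sees only the node counts of its own site (as zhenh's understanding below also states), and models the call to the 3rd quorum server as a hypothetical callable `third_quorum`; none of these functions correspond to existing Heartbeat code.

```python
from typing import Callable

HAVEQUORUM, NOQUORUM, TIEQUORUM = "HAVEQUORUM", "NOQUORUM", "TIEQUORUM"


def local_majority(members_here: int, site_size: int) -> str:
    """Local quorum: majority vote among the nodes of this site only."""
    if 2 * members_here > site_size:
        return HAVEQUORUM
    if 2 * members_here == site_size:
        return TIEQUORUM
    return NOQUORUM


def local_twonode(previous: str, members_here: int, site_size: int) -> str:
    """Steps 3 and 4: pass HAVEQUORUM through; break a tie only in a two-node site."""
    if previous == HAVEQUORUM:
        return HAVEQUORUM
    return HAVEQUORUM if site_size == 2 and members_here == 1 else NOQUORUM


def zhenh_quorum(members_here: int, site_size: int, third_quorum: Callable[[], str]) -> str:
    """Steps 1-6: local majority, then local twonode, then the global 3rd quorum."""
    answer = local_majority(members_here, site_size)
    if answer == NOQUORUM:
        return NOQUORUM  # step 1
    answer = local_twonode(answer, members_here, site_size)  # steps 2-4
    if answer != HAVEQUORUM:
        return NOQUORUM
    # Steps 5-6: the global plugin is the last one, so its answer is final.
    return HAVEQUORUM if third_quorum() == HAVEQUORUM else NOQUORUM


print(zhenh_quorum(2, 3, lambda: HAVEQUORUM))  # HAVEQUORUM
print(zhenh_quorum(2, 3, lambda: NOQUORUM))    # NOQUORUM: the 3rd quorum refused
print(zhenh_quorum(2, 4, lambda: HAVEQUORUM))  # NOQUORUM: local tie not broken
```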

zhenh[16]'s understanding:

  1. A split-site cluster includes one or more sites.
  2. A site is a set of nodes in one subnet.
  3. A subcluster is a set of nodes within one site that can communicate with each other.
  4. Quorum is required to provide service.
  5. There are two levels of quorum: local quorum (within a site, i.e. within one subnet) and global quorum (inter-site, i.e. across the whole cluster).
  6. In the current implementation, local quorum means more than half of the nodes in the site (or a tie broken in this subcluster's favor by the tiebreaker plugin).
  7. Calculating local quorum does not require information about nodes in other sites.
  8. Global quorum would be decided by the 3rd quorum server, using information about the sites and subclusters.
  9. If a subcluster has both local and global quorum, we say that it has quorum.


Proposed Initial Split-Site Implementation

This would allow us to handle split sites with an equal number of nodes in each site.

If somehow we get these things done, and still have time, then these enhancements are worth considering, in approximately this priority order.

Optional Steps to possibly follow the Initial Implementation

Optional steps (maybe we can get gshi[18] to do some of this work?)


References

[1]http://www.linux-ha.org/Heartbeat
[2]http://www.linux-ha.org/SplitSite
[3]http://www.linux-ha.org/fencing
[4]http://www.linux-ha.org/STONITH
[5]http://www.linux-ha.org/SplitBrain
[6]http://www.linux-ha.org/BadThingsWillHappen
[7]http://www.linux-ha.org/DRBD
[8]http://www.linux-ha.org/HADR
[9]http://www.linux-ha.org/PPRC
[10]http://www.linux-ha.org/HostcacheFile
[11]http://www.linux-ha.org/ClusterConcepts#quorum
[12]http://www.linux-ha.org/tiebreaker
[13]http://www.linux-ha.org/TiebreakerServer
[14]http://www.linux-ha.org/AlanR
[15]http://www.linux-ha.org/HeartbeatSiteDesignations
[16]http://www.linux-ha.org/zhenh
[17]http://www.linux-ha.org/CCM
[18]http://www.linux-ha.org/gshi
[19]http://www.linux-ha.org/CRM
[20]http://www.linux-ha.org/PingNode


This information provided courtesy of the Linux-HA project at http://linux-ha.org/