Split-Site R2 configurations for Business Continuity

It is common for businesses to configure backup sites as part of their business continuity plans: when one site goes down, the other site can take over its workload.

When you configure a Heartbeat cluster in this way with some nodes in one site and some in another site, we call it a split-site or stretch configuration.

Contents

  1. Split-Site R2 configurations for Business Continuity
    1. The Problem
    2. Different Classes of Split-Site Clusters
    3. Quorum and Tie-Breaker Architecture in the R2 CCM
      1. Quorum Plugin
      2. Tiebreaker Plugin
      3. Possible Alternative Quorum Plugins
      4. Possible Alternative Tiebreaker Plugins
    4. Possible Generalizations of quorum and tiebreaker plugin architecture
  2. Proposed Initial Split-Site Implementation
  3. Optional Steps to possibly follow the Initial Implementation

The Problem

When a cluster is confined to a single location, it is relatively easy to create highly reliable communications between its nodes. Combined with fencing techniques like STONITH, it is straightforward to guarantee that a SplitBrain condition is nearly impossible.

However, in a split-site configuration, it is virtually impossible to guarantee reliable communications between the cluster nodes, and fencing techniques are often unusable in such a situation because they, too, rely on reliable communications.

So in a split-site configuration, one is left with the uncomfortable situation that SplitBrain conditions can routinely arise, and there is no fencing technique available to render them harmless. Unless properly compensated for by other methods, BadThingsWillHappen to a split-site cluster.

One of the key issues to be considered in implementing SplitSite clusters is the replication of state over distance. This is an interesting and difficult problem, but is outside the scope of the Heartbeat discussion.

State replication will be handled by other software or hardware components. For example, DRBD or HADR could replicate data in software, or IBM's PPRC could replicate it at the disk hardware level. What Heartbeat will do is manage these replication services.

Different Classes of Split-Site Clusters

There are several possible variants of this problem which each lead to their own unique issues.

  1. 2-node split-site clusters
  2. n-node split-site clusters with servers split evenly across two sites
  3. n-node split-site clusters with servers split unevenly across the two sites
  4. Split-site clusters with an odd number of sites
  5. Split-site clusters with an even number of sites - at least four.

We would like to handle at least cases (1) and (2) above. Handling case (3) well would be a bonus, but isn't completely essential (case (3) requires changes to the structure of the hostcache file).

Quorum and Tie-Breaker Architecture in the R2 CCM

In the CCM, the quorum process is broken up into two pieces: the quorum method and the tie-breaker method. Both are implemented as plugins, so that a variety of different methods can be written and combined into different solutions for different kinds of configurations.

Quorum Plugin

It is the job of the quorum plugin to decide if the cluster has quorum or not. When invoked, the quorum plugin can return any of the following possible answers:

  • HAVEQUORUM: We Have Quorum
  • NOQUORUM: We Don't Have Quorum
  • TIEQUORUM: We tied, and are unable to determine if we should be granted quorum

As of this writing (2.0.5), we have implemented only one type of quorum plugin, using the classic majority-vote scheme. When one subcluster has an absolute majority (more than INT(n/2) nodes), the plugin returns HAVEQUORUM. When the subcluster has exactly half of the nodes in the cluster, it returns TIEQUORUM. When the subcluster has fewer than half of the nodes, it returns NOQUORUM.
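
To make the majority rule concrete, here is a minimal sketch of the calculation in C. The enum values mirror the answers listed above; the function name and signature are illustrative, not the actual CCM plugin interface.

    /* Minimal sketch of the majority-vote quorum rule described above. */
    enum quorum_result { HAVEQUORUM, NOQUORUM, TIEQUORUM };

    enum quorum_result
    majority_quorum(int subcluster_nodes, int total_nodes)
    {
        if (2 * subcluster_nodes > total_nodes)
            return HAVEQUORUM;     /* absolute majority: more than INT(n/2) nodes */
        if (2 * subcluster_nodes == total_nodes)
            return TIEQUORUM;      /* exactly half: defer to the tiebreaker */
        return NOQUORUM;           /* fewer than half */
    }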

Tiebreaker Plugin

When the quorum plugin returns TIEQUORUM, the tiebreaker plugin is called. It is the job of this plugin to use some method to break the tie such that it is virtually impossible for both sides to conclude that they have broken it.

There is currently only one tiebreaker plugin available - twonode. The twonode tiebreaker breaks the tie if it is called in a two-node cluster, and does not break the tie if called in a larger cluster. This is consistent with the behavior of the R1 cluster manager.
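
The decision rule amounts to something like the following sketch (same illustrative return values as above; what the CCM does with an unbroken tie is not specified here, but it must not grant quorum to both sides):

    /* Sketch of the twonode tiebreaker rule: break the tie only when the
     * whole cluster consists of exactly two nodes. */
    enum quorum_result { HAVEQUORUM, NOQUORUM, TIEQUORUM };

    enum quorum_result
    twonode_tiebreaker(int total_nodes)
    {
        if (total_nodes == 2)
            return HAVEQUORUM;     /* two-node cluster: break the tie in our favour */
        return TIEQUORUM;          /* larger cluster: leave the tie unbroken */
    }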

Possible Alternative Quorum Plugins

Many different types of quorum plugins are possible. The only constraint is that the combination of quorum plugin and tiebreaker plugin must be highly unlikely to grant quorum to both sides at once. In the absence of fencing mechanisms, it is also necessary to guarantee that there is sufficient time for resources to be stopped before quorum is moved from one subcluster to another.

Here are a few possible types of quorum plugins that come to mind:

  1. Return HAVEQUORUM when a human being says you have quorum, and NOQUORUM otherwise.
  2. Return HAVEQUORUM when a subcluster has an absolute majority of nodes in an absolute majority of sites

    (needs hostcache file changes).

  3. Return HAVEQUORUM when a subcluster has an absolute majority of a score which is based on a weighting of servers - so that not every server has the same weight (a sketch follows this list).

    (needs hostcache file changes).

  4. Return HAVEQUORUM when a subcluster has an absolute majority of a score based on a weighted set of sites - so that not every site has the same weight. Receiving the vote for a site requires that you have an absolute majority of weighted votes from that site.

    (needs hostcache file changes).

  5. ALWAYS return TIEQUORUM (deferring completely to the tiebreaker plugin)
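
As an illustration of plugin type (3), a weighted-vote calculation might look like the sketch below. The weights would come from an extended hostcache file; the function and its arguments are hypothetical.

    /* Hypothetical weighted-vote quorum: decide on a weighted score
     * rather than a raw node count. */
    enum quorum_result { HAVEQUORUM, NOQUORUM, TIEQUORUM };

    enum quorum_result
    weighted_quorum(const int subcluster_weights[], int subcluster_nodes,
                    const int all_weights[], int total_nodes)
    {
        int i, have = 0, total = 0;

        for (i = 0; i < subcluster_nodes; i++)
            have += subcluster_weights[i];
        for (i = 0; i < total_nodes; i++)
            total += all_weights[i];

        if (2 * have > total)
            return HAVEQUORUM;     /* absolute majority of the weighted score */
        if (2 * have == total)
            return TIEQUORUM;
        return NOQUORUM;
    }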

Note: human intervention may be an attractive alternative to any of these methods. The R2 CRM essentially has this built in, because you can tell a subcluster to ignore quorum, which has the same effect as saying "you have quorum unconditionally".

Possible Alternative Tiebreaker Plugins

  1. Return HAVEQUORUM when all (or a clear majority) of a set of designated ping nodes are accessible by the subcluster (not well-suited for split-site arrangements - but a good adjunct to a reliable STONITH method).
  2. Perform a disk reserve operation (not typically suitable for split-site clusters)
  3. Connect to a tiebreaker server which guarantees that it will never break the tie (grant quorum) for more than one subcluster at a time. This method is somewhat similar to the disk reserve operation, but is software-based, and is well-suited to a split-site configuration (a client sketch follows below).

  4. Human tiebreaker method. Note that this is not precisely simulated by the "ignore" option on quorum in the CRM, because in this case the human is only consulted if the quorum plugin returns TIEQUORUM.

Note that with this arrangement, it is possible to lose or gain quorum without any change in membership. This is not yet supported by the R2 CCM (but it needs to be).
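
To illustrate tiebreaker (3), a client for such a tiebreaker server might look roughly like the sketch below. The host address, port and the one-line "REQUEST"/"GRANTED" protocol are invented for illustration; the only properties that matter are that the server grants the lock to at most one subcluster at a time, and that failing to reach the server leaves the tie unbroken.

    /* Illustrative client for a hypothetical tiebreaker server.  The
     * wire protocol below is an assumption, not an existing Heartbeat
     * interface. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    enum quorum_result { HAVEQUORUM, NOQUORUM, TIEQUORUM };

    enum quorum_result
    server_tiebreaker(const char *server_ip, int port, const char *cluster)
    {
        struct sockaddr_in addr;
        char reply[64];
        ssize_t n;
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        if (fd < 0)
            return TIEQUORUM;                  /* no answer: leave the tie unbroken */

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, server_ip, &addr.sin_addr);

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            return TIEQUORUM;
        }
        dprintf(fd, "REQUEST %s\n", cluster);
        n = read(fd, reply, sizeof(reply) - 1);
        close(fd);
        if (n <= 0)
            return TIEQUORUM;
        reply[n] = '\0';
        return strncmp(reply, "GRANTED", 7) == 0 ? HAVEQUORUM : NOQUORUM;
    }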

Possible Generalizations of quorum and tiebreaker plugin architecture

It might be more general to revert to only one kind of plugin - the quorum plugin - and then configure an ordered set of m plugins. The result of a particular plugin would only be taken into account when all previous plugins (if any) returned TIEQUORUM.

In the end, if one of the plugins eventually returned HAVEQUORUM when all previous plugins had returned TIEQUORUM, then quorum would be granted.

Another (maybe simpler) way of saying this is:

  • The first plugin to return HAVEQUORUM or NOQUORUM wins.

Note that many combinations of plugins make little or no sense.

(don't know if this is clear, or a really good idea ;-))
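
Expressed as code, the rule might look like the sketch below; the function-pointer interface is purely hypothetical.

    /* "The first plugin to return HAVEQUORUM or NOQUORUM wins." */
    enum quorum_result { HAVEQUORUM, NOQUORUM, TIEQUORUM };

    typedef enum quorum_result (*quorum_plugin_fn)(void);

    enum quorum_result
    run_quorum_chain(quorum_plugin_fn plugins[], int nplugins)
    {
        int i;

        for (i = 0; i < nplugins; i++) {
            enum quorum_result r = plugins[i]();
            if (r != TIEQUORUM)
                return r;        /* first definite answer wins */
        }
        return TIEQUORUM;        /* every plugin tied: still undecided */
    }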

zhenh: In my view, it might work like this:

  • if one of the plugins returns NOQUORUM, return NOQUORUM.
  • if the previous plugin returns HAVEQUORUM or TIEQUORUM, call the next one.
  • if the last plugin returns TIEQUORUM, give a warning.
  • return the result of the last plugin.

    (This is definitely not how I think it should work. This would be (IMHO) broken. Once a plugin returns HAVEQUORUM, the result should be HAVEQUORUM. This is a prioritization scheme, not a voting scheme -- AlanR)

Let's consider that we have a local quorum plugin (majority), a local tiebreaker (twonode) and a global quorum plugin (3rd quorum).

  1. if majority returns NOQUORUM, then the subcluster loses quorum.
  2. if majority returns HAVEQUORUM or TIEQUORUM, the twonode tiebreaker will be called.
  3. if HAVEQUORUM is passed to the twonode tiebreaker, twonode will return HAVEQUORUM.
  4. if TIEQUORUM is passed to the twonode tiebreaker, twonode will return HAVEQUORUM or NOQUORUM.
  5. if twonode returns HAVEQUORUM, the global quorum plugin (3rd quorum) will be called.
  6. if the global quorum plugin (3rd quorum) returns HAVEQUORUM, and no plugins are left, the subcluster gets quorum.

    (Twonode should never be used in a split-site situation. In fact, if we add site designations to nodes in the cluster, it should automatically disable itself).

zhenh's understanding:

  1. the split-site cluster includes one or more sites.
  2. a site means a set of nodes in one subnet.
  3. a subcluster means a set of nodes within one site which can communicate with each other.
  4. quorum is required to provide service.
  5. there are two levels of quorum: local quorum (inside a site, i.e. within one subnet) and global quorum (inter-site, i.e. across the whole cluster).
  6. in the current implementation, local quorum means more than half of the nodes in the site (or a decision by the tiebreaker plugin).
  7. calculating local quorum does not need information about nodes in other site(s).
  8. global quorum would be decided by a 3rd quorum server with information about the sites and subclusters.
  9. if the subcluster has both the local and the global quorum, we say that it has quorum (see the sketch below).
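
A tiny sketch of point (9), taking the two decisions as already computed (how each level resolves its own ties is left to the plugins discussed earlier):

    /* A subcluster has quorum only when both the local (per-site) and
     * the global (cross-site) levels grant it. */
    enum quorum_result { HAVEQUORUM, NOQUORUM, TIEQUORUM };

    int
    subcluster_has_quorum(enum quorum_result local, enum quorum_result global)
    {
        return local == HAVEQUORUM && global == HAVEQUORUM;
    }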


Proposed Initial Split-Site Implementation

  • Change the CCM so that plugins can set up callbacks and change their quorum calculation without any change in membership (sketched below)

  • Implement the tiebreaker server quorum tiebreaker plugin

This would allow us to handle split sites with an equal number of nodes in each site.
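
As a sketch of what the first bullet might amount to, a plugin could be handed a notification callback by the CCM at load time and invoke it whenever its answer changes asynchronously (for instance, when the tiebreaker server revokes the lock). The structure below is purely hypothetical; the real CCM plugin interface may look quite different.

    /* Hypothetical plugin descriptor with an asynchronous notification hook. */
    enum quorum_result { HAVEQUORUM, NOQUORUM, TIEQUORUM };

    typedef void (*quorum_change_cb)(enum quorum_result new_result);

    struct quorum_plugin {
        enum quorum_result (*calculate)(void);   /* called on membership events  */
        quorum_change_cb    notify;              /* filled in by the CCM at load */
    };

    /* A plugin that loses its external lock could then report it with
     *     plugin->notify(NOQUORUM);
     * without waiting for a membership change. */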

If somehow we get these things done, and still have time, then these enhancements are worth considering, in approximately this priority order.

Optional Steps to possibly follow the Initial Implementation

Optional steps (maybe we can get gshi to do some of this work?)

  • Implement the site designations and node weighting updates to the hostcache file.

  • Change the voting method to take the node weightings into account (this would allow sites with different numbers of nodes in them to be accommodated)
  • Disable the twonode override when multiple sites are configured.

  • Generalize the quorum/tiebreaker module structure as described above
  • Implement a new CCM API call to allow the list of quorum modules to be set by the CRM. This will make the configuration much more manageable.

  • Implement a human quorum/tiebreaker module which returns TIEQUORUM if no human has approved a quorum override.
  • Add a new voting quorum method which implements the "one vote per site" algorithm.

    This will allow n-site arrangements to be supported (a sketch follows at the end of this list).

  • Implement a tiebreaker module that counts ping nodes and returns TRUE if all are reachable (I know this has nothing to do with SplitSite, but it would be nice anyway).
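
As a sketch of the "one vote per site" idea: a site casts its single vote for a subcluster only when the subcluster holds an absolute majority of that site's nodes, and cluster-wide quorum requires a majority of site votes. The per-site counts would ultimately come from the extended hostcache file; everything here is illustrative.

    /* Hypothetical "one vote per site" quorum calculation. */
    enum quorum_result { HAVEQUORUM, NOQUORUM, TIEQUORUM };

    enum quorum_result
    per_site_quorum(const int subcluster_nodes[],  /* our nodes in each site   */
                    const int site_nodes[],        /* total nodes in each site */
                    int nsites)
    {
        int i, votes = 0;

        for (i = 0; i < nsites; i++) {
            if (2 * subcluster_nodes[i] > site_nodes[i])
                votes++;                           /* this site votes for us */
        }
        if (2 * votes > nsites)
            return HAVEQUORUM;
        if (2 * votes == nsites)
            return TIEQUORUM;
        return NOQUORUM;
    }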