Split-Brain

A split-brain condition is the result of a ClusterPartition[1] in which each side believes the other is dead and then proceeds to take over resources[2] as though the other side no longer owned them.

After this, a variety of bad things will happen (BadThingsWillHappen[3]), including destruction of shared disk data.

This is the result of acting on incomplete information - neglecting DunnsLaw[4]. That is, when a node is declared "dead", its status is, by definition, not known. Perhaps it is dead, perhaps it is merely incommunicado. The only thing that is known is that its status is not known.
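The point about unknown status can be made concrete with a small sketch. It is only an illustration, not Heartbeat code: the NodeStatus type, the classify function and its parameters are hypothetical names, and "deadtime" is borrowed loosely from the directive of the same name. Note that there is no "dead" state at all, because silence alone cannot establish one.

```python
from enum import Enum


class NodeStatus(Enum):
    # Hypothetical states for illustration; there is deliberately no DEAD.
    ALIVE = "alive"      # a heartbeat was heard within deadtime
    UNKNOWN = "unknown"  # silent: may be dead, may be merely incommunicado


def classify(seconds_since_heartbeat: float, deadtime: float) -> NodeStatus:
    """Classify a peer using heartbeat silence as the only evidence."""
    if seconds_since_heartbeat < deadtime:
        return NodeStatus.ALIVE
    # Silence past deadtime proves nothing about the peer itself.
    return NodeStatus.UNKNOWN
```

A caller that sees UNKNOWN still has to decide what to do next; the sketch only refuses to pretend that the answer is already known.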

The ultimate cure for this is to use Fencing[5] and lock the other side out.

The problem with merely using quorum without fencing[6] is that, in the worst case, the loss of quorum can take an unbounded amount of time to detect and react to.

Fencing does not require knowledge of the timing or behavior of the "errant" nodes, nor does it require the cooperation or sanity of errant nodes. In addition, fencing operations receive positive confirmation. Hence, fencing has a high degree of certainty.
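The role of positive confirmation can be sketched as follows. This is a minimal illustration under stated assumptions, not the way any particular cluster manager implements it: take_over, fence and start are hypothetical names, and fence is assumed to return True only when the fencing operation has been positively confirmed.

```python
from typing import Callable, Iterable, List


def take_over(peer: str,
              resources: Iterable[str],
              fence: Callable[[str], bool],
              start: Callable[[str], None]) -> List[str]:
    """Start the peer's resources only after the fence is confirmed.

    `fence` and `start` are hypothetical callbacks supplied by the caller.
    """
    if not fence(peer):
        # No positive confirmation: the peer's status is still unknown,
        # so starting its resources here would risk split-brain.
        return []
    started = []
    for resource in resources:
        start(resource)
        started.append(resource)
    return started
```

The design choice is that the takeover path never consults the peer: it depends only on the outcome of the fencing operation, which matches the claim that fencing needs neither the cooperation nor the sanity of the errant node.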

In most cases, a good way to avoid split-brain conditions without resorting to fencing is to configure redundant, independent cluster communication paths, so that the loss of a single interface or path does not break communication between the nodes. That is, cluster communications should not have a single point of failure[7].
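The effect of redundant paths can be sketched in a few lines. The function and its parameters are hypothetical, and the path names in the usage note are only examples: a peer is treated as unreachable only when every independent path has been silent for at least deadtime, so losing one path does not trigger a takeover.

```python
from typing import Mapping


def peer_unreachable(seconds_since_heard: Mapping[str, float],
                     deadtime: float) -> bool:
    """Return True only if every communication path has gone silent.

    Keys are path names; values are seconds since the peer was last
    heard on that path. Both are assumptions made for illustration.
    """
    if not seconds_since_heard:
        raise ValueError("at least one communication path is required")
    return all(age >= deadtime for age in seconds_since_heard.values())
```

For example, with a hypothetical Ethernet link silent for 60 seconds and a serial link heard 2 seconds ago, peer_unreachable({"eth1": 60.0, "serial": 2.0}, 30.0) is False. With only the Ethernet link configured, the same outage would make it True, which is the single point of failure described above.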

We highly recommend using both redundant communications and fencing.

See Also

Split-brain, quorum, fencing overview[8], ClusterConcepts[9], fencing[6], quorum[10], STONITH[11], SPOF[7], FAQ on tuning deadtime[12], deadtime directive[13], warntime directive[14]


References

[1]http://www.linux-ha.org/ClusterPartition
[2]http://www.linux-ha.org/resource
[3]http://www.linux-ha.org/BadThingsWillHappen
[4]http://www.linux-ha.org/DunnsLaw
[5]http://www.linux-ha.org/Fencing
[6]http://www.linux-ha.org/fencing
[7]http://www.linux-ha.org/SPOF
[8]http://techthoughts.typepad.com/managing_computers/2007/10/split-brain-quo.html
[9]http://www.linux-ha.org/ClusterConcepts
[10]http://www.linux-ha.org/quorum
[11]http://www.linux-ha.org/STONITH
[12]http://www.linux-ha.org/FAQ#heavy_load
[13]http://www.linux-ha.org/ha.cf/DeadtimeDirective
[14]http://www.linux-ha.org/WarntimeDirective


This information provided courtesy of the Linux-HA project at http://linux-ha.org/