Introduction

It appears to be a good time to take a step back, outline the requirements that the fencing system needs to address, and make all the implicit assumptions more explicit.

The process suggested by AlanRobertson was to first list all the requirements (regardless of their importance), make sure we understand them and that they are all sane, then assign priorities to them, and finally build an implementation from there.

In this first step, please list all requirements you have for the fencing functionality. Everybody is invited to participate and to add questions / clarifications where necessary.

So far, the list is completely unordered, so please don't take offense at that.

Requirement list

Independent at runtime of the CRM-TNG

Requestor: AlanRobertson.

Rationale: The fencing subsystem must work in clusters where the CRM is not present.

Single configuration source

Requestor: AlanRobertson, LarsMarowskyBree.

Rationale: There should be a single configuration frontend for the user, to reduce complexity. Thus the fencing subsystem should draw its configuration from the same source as the CRM (if present). The configuration must support localized help texts, etc., just like the other parts of the system.

This will allow reuse of the moderately complex GUI and CIB infrastructure.
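
A minimal sketch of what drawing the fencing configuration from the shared store might look like. All names here (SharedClusterConfig, StonithDeviceConfig, the external/ipmi plugin string, the parameters) are assumptions invented for illustration, not existing Heartbeat or CRM APIs.

    # Hypothetical sketch: the fencing subsystem pulls a device definition from
    # the same configuration source the CRM uses, instead of a private file.
    from dataclasses import dataclass, field

    @dataclass
    class StonithDeviceConfig:
        """One fencing device as it might appear in the shared configuration."""
        device_id: str
        plugin: str                                    # e.g. "external/ipmi" (assumed name)
        parameters: dict = field(default_factory=dict)
        help_text: dict = field(default_factory=dict)  # locale -> localized description

    class SharedClusterConfig:
        """Stand-in for the CRM's configuration store (e.g. the CIB)."""
        def __init__(self, devices):
            self._devices = {d.device_id: d for d in devices}

        def stonith_devices(self):
            return list(self._devices.values())

    # Usage: the fencing daemon asks the shared store rather than reading its own config.
    config = SharedClusterConfig([
        StonithDeviceConfig(
            device_id="st-node1",
            plugin="external/ipmi",
            parameters={"hostname": "node1", "ipaddr": "10.0.0.101"},
            help_text={"en": "IPMI-based power fencing for node1"},
        ),
    ])
    for dev in config.stonith_devices():
        print(dev.device_id, dev.plugin, dev.parameters)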

Independent configuration source

Requestor: AlanRobertson

Rationale: If the CRM is not around, the fencing subsystem needs to be able to draw its configuration from some other source (e.g., simple text files).
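
A small sketch of such a fallback, assuming a made-up "device_id plugin key=value ..." line format; this file format is an assumption for illustration, not an existing Heartbeat format.

    # Hypothetical sketch: parse a plain text file when no CRM (and hence no
    # shared configuration store, as in the previous sketch) is available.
    def parse_fencing_config(path):
        """Read 'device_id plugin key=value ...' lines into device dictionaries."""
        devices = []
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                device_id, plugin, *pairs = line.split()
                params = dict(p.split("=", 1) for p in pairs)
                devices.append({"id": device_id, "plugin": plugin, "parameters": params})
        return devices

    # Example file contents:
    #   st-node1 external/ipmi hostname=node1 ipaddr=10.0.0.101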

Bottom-up integration with the CRM

Requestor: AlanRobertson

Rationale: If the lower-level layers (membership & fencing) have already performed fencing of a failed node, the CRM needs to be able to be informed of this and to use the information correctly.

Or, to put it differently, the CRM should be capable of registering fencing needs with an external fencing authority.
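
A minimal sketch of this bottom-up direction, assuming a hypothetical listener interface: the external fencing authority performs the fence and the CRM is told after the fact. The FencingAuthority and Crm names are invented for illustration.

    # Hypothetical sketch: an external fencing authority fences a node and then
    # notifies the CRM, which records the node as safely down.
    class Crm:
        def __init__(self):
            self.fenced_nodes = set()

        def node_fenced(self, node, by):
            # Told after the fact, the CRM can treat the node's resources as
            # stopped without scheduling another fencing operation.
            self.fenced_nodes.add(node)
            print(f"CRM: {node} already fenced by {by}, no further fencing needed")

    class FencingAuthority:
        def __init__(self):
            self.listeners = []

        def register_listener(self, listener):
            # The CRM registers its fencing needs / interest in fencing events here.
            self.listeners.append(listener)

        def fence(self, node):
            print(f"fencing authority: fencing {node}")
            for listener in self.listeners:
                listener.node_fenced(node, by="membership layer")

    crm = Crm()
    authority = FencingAuthority()
    authority.register_listener(crm)
    authority.fence("node2")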

Top-down integration with the CRM

Requestor: LarsMarowskyBree

Rationale: As the CRM has been assumed to be, if present, the sole enforcer of cluster (recovery) policy, the lower levels have to register with the CRM if they want a node to be fenced.

LarsMarowskyBree: Yes, this requirement conflicts with the previous one. Maybe both should be abolished and replaced with the requirements hidden behind them, and then we can decide whether to go top-down or bottom-up?
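
For contrast with the bottom-up sketch above, a minimal sketch of the top-down direction described in this entry: a lower layer does not fence on its own but files a request with the CRM, which remains the policy enforcer. PolicyCrm, request_fencing and stonith_backend are invented names for illustration only.

    # Hypothetical sketch: the membership layer asks the CRM to fence a node;
    # the policy decision stays with the CRM, which drives the STONITH backend.
    class PolicyCrm:
        def __init__(self, fencing_backend):
            self._fence = fencing_backend

        def request_fencing(self, node, reason):
            # In a real system this is a policy decision; here it simply approves.
            print(f"CRM: fencing of {node} requested ({reason}), approved")
            self._fence(node)

    def stonith_backend(node):
        print(f"STONITH: power-cycling {node}")

    crm = PolicyCrm(stonith_backend)
    # A lower layer reports a lost node and asks the CRM to act.
    crm.request_fencing("node3", reason="membership lost")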

Fencing in response to non-node failures

Requestor: LarsMarowskyBree

Rationale: As outlined on the NodeFencing page, STONITH-style fencing may occur in response to non-node failures; for example, a failed resource stop when recovering a high-priority resource. In this case, all other resources (be they CRM-controlled resources or GFS mounts) need to be informed that we are about to perform such a recovery operation (and migrate other resources away cleanly first).
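
A minimal sketch of such a pre-fencing notification, assuming a hypothetical subscribe/announce hook protocol; the PreFenceNotifier name and the callbacks are invented for illustration.

    # Hypothetical sketch: before fencing a node to recover from a failed
    # resource stop, interested parties (CRM-managed resources, GFS mounts)
    # are notified so they can evacuate cleanly first.
    class PreFenceNotifier:
        def __init__(self):
            self._hooks = []

        def subscribe(self, hook):
            self._hooks.append(hook)

        def announce(self, node, reason):
            # Every subscriber gets a chance to act before the fence fires.
            for hook in self._hooks:
                hook(node, reason)

    def migrate_resources(node, reason):
        print(f"CRM: migrating resources off {node} before fence ({reason})")

    def release_gfs_mounts(node, reason):
        print(f"GFS: releasing locks held by {node} before fence ({reason})")

    notifier = PreFenceNotifier()
    notifier.subscribe(migrate_resources)
    notifier.subscribe(release_gfs_mounts)
    notifier.announce("node1", reason="failed stop, recovering high-priority resource")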

Coordinating normal shutdown

Requestor: LarsMarowskyBree

Rationale: This goes beyond fencing, but it certainly seems related: in case of a regular shutdown of a node, all resources and subsystems need to be disabled in an orderly and dependency-coherent fashion, so that, one by one, each of them can release the node and it does not have to be fenced.

In RHAT's GFS, this seems to be done by their Service Manager, and in the heartbeat-TNG world, by the CRM.
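
A minimal sketch of dependency-coherent shutdown as plain topological ordering; the subsystem names and the dependency graph are assumptions made up for illustration.

    # Hypothetical sketch: stop subsystems in reverse dependency order so each
    # releases the node in turn and no fencing is needed.
    from graphlib import TopologicalSorter

    # "a depends on b" means b must still be running while a shuts down,
    # so a is stopped before b.
    depends_on = {
        "gfs-mount": {"fencing", "membership"},
        "crm-resources": {"membership"},
        "fencing": {"membership"},
        "membership": set(),
    }

    # Start order is dependencies-first; an orderly shutdown is simply the reverse.
    start_order = list(TopologicalSorter(depends_on).static_order())
    for subsystem in reversed(start_order):
        print(f"stopping {subsystem}")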

Non-node level fencing

Requestor: LarsMarowskyBree

Rationale: In scenarios where other functionality is available, it would be desirable to be able to use resource-level fencing - i.e., for GFS, while mounting from an iSCSI server (which can block access from the fenced node without power-cycling it).
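
A minimal sketch of that idea under stated assumptions: the IscsiTarget class and its block_initiator call are invented stand-ins for whatever management interface the storage server actually offers, and the IQNs are placeholders.

    # Hypothetical sketch: resource-level fencing for a GFS filesystem on iSCSI
    # storage, where the target blocks the failed node's initiator instead of
    # power-cycling the node.
    class IscsiTarget:
        def __init__(self, name):
            self.name = name
            self.blocked = set()

        def block_initiator(self, initiator_iqn):
            # Deny further I/O from the fenced node without touching its power.
            self.blocked.add(initiator_iqn)
            print(f"{self.name}: blocked {initiator_iqn}")

    def resource_level_fence(target, node_to_iqn, node):
        target.block_initiator(node_to_iqn[node])

    target = IscsiTarget("iqn.2004-01.org.example:gfs-store")
    resource_level_fence(target, {"node2": "iqn.2004-01.org.example:node2"}, "node2")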

Monitoring of STONITH devices

Rationale: The fencing subsystem must be capable of monitoring the liveness and reachability of the fencing devices, taking appropriate action, and issuing notifications.
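
A minimal sketch of a periodic liveness check; the probe callable and the notification path are assumptions, since a real plugin would run the device's own status operation.

    # Hypothetical sketch: probe a STONITH device periodically and raise an
    # alarm when it becomes unreachable.
    import time

    def monitor_device(device_id, probe, interval=60, notify=print, rounds=3):
        """Probe the device every `interval` seconds; `rounds` bounds the loop
        here only so the example terminates."""
        for _ in range(rounds):
            if probe():
                notify(f"{device_id}: reachable")
            else:
                notify(f"{device_id}: UNREACHABLE, fencing for its nodes is degraded")
            time.sleep(interval)

    # Usage with a trivially successful probe:
    monitor_device("st-node1", probe=lambda: True, interval=0, rounds=1)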

Coordinating the access to the STONITH devices

Rationale: Some network power switches or serial power switches, by their very nature, are only reachable from one node at a time. Thus all access to them - be it for configuration inquiries, monitoring, or fencing operations - needs to be coordinated to go via a single node from the list of those which can reach the device, or be otherwise serialized, to avoid contention and spurious monitoring failures.

This could probably also be phrased as managing the STONITH topology.
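
A minimal sketch of the serialization idea: here a local lock stands in for whatever cluster-wide mechanism (designated owner node, distributed lock, message queue) would actually be used; the class and operation names are invented for illustration.

    # Hypothetical sketch: serialize all access to a power switch that only one
    # caller may talk to at a time, so monitoring and fencing never collide.
    import threading

    class SerializedStonithDevice:
        def __init__(self, name):
            self.name = name
            self._lock = threading.Lock()   # placeholder for a cluster-wide lock

        def _exclusive(self, operation, *args):
            with self._lock:                # one caller at a time touches the switch
                print(f"{self.name}: {operation} {args}")

        def monitor(self):
            self._exclusive("status")

        def fence(self, node):
            self._exclusive("reset", node)

    device = SerializedStonithDevice("apc-switch-1")
    device.monitor()
    device.fence("node2")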

Re-use functionality provided by the cluster manager

Requestor: LarsMarowskyBree

Rationale: As there is bound to be some cluster policy manager around, it seems sensible to re-use as much of its functionality as possible, to reduce the complexity of the fencing subsystem and to speed up the implementation work.

Components which seemed to lend themselves to re-use were the management of resource topology, monitoring functionality, and configuration.