Note: This is still a proposal and is under reasonably heavy discussion. Please read the NodeFencing page in addition to this one; this page only explains the implementation of the STONITH Agents, not their integration into the CRM design.
StonithAgents are essentially a special ResourceAgentClass (a minor extension of the OCF class), but they are named differently to avoid any confusion from the outset. In particular, they are very similar to OpenClusterFramework resource agents, with the following differences:
They are not located under /usr/ocf, but under /usr/lib/heartbeat/stonith.d/.
On start, they launch a daemon process that connects to the STONITH device; this is the instantiation of what is called the STONITH Controller in the NodeFencing page. This daemon monitors and controls the STONITH Device and, if possible and necessary, takes whatever steps are needed to ensure that we are the only one accessing the device (so that no other task interferes).
Surprisingly, on stop, said daemon is stopped, thus releasing ownership of the STONITH Device.
monitor does just what it does for regular resources; it verifies whether the daemon is still running and performs a health check request by which the daemon reports whether it can still reach the STONITH Device.
There are two additional commands which the StonithAgents must support:
The fence operation, to which we pass a comma-separated list of node names to fence (via an OCF_RESKEY_STONITH_NODES environment variable), and which reports back on stdout, for each node name, whether the STONITH operation was successful. A non-zero exit code from the fence operation shall be interpreted as a complete failure to reach the STONITH device, so that the CRM can reallocate the STONITH controller resource to another node, if applicable.
The list-fence-targets operation, which, after the resource has been started, reports the list of nodes that the device controls on stdout (which is already relayed back to us via the LocalResourceManager).
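To make the calling convention of the two additional operations concrete, here is a minimal Python sketch under several assumptions. Only the OCF_RESKEY_STONITH_NODES variable, the per-node report on stdout and the meaning of a non-zero exit code come from the description above. The per-node output format ("<node> OK" or "<node> FAILED"), the FakeController class standing in for the STONITH Controller daemon, and the functions fence, list_fence_targets and parse_fence_output are all hypothetical; a real agent would be an executable invoked via the LocalResourceManager, not a Python function.

```python
from typing import Dict, Iterable, List, Tuple


class FakeController:
    """Hypothetical stand-in for the STONITH Controller daemon launched on
    start. A real agent would talk to its daemon over some IPC channel."""

    def __init__(self, targets: Iterable[str], reachable: bool = True):
        self.targets = list(targets)
        self.reachable = reachable

    def reset(self, node: str) -> bool:
        # Pretend a reset succeeds only for nodes this device controls.
        return node in self.targets


def fence(controller: FakeController, env: Dict[str, str]) -> Tuple[int, List[str]]:
    """Fence every node named in OCF_RESKEY_STONITH_NODES.

    Returns (exit_code, stdout_lines). A non-zero exit code means the
    device itself could not be reached at all.
    """
    if not controller.reachable:
        return 1, []
    raw = env.get("OCF_RESKEY_STONITH_NODES", "")
    nodes = [n.strip() for n in raw.split(",") if n.strip()]
    # Hypothetical output format: one "<node> OK|FAILED" line per node.
    lines = [f"{n} {'OK' if controller.reset(n) else 'FAILED'}" for n in nodes]
    return 0, lines


def list_fence_targets(controller: FakeController) -> Tuple[int, List[str]]:
    """Report the nodes this device controls, one per stdout line."""
    if not controller.reachable:
        return 1, []
    return 0, list(controller.targets)


def parse_fence_output(exit_code: int, lines: List[str]) -> Dict[str, bool]:
    """Caller side: map each node to True (fenced) or False (failed)."""
    if exit_code != 0:
        # Complete failure: the CRM may move the controller resource elsewhere.
        raise RuntimeError("STONITH device unreachable")
    results = {}
    for line in lines:
        node, status = line.rsplit(" ", 1)
        results[node] = status == "OK"
    return results


if __name__ == "__main__":
    ctl = FakeController(["node1", "node2"])
    code, out = fence(ctl, {"OCF_RESKEY_STONITH_NODES": "node1,node3"})
    print("\n".join(out))
    print(parse_fence_output(code, out))
```

Keeping "device unreachable" in the exit code and the per-node results on stdout lets the caller distinguish a broken controller (move the resource) from a single node that could not be fenced.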
SunJiangDong said: According to the new node fencing architecture, there are no longer any actual StonithAgent scripts; their functionality is simulated by the stonith RA plugin plus the stonith daemon. Moreover, the functions corresponding to the above two operations will be moved to, and implemented in, the node fencing daemon's API library, so the virtual StonithAgents will no longer have these two operations. In other words, these virtual StonithAgents now act more like standard OCF RAs. Please refer to SmartFencingDaemonProposal. Any comments?
Extension under discussion:
The monitor operation shall not only report back whether the STONITH device it controls is reachable, but also signal whether the list of nodes it can control has changed. As the daemon can easily track this, it would be most helpful: we would be notified when this list changes and could then reload it dynamically. AlanRobertson really doesn't seem to like this, but LarsMarowskyBree still hopes to convince him.
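The extension could work roughly as follows, sketched in Python under the assumption that the daemon remembers the target list it last handed out via list-fence-targets. The TargetTracker class and the (reachable, targets_changed) tuple are hypothetical; how the change flag would actually be encoded in monitor's exit code or output is not specified by this proposal.

```python
from typing import FrozenSet, Iterable, List, Tuple


class TargetTracker:
    """Hypothetical daemon-side helper that lets monitor flag changes in
    the set of nodes the STONITH device can control."""

    def __init__(self) -> None:
        # Target set as last reported by list-fence-targets.
        self._last_listed: FrozenSet[str] = frozenset()

    def list_fence_targets(self, current: Iterable[str]) -> List[str]:
        """Report the current targets and remember them as the reference."""
        self._last_listed = frozenset(current)
        return sorted(self._last_listed)

    def monitor(self, reachable: bool, current: Iterable[str]) -> Tuple[bool, bool]:
        """Return (device_reachable, targets_changed).

        targets_changed stays True until list-fence-targets is called
        again, i.e. until the CRM has actually reloaded the list.
        """
        if not reachable:
            return False, False
        return True, frozenset(current) != self._last_listed
```

Comparing against the last listed set, rather than against the previous monitor result, means the change keeps being flagged until the list is reloaded, so a single missed monitor result does not lose the notification.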