
Providing Open Source High-Availability Software for Linux and other OSes since 1999.


This web page is no longer maintained. Information presented here exists only to avoid breaking historical links.
The Project stays maintained, and lives on: see the Linux-HA Reference Documentation.


Last site update:
2017-12-16 12:21:51

Closed Questions regarding the LocalResourceManager (LRM)

13) AlanRobertson: The LocalResourceManager has been requested to model fencing operations as resource operations.

  • I think this sounds like a big win if we can do this, but as of now, I'm not 100% sure if we know we can do this. More details on this can be found in the NodeFencing page. Also, for some reason, I thought this might also include ResourceFencing (since, after all, our main object is a resource). So, this is unclear to me at the moment.

    HuangZhen: What is the definition of ResourceFencing? As I understand it, it means that if the "stop" operation does not work or needs a long time to finish, we can use a "fence" operation to stop the resource immediately. If so, it is the RA that does the final work of ResourceFencing. The LRM will just add a type of operation.

    LarsMarowskyBree: I do not think ResourceFencing needs to be handled specially by the LocalResourceManager. Either a resource is self-fencing, in which case we just start it up, knowing that it will handle all of it internally (and the monitor/status operation would inform us if a reservation was pulled out from under us), or we need to perform a special 'reserve' command - but I would argue that we should encapsulate this into a special ResourceAgent, say SCSIReservation, which the Filesystem resource would then depend on - from the point of view of the LRM, no special handling at all is required for either of these two.

    AlanRobertson: I will mark this issue as resolved pending the review of the LocalResourceManagerInterface. When that comes out, it may become obvious that this is all fine, or it may become obvious that it needs more work.

14) LarsMarowskyBree: Discussion of the states which resources can be in.

  • I suggest that a resource can be in any of the following states (as exposed to an LRM client querying the resource state):
    • stopped
      • start command will lead to starting state

    • starting
      • If no failures occur, will lead to started

      • If a start failure occurs, will lead to failed

    • started
      • Will stick in this state unless monitor ever fails, in which case it will go to failed

    • failed
      • stop command will transition to stopping

      • restart will transition to restarting

    • restarting
      • If no failures occur, will lead to started

      • If a failure occurs, will lead to failed again

    • stopping
      • If no failure occurs, will lead to stopped

      • If a failure occurs, will lead to dead

    • dead

      • This state is special in that no recovery is possible without rebooting the node. It's the dead-end.

    From the ClusterResourceManager perspective, the distinction between starting and started is important, because during a recovery phase we may find a resource in the starting phase and know that we can't do anything yet before it has reached the started stage. (Same for the other transitions.) It may also be possible that other resources, i.e., primary/secondary types, can be in more states, transitioning from active to backup or vice versa, and resources which can be active on a pool of nodes also need to be able to tell that they are number X from a pool of N, et cetera. This too needs more thought.

    AlanRobertson had an extended conversation with LarsMarowskyBree on this topic this morning (2/19/2004). We agreed that the LocalResourceManager will not track past resource state. As a result of this conversation, AlanRobertson will mark this item as resolved.
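
The states proposed in item 14 can be sketched as a transition function. This is only an illustration of the proposal as written: the enum names, the event names, and next_state are hypothetical, and since the conversation concluded that the LocalResourceManager will not track past resource state, it should not be read as the LRM design.

```c
#include <assert.h>

/* States from the proposal in item 14. All identifiers are hypothetical. */
enum rsc_state {
    RSC_STOPPED, RSC_STARTING, RSC_STARTED, RSC_FAILED,
    RSC_RESTARTING, RSC_STOPPING, RSC_DEAD, RSC_NSTATES
};

/* Events that drive transitions: client commands or operation outcomes. */
enum rsc_event {
    EV_START_CMD, EV_STOP_CMD, EV_RESTART_CMD,
    EV_OP_OK,      /* the in-flight start/restart/stop finished cleanly */
    EV_OP_FAILED,  /* the in-flight operation, or a monitor, failed */
    EV_NEVENTS
};

/* Return the next state, or the current one if the event does not apply.
 * The proposal lists no stop transition out of "started", so none is
 * modeled here. */
enum rsc_state next_state(enum rsc_state s, enum rsc_event e)
{
    assert((int)s < RSC_NSTATES && (int)e < EV_NEVENTS);
    switch (s) {
    case RSC_STOPPED:
        return e == EV_START_CMD ? RSC_STARTING : s;
    case RSC_STARTING:
    case RSC_RESTARTING:
        if (e == EV_OP_OK) return RSC_STARTED;
        if (e == EV_OP_FAILED) return RSC_FAILED;
        return s;
    case RSC_STARTED:
        return e == EV_OP_FAILED ? RSC_FAILED : s;
    case RSC_FAILED:
        if (e == EV_STOP_CMD) return RSC_STOPPING;
        if (e == EV_RESTART_CMD) return RSC_RESTARTING;
        return s;
    case RSC_STOPPING:
        if (e == EV_OP_OK) return RSC_STOPPED;
        if (e == EV_OP_FAILED) return RSC_DEAD;
        return s;
    default: /* RSC_DEAD: the dead end, no event leaves it */
        return s;
    }
}
```

Writing the transitions out this way makes the dead end visible: once a stop fails, no event leads anywhere else.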

11) Have we agreed yet that the LocalResourceManager can be told, in a single command, to monitor a given resource every so often and just send an event when that fails?

  • I don't find it in the design, and I'd hate to have to send a resource ping across the network potentially every second ;)

    AlanRobertson answers: Yes. I've discussed this with the rest of the LRM team, and we've talked about it. It was something which I originally neglected to mention to them. I will highlight this issue to them.
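
A minimal sketch of the idea in item 11, under stated assumptions: the client registers a recurring monitor once, the check runs locally, and the client hears back only when a check fails. The struct, the function names, and the callback signatures are hypothetical and are not the LRM API. The timer that would call monitor_tick every interval_ms is left out, and the example_ functions exist only to exercise the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types; not taken from the LRM design. */
typedef int  (*check_fn)(void *rsc);                  /* 0 means healthy */
typedef void (*fail_cb)(void *rsc, int rc, void *user);

struct recurring_monitor {
    void    *rsc;          /* resource being monitored */
    unsigned interval_ms;  /* how often a timer should call monitor_tick */
    check_fn check;        /* local check, e.g. the agent's monitor action */
    fail_cb  on_failure;   /* invoked only when a check fails */
    void    *user;         /* passed through to on_failure */
};

/* Run one check locally. Nothing is reported while the resource is healthy,
 * so no per-interval traffic goes back to the client. Returns the check rc. */
int monitor_tick(const struct recurring_monitor *m)
{
    int rc;

    assert(m != NULL && m->check != NULL);
    rc = m->check(m->rsc);
    if (rc != 0 && m->on_failure != NULL)
        m->on_failure(m->rsc, rc, m->user);
    return rc;
}

/* Example callbacks used only to exercise the sketch. */
int example_check_ok(void *rsc)     { (void)rsc; return 0; }
int example_check_failed(void *rsc) { (void)rsc; return 7; }
void example_count_failure(void *rsc, int rc, void *user)
{
    (void)rsc; (void)rc;
    ++*(int *)user;   /* count how many failure events were delivered */
}
```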

9) Should operation timeouts be given on a per-operation basis, or should they be given at the beginning, when the ResourceInstance is first instantiated?

  • AlanRobertson's CurrentThinking on this subject is that timeouts should be given with each operation to be performed. This would allow different timeouts to be given for the same operation with different parameters. For example, the monitor operation has a parameter which tells the "depth" or difficulty of the monitor operation. Each depth could then easily be supplied with a different timeout. If the timeout is associated with the operation itself, this would be more difficult.

    LarsMarowskyBree: I think giving them per-operation is fine.
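
One way to picture per-operation timeouts is a request structure that carries its own timeout, so two monitor requests at different depths can have different limits. The struct layout and function names are assumptions made for illustration; the source does not define them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request layout: the timeout is part of each request rather
 * than a property fixed when the ResourceInstance is created. */
struct op_request {
    const char *op_type;    /* e.g. "start", "stop", "monitor" */
    int         depth;      /* monitor depth; unused by other operations */
    unsigned    timeout_ms; /* applies to this invocation only */
};

/* Build a monitor request; the caller picks the timeout for this depth. */
struct op_request make_monitor(int depth, unsigned timeout_ms)
{
    struct op_request r = { "monitor", depth, timeout_ms };
    return r;
}

/* Non-zero once the invocation has run longer than its own timeout. */
int op_timed_out(const struct op_request *r, unsigned elapsed_ms)
{
    assert(r != NULL);
    return elapsed_ms > r->timeout_ms;
}
```

With this shape, a shallow monitor could be given a few seconds while a deep one is given a minute, without either value being stored on the resource itself.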

8) What about large metadata returns?

  • Huge metadata is hard to deal with, even if there is an independent callback prototype.

    AlanRobertson: We agreed to put a practical limit (64K, for example) on the metadata -- at least for now.
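
A sketch of how such a limit might be enforced. The 64K figure is the example from the discussion; the function name and the choice to reject oversized metadata instead of truncating it are assumptions made here, on the grounds that a cut-off XML block would no longer parse.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 64K is the example figure from the discussion; treat it as tunable. */
#define METADATA_MAX ((size_t)64 * 1024)

/* Copy agent metadata into dst, whose capacity is dstsize bytes including
 * the terminating NUL. Returns the number of bytes copied, or -1 if the
 * metadata exceeds METADATA_MAX or does not fit in dst. */
long metadata_accept(const char *src, size_t srclen, char *dst, size_t dstsize)
{
    assert(src != NULL && dst != NULL);
    if (srclen > METADATA_MAX || srclen >= dstsize)
        return -1;
    memcpy(dst, src, srclen);
    dst[srclen] = '\0';
    return (long)srclen;
}
```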

7) AlanRobertson: The module loading and the code of the resource procedures I think should be in the LocalResourceManager process, and not in a child process as the current version of the picture shows.

6) IBM China: What is the meaning of "resource procedures"?

  • I use it for the code that deals with a resource, such as querying the status of the resource or starting the resource. Because these operations may be performed at the same time, I think they should be done in a child process.

    AlanRobertson: They should absolutely be done in a child process. One per operation. No concurrency should be allowed on any given ResourceInstance.
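
The "no concurrency on a given ResourceInstance" rule can be sketched as a per-resource busy flag that is claimed before a child is started and released when it exits. The names are hypothetical, and refusing the second operation is only one possible policy; queuing it would be another.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-resource bookkeeping. */
struct rsc_instance {
    const char *id;
    int         op_in_flight;  /* 1 while a child process is working on it */
};

/* Claim the resource before forking the child for an operation.
 * Returns 0 on success, or -1 if another operation is still running. */
int rsc_op_begin(struct rsc_instance *r)
{
    assert(r != NULL);
    if (r->op_in_flight)
        return -1;
    r->op_in_flight = 1;
    return 0;
}

/* Release the resource once the child for the operation has exited. */
void rsc_op_end(struct rsc_instance *r)
{
    assert(r != NULL);
    r->op_in_flight = 0;
}
```

Different resources each carry their own flag, so operations on separate ResourceInstances can still run in parallel child processes.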

5) AlanRobertson: My experience suggests that system(3) is not the way to invoke the ResourceAgent scripts.

  • I would plan on using fork and exec, and using the functions from the proctrack module to help you track these processes. There are a number of reasons for this.

    IBM China: You are right, I will change it.
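
A minimal sketch of invoking a script with fork and exec instead of system(3). The source does not list the reasons, but two well-known differences are that system(3) passes the command line through a shell and blocks the caller until the command finishes. The function name run_agent is hypothetical, and the blocking waitpid is a simplification: the proctrack functions mentioned above are not shown because their interface is not described here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at path with the given argv and return its exit status,
 * or -1 if it could not be forked or did not exit normally. No shell is
 * involved, so the arguments are passed through untouched. */
int run_agent(const char *path, char *const argv[])
{
    int status;
    pid_t pid;

    assert(path != NULL && argv != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);   /* reached only if execv itself failed */
    }
    /* Simplification: wait right here, retrying if a signal interrupts. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child is a known pid, the parent can also apply a timeout or kill it, which the one-shot system(3) call gives no handle for.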

4) IBM China: Do you think the child resource process should live all the time, or quit when there is no operation?

  • AlanRobertson: They should live only long enough to perform their work. This changes the process structure from what you already showed.

3) IBM China: What is the benefit of using the structure rsc_ops instead of listing the functions in the lrm_rsc? I learned this way from heartbeat.

  • AlanRobertson: I copied that technique from C++ compilers, which do this in order to avoid making extra copies of all this data, which is always the same.
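
The technique can be sketched as follows. The structure names rsc_ops and lrm_rsc come from the question; the members, the example functions, and lrm_rsc_init are hypothetical. Each instance holds one pointer to a shared table of function pointers instead of its own copy of every pointer, much as a C++ object holds a pointer to its class's virtual table.

```c
#include <assert.h>
#include <stddef.h>

struct lrm_rsc;

/* One table of function pointers, shared by every instance.
 * The member names are hypothetical. */
struct rsc_ops {
    int (*start)(struct lrm_rsc *rsc);
    int (*stop)(struct lrm_rsc *rsc);
};

struct lrm_rsc {
    const char           *id;
    int                   running;
    const struct rsc_ops *ops;  /* one pointer, not a copy of each function */
};

/* Example implementations used only to exercise the sketch. */
int example_start(struct lrm_rsc *rsc) { rsc->running = 1; return 0; }
int example_stop(struct lrm_rsc *rsc)  { rsc->running = 0; return 0; }

/* The single shared table; it is the same for every resource. */
const struct rsc_ops example_ops = { example_start, example_stop };

void lrm_rsc_init(struct lrm_rsc *rsc, const char *id)
{
    assert(rsc != NULL);
    rsc->id = id;
    rsc->running = 0;
    rsc->ops = &example_ops;
}
```

A caller then writes rsc->ops->start(rsc), and adding a new operation grows the shared table once without enlarging any instance.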

2) IBM China: In the draft, I merged the basic operations and extension operations into one type of operation, and used a string as the id of the operation.

  • AlanRobertson: If I understood you correctly, you merged all the extension operations into a single operation, and used a string as the type of operation for the extension operation. And, you kept the extension operations separate from the basic operations, which are still referred to using an integer or enumeration rather than a string. I think this (if I understand it correctly) is a good thing.
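
The arrangement described in this answer might look like the sketch below: basic operations are identified by an enumeration, and all extension operations share one enumerator plus a string name. The particular basic operations listed, the type names, and op_id_equal are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Basic operations get an enumerator each; every extension operation
 * shares OP_EXTENSION and is told apart by name. The set of basic
 * operations shown is an assumption. */
enum op_kind { OP_START, OP_STOP, OP_MONITOR, OP_EXTENSION };

struct op_id {
    enum op_kind kind;
    const char  *ext_name;  /* used only when kind == OP_EXTENSION */
};

/* Compare two operation ids. Basic operations compare by enumerator alone;
 * only extension operations pay for a string comparison. */
int op_id_equal(const struct op_id *a, const struct op_id *b)
{
    assert(a != NULL && b != NULL);
    if (a->kind != b->kind)
        return 0;
    if (a->kind != OP_EXTENSION)
        return 1;
    return a->ext_name != NULL && b->ext_name != NULL
        && strcmp(a->ext_name, b->ext_name) == 0;
}
```

Keeping the common operations on the enumeration path limits string comparisons to the less frequent extension case.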

1) IBM China: Can I merge the two types of callback functions, the “operation done” and the “status changed”?

  • AlanRobertson: It seems likely that this is a good idea. However, there's still the nasty return of metadata from the client, which doesn't fit into this category quite so neatly.

    IBM China: In fact, what I mean is that there would be only one type of operation in the LRM, with a string used as the id of the operation. This would make the API simpler, but the strncmp calls would reduce the performance of the service, so maybe it is not a good idea.

    AlanRobertson: I think the thing to notice is that the operations may have the same types of parameters (i.e., strings), but they give different result types. In particular, the metadata operation returns a block of XML as well as a return code. The others mainly return a return code.

See Also: LocalResourceManagerOpenIssues