This web page is no longer maintained. Information presented here exists only to avoid breaking historical links.
The Project stays maintained, and lives on: see the Linux-HA Reference Documentation.

Last site update:
2017-11-24 22:16:42

Introduction to the LRM

The LocalResourceManager is an element of the NewHeartbeatDesign which has responsibility for performing operations on resources, by using ResourceAgent scripts to carry out the work.

The LocalResourceManager is relatively dumb: it does almost nothing on its own and strictly carries out the wishes of its clients. It has no policies of its own; it is a PolicyFree server.

The end goal of all the things the LocalResourceManager does is to operate on ResourceInstances, and provide information about ResourceTypes.

It does not initiate operations on its own. It does, however, generate events when an operation such as monitoring a ResourceInstance fails, and it notifies its current clients.

Operations which clients can ask it to perform

For details on how clients access these operations, see LocalResourceManagerInterface.

Implementation details

Overview diagram

The current architecture drawing for this subsystem is shown below:

AlanR agrees the diagram provides a sound understanding of the job to be done, but disagrees on some details, which are explained below.

Handling of child processes for ResourceInstances

Because most ResourceAgent scripts take an appreciable amount of time to perform their work, the LocalResourceManagerInterface needs to be designed so that operations can be initiated now and their results reported later, asynchronously.

The LRM should be prepared to fork and manage many child processes, since it may receive several requests for resource management operations at once. It should serialize operations only within a given ResourceInstance; operations on different ResourceInstances may run concurrently.
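
As an illustration only, the per-ResourceInstance serialization described above could be sketched as follows. The names OperationScheduler and fake_agent are hypothetical and not part of the LRM, and the child processes here are stand-ins for real ResourceAgent scripts.

```python
import subprocess
import sys
from collections import defaultdict, deque


class OperationScheduler:
    """Runs operations as child processes, one at a time per resource.

    Hypothetical sketch: operations for different resources run
    concurrently, operations for the same resource run in order.
    """

    def __init__(self):
        self.pending = defaultdict(deque)  # resource id -> queued (op_name, argv)
        self.running = {}                  # resource id -> (op_name, Popen)
        self.results = []                  # (resource id, op_name, return code)

    def submit(self, resource_id, op_name, argv):
        """Queue an operation; start it now if the resource is idle."""
        self.pending[resource_id].append((op_name, argv))
        self._start_next(resource_id)

    def _start_next(self, resource_id):
        if resource_id in self.running or not self.pending[resource_id]:
            return
        op_name, argv = self.pending[resource_id].popleft()
        self.running[resource_id] = (op_name, subprocess.Popen(argv))

    def poll(self):
        """Reap finished children, record results, start queued operations."""
        for resource_id, (op_name, child) in list(self.running.items()):
            rc = child.poll()
            if rc is None:
                continue  # child is still running
            del self.running[resource_id]
            self.results.append((resource_id, op_name, rc))
            self._start_next(resource_id)

    def idle(self):
        """True when nothing is running and nothing is queued."""
        return not self.running and not any(self.pending.values())


def fake_agent(exit_code):
    """Stand-in for a ResourceAgent script: a child that exits with exit_code."""
    return [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]
```

A caller would submit operations as requests arrive and call poll() periodically (or from an event loop) until idle() returns True. The results list plays the role of the asynchronous completion reports sent back to clients.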

Interaction with the LRM for clients

Because the LocalResourceManager is a separate process, it is necessary for its clients to talk to it through some form of RemoteProcedureCall type of interface. This means that passing pointers to complex objects is painful, and should be avoided when possible.
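
To make the point about avoiding pointers concrete, here is a minimal sketch of a request flattened into plain name/value strings before crossing the process boundary. The JSON encoding, the field names (op, rsc_id, the param. prefix) and both function names are assumptions for illustration; the actual IPC wire format is not described on this page.

```python
import json


def encode_request(op, resource_id, params):
    """Flatten a request into bytes; only plain strings cross the process boundary."""
    message = {"op": op, "rsc_id": resource_id}
    for key, value in params.items():
        # Every parameter becomes a flat string field; no nested objects.
        message["param." + key] = str(value)
    return json.dumps(message, sort_keys=True).encode("utf-8")


def decode_request(raw):
    """Rebuild (op, resource_id, params) from the flat message."""
    message = json.loads(raw.decode("utf-8"))
    prefix = "param."
    params = {k[len(prefix):]: v for k, v in message.items() if k.startswith(prefix)}
    return message["op"], message["rsc_id"], params
```

Because every value is converted to a string on the way out, the receiving side gets strings back and has to parse numeric parameters itself.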

Notably, the LocalResourceManager does not interact on the network. Remote requests are relayed to it via the ClusterResourceManager, and all it sees and deals with itself are local requests coming in via the IPC code.

Event handling inside the LRM

It is suggested that the LRM use the gmainloop event handling code to receive input messages and then dispatch them via an FSM (finite state machine).

Clients can sign up to receive notifications when monitoring operations fail; in that case they receive an IPC message with the details.
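
The sign-up mechanism for monitor failures could look roughly like the sketch below. The class MonitorEvents, the event fields, and the convention that a return code of 0 means success are all assumptions for illustration; a plain callback stands in for the IPC message a real client would receive.

```python
class MonitorEvents:
    """Hypothetical registry of clients interested in monitor failures."""

    def __init__(self):
        self.subscribers = {}  # client name -> callback taking one event dict

    def subscribe(self, client, callback):
        self.subscribers[client] = callback

    def unsubscribe(self, client):
        self.subscribers.pop(client, None)

    def report(self, resource_id, rc):
        """Handle a finished monitor operation; notify clients only on failure.

        Returns the number of clients notified. Assumes rc == 0 means success.
        """
        if rc == 0:
            return 0
        event = {"type": "monitor_failed", "rsc_id": resource_id, "rc": rc}
        callbacks = list(self.subscribers.values())
        for callback in callbacks:
            callback(event)
        return len(callbacks)
```

Successful monitor runs produce no traffic at all, which matches the idea that clients only hear from the LRM when something needs their attention.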

Identifying a ResourceInstance

Each ResourceInstance is identified to the LocalResourceManager by a unique identifier, or UuId. When clients request an operation on a ResourceInstance, or are sent an event about a status change, the UuId must be used to identify the resource.

The UuId is assigned by the client when the ResourceInstance is first started / instantiated. In addition to the UuId, each ResourceInstance must also be supplied with a HumanName to identify it in system logs. When operations are performed on a ResourceInstance, the HumanName must be included in log messages concerning it.
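
A minimal sketch of how the UuId and HumanName pairing might be tracked, so that every log message about a ResourceInstance carries its HumanName. The class ResourceRegistry, its method names, the logger name and the log message layout are hypothetical.

```python
import logging
import uuid

log = logging.getLogger("lrm")


class ResourceRegistry:
    """Hypothetical table mapping client-assigned UuIds to HumanNames."""

    def __init__(self):
        self._names = {}  # uuid.UUID -> HumanName

    def add(self, rsc_uuid, human_name):
        """Register a ResourceInstance under the UuId chosen by the client."""
        if rsc_uuid in self._names:
            raise ValueError(f"duplicate resource id {rsc_uuid}")
        self._names[rsc_uuid] = human_name

    def describe(self, rsc_uuid):
        """Return 'HumanName (UuId)' for use in log messages."""
        return f"{self._names[rsc_uuid]} ({rsc_uuid})"

    def log_operation(self, rsc_uuid, op, rc):
        """Log an operation result, always including the HumanName."""
        text = f"{op} on {self.describe(rsc_uuid)} returned {rc}"
        log.info(text)
        return text
```

A client would typically generate the identifier itself, for example with uuid.uuid4(), and pass it along with the HumanName when it first instantiates the resource.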

Start/Restart handling of the LRM itself

When the LRM starts up for the first time, it has no configured resources, whether active, failed, or inactive. It does not attempt auto-discovery of active ResourceInstances; that would be impossible, because it does not have the necessary information.

If one were to eventually add the capability for a TransparentUpgrade, it would be necessary for the system to cache information on currently running resources in non-volatile storage, exit without stopping them, and on restart restore the information about these resources. Because of tie-ins to the CRM, it is unlikely that automatic resumption of monitoring would be an obviously good thing to do. Providing a TransparentUpgrade capability is a task with many questions surrounding it.

Requirements of the LRM from the CRM

  • Actions: Perform an action on a resource.
    • 95% of the time, these actions will be:
      • Start
      • Stop
      • Status
      • Restart
      • Start monitoring a resource
      • Stop monitoring a resource
  • Failure: Tell us when/if the monitor operation fails. This will be the normal asynchronous return code from the operation.

  • LastAction: Tell us the last action requested of a specific resource, and its return code

After being elected, the DesignatedCoordinator will combine the result of the LastAction with the result of the status operation to compute the current state of all resources in the cluster. This is required because things may have changed during the election process, especially if the last DesignatedCoordinator suffered a fatal error.
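
The LastAction requirement amounts to the LRM remembering, per resource, the most recent action and its return code. A hedged sketch of that bookkeeping follows. The class LastActionLog and its report layout are hypothetical, and the rules by which the DesignatedCoordinator turns these inputs into a resource state are deliberately left out, since they are not specified here.

```python
class LastActionLog:
    """Hypothetical per-resource record of the last requested action."""

    def __init__(self):
        self._last = {}  # resource id -> (action, return code)

    def record(self, rsc_id, action, rc):
        """Overwrite the stored entry with the most recent action and result."""
        self._last[rsc_id] = (action, rc)

    def last_action(self, rsc_id):
        """Return (action, rc), or None if nothing was ever requested."""
        return self._last.get(rsc_id)

    def report(self, rsc_id, status_rc):
        """Bundle the two inputs the coordinator needs for one resource."""
        action, rc = self._last.get(rsc_id, (None, None))
        return {
            "rsc_id": rsc_id,
            "last_action": action,
            "last_rc": rc,
            "status_rc": status_rc,
        }
```

Only the latest entry per resource is kept, which is all the LastAction requirement asks for; a fuller history would be a separate design decision.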

See also: LocalResourceManagerOpenIssues, LocalResourceManagerResolvedIssues