Introduction to the LRM

The LocalResourceManager (LRM) is the element of the NewHeartbeatDesign[1] responsible for performing operations on resources[2]. It uses ResourceAgent[3] scripts to carry out the work.

The LocalResourceManager is relatively dumb: it does almost nothing on its own and strictly carries out the wishes of its clients. It has no policies of its own; it is a PolicyFree[4] server.

Everything the LocalResourceManager does ultimately serves two purposes: operating on ResourceInstances[5] and providing information about ResourceTypes[6].

It does not initiate operations on its own. It does, however, generate an event and notify its current clients when an operation such as monitoring a ResourceInstance[5] fails.

Operations which clients can ask it to perform

For details on how clients request these operations, see LocalResourceManagerInterface[7].

Implementation details

Overview diagram

[Diagram: current architecture of this subsystem]

AlanR agrees the diagram provides a sound understanding of the job to be done, but disagrees on some details, which are explained below.

Handling of child processes for ResourceInstances

Because most ResourceAgent[3] scripts take a noticeable amount of time to do their work, the LocalResourceManagerInterface[7] must allow operations to be initiated immediately and their results (success or failure) reported later, asynchronously.

The LRM should be prepared to fork and manage many child processes, since it may receive several requests for resource management operations at once. It should serialize only the operations that target the same ResourceInstance[5]; operations on different ResourceInstances may run concurrently.
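
The serialization rule can be illustrated with a minimal Python sketch. It shows only the bookkeeping: which operation may start now and which must wait. The forking of ResourceAgent child processes is left out, and the class and method names (OperationScheduler, submit, finished) are hypothetical, not part of the LRM.

```python
from collections import defaultdict, deque


class OperationScheduler:
    """Serializes operations per resource; different resources run in parallel."""

    def __init__(self):
        self._pending = defaultdict(deque)  # resource id -> queued operations
        self._running = {}                  # resource id -> operation in flight

    def submit(self, rsc_id, op):
        """Queue an operation; return True if it may start immediately."""
        if rsc_id in self._running:
            self._pending[rsc_id].append(op)
            return False
        self._running[rsc_id] = op
        return True

    def finished(self, rsc_id):
        """Mark the running operation done; return the next one to start, if any."""
        del self._running[rsc_id]
        queue = self._pending.get(rsc_id)
        if queue:
            nxt = queue.popleft()
            self._running[rsc_id] = nxt
            return nxt
        return None

    def running(self):
        """Return a copy of the operations currently in flight."""
        return dict(self._running)


sched = OperationScheduler()
started_a = sched.submit("rsc-a", "start")    # starts at once
queued_a = sched.submit("rsc-a", "monitor")   # waits behind "start" on rsc-a
started_b = sched.submit("rsc-b", "start")    # different resource: not blocked
next_a = sched.finished("rsc-a")              # releases the queued "monitor"
```

In a real implementation, finished() would be driven by the exit of the corresponding child process.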

Interaction with the LRM for clients

Because the LocalResourceManager is a separate process, its clients must talk to it through some form of RemoteProcedureCall[8] interface. Passing pointers to complex objects across such an interface is painful and should be avoided where possible.
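
To illustrate why flat data is preferable to pointers across a process boundary, here is a small Python sketch that encodes a request as a self-contained byte string. JSON and the field names (rsc_id, op, params) are assumptions made for the example; the actual wire format used by the IPC code is not described on this page.

```python
import json


def encode_request(rsc_id, op, params):
    """Flatten a request into bytes that can cross a process boundary."""
    msg = {"rsc_id": rsc_id, "op": op, "params": dict(params)}
    return json.dumps(msg, sort_keys=True).encode("utf-8")


def decode_request(data):
    """Rebuild the request on the receiving side from the bytes alone."""
    msg = json.loads(data.decode("utf-8"))
    return msg["rsc_id"], msg["op"], msg["params"]


wire = encode_request("u1", "start", {"ip": "10.0.0.1"})
request = decode_request(wire)
```

The receiver needs nothing from the sender's address space: everything it uses is contained in the message itself.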

Notably, the LocalResourceManager does not communicate over the network. Remote requests are relayed to it by the ClusterResourceManager[9], so the only requests it sees and handles are local ones arriving via the IPC code.

Event handling inside the LRM

It is suggested that the LRM use the gmainloop event handling code to receive input messages and then dispatch them via a finite state machine (FSM).

Clients can sign up for notifications of failed monitoring operations; when one fails, they receive an IPC message with the details.
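
The sign-up mechanism might look roughly like the following Python sketch. Callbacks stand in for IPC messages, a non-zero return code stands in for "failed", and the names (EventHub, subscribe, report) are hypothetical.

```python
class EventHub:
    """Keeps track of subscribed clients and notifies them of monitor failures."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        """Register a client callback (a stand-in for an IPC channel)."""
        self._subscribers.append(callback)

    def report(self, rsc_id, op, rc):
        """Notify subscribers only when a monitor operation fails (rc != 0).

        Returns the number of clients that were notified.
        """
        if op != "monitor" or rc == 0:
            return 0
        event = {"rsc_id": rsc_id, "op": op, "rc": rc}
        for callback in self._subscribers:
            callback(event)
        return len(self._subscribers)


received = []
hub = EventHub()
hub.subscribe(received.append)
ok_count = hub.report("u1", "monitor", 0)     # success: nobody is notified
fail_count = hub.report("u1", "monitor", 7)   # failure: the subscriber is notified
```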

Identifying a ResourceInstance

Each ResourceInstance[5] is identified to the LocalResourceManager by a unique identifier, or UuId[10]. Whenever a client requests an operation on a ResourceInstance[5], or is sent an event about a status change, the UuId[10] must be used to identify the resource.

The UuId[10] is assigned by the client when the ResourceInstance[5] is first started or instantiated. In addition to the UuId[10], each ResourceInstance[5] must be given a HumanName[11] to identify it in system logs. Whenever an operation is performed on a ResourceInstance[5], the HumanName[11] must be included in the log messages concerning it.
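
As a minimal sketch of this pairing, the Python below keeps the client-assigned identifier and the HumanName together and puts both into a log message. The type, the field names, and the log format are all hypothetical; the page only requires that the HumanName appear in the log.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceInstance:
    rsc_uuid: uuid.UUID  # assigned by the client; used in requests and events
    human_name: str      # makes log messages readable for administrators


def log_line(rsc, op, outcome):
    """Build a log message that names the resource by HumanName and UuId."""
    return f"{rsc.human_name} ({rsc.rsc_uuid}): {op} {outcome}"


rsc = ResourceInstance(
    uuid.UUID("12345678-1234-5678-1234-567812345678"), "webserver-ip"
)
line = log_line(rsc, "start", "succeeded")
```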

Start/Restart handling of the LRM itself

When the LRM starts up for the first time, it has no configured resources, whether active, failed, or inactive. It does not perform auto-discovery of active ResourceInstances[12]; that would be impossible, because it does not have the necessary information.

If a TransparentUpgrade[13] capability were eventually added, the system would need to cache information on currently running resources in non-volatile storage, exit without stopping them, and restore that information on restart. Because of tie-ins to the CRM[14], it is not obvious that automatically resuming monitoring would be the right thing to do. Providing a TransparentUpgrade[13] capability raises many open questions.
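
The caching step could be sketched as follows in Python. This is only an illustration of saving and restoring a resource table across a restart. The JSON file format, the function names, and the table layout are assumptions, and the open questions about resuming monitoring are not addressed.

```python
import json
import os
import tempfile


def save_state(path, resources):
    """Write the resource table atomically so a crash cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(resources, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename over the previous state file


def load_state(path):
    """Return the saved resource table, or an empty one on a first start."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


state_dir = tempfile.mkdtemp()
state_file = os.path.join(state_dir, "lrm-state.json")
first_start = load_state(state_file)  # nothing cached yet
save_state(state_file, {"u1": {"human_name": "db", "last_op": "start"}})
after_restart = load_state(state_file)
```

Writing to a temporary file and renaming it means a reader sees either the old table or the new one, never a mixture.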

Requirements of the LRM from the CRM

After being elected, the DesignatedCoordinator[16] will combine the result of the LastAction[15] with the status operation to compute the current state of all resources in the cluster. This is required because things may have changed during the election process, especially if the previous DesignatedCoordinator[16] suffered a fatal error.

See also: LocalResourceManagerOpenIssues[17], LocalResourceManagerResolvedIssues[18]


References

[1] http://www.linux-ha.org/NewHeartbeatDesign
[2] http://www.linux-ha.org/resource
[3] http://www.linux-ha.org/ResourceAgent
[4] http://www.linux-ha.org/PolicyFree
[5] http://www.linux-ha.org/ResourceInstance
[6] http://www.linux-ha.org/ResourceType
[7] http://www.linux-ha.org/LocalResourceManagerInterface
[8] http://www.linux-ha.org/RemoteProcedureCall
[9] http://www.linux-ha.org/ClusterResourceManager
[10] http://www.linux-ha.org/UuId
[11] http://www.linux-ha.org/HumanName
[12] http://www.linux-ha.org/ResourceInstances
[13] http://www.linux-ha.org/TransparentUpgrade
[14] http://www.linux-ha.org/CRM
[15] http://www.linux-ha.org/LastAction
[16] http://www.linux-ha.org/DesignatedCoordinator
[17] http://www.linux-ha.org/LocalResourceManagerOpenIssues
[18] http://www.linux-ha.org/LocalResourceManagerResolvedIssues


This information provided courtesy of the Linux-HA project at http://linux-ha.org/