
The LocalResourceManager is an element of the NewHeartbeatDesign[1] responsible for performing operations on resource[2]s. It uses ResourceAgent[3] scripts to carry out the work.
The LocalResourceManager is relatively dumb: it does almost nothing on its own and strictly carries out the wishes of its clients. In other words, it has no policies -- it's a PolicyFree[4] server.
Everything the LocalResourceManager does ultimately serves two purposes: operating on ResourceInstance[5]s and providing information about ResourceType[6]s.
It does not initiate operations on its own. It does, however, generate an event and notify its current clients when an operation such as monitoring a ResourceInstance[5] fails.
The operations it performs on behalf of its clients are:
Start a ResourceInstance[5]
Stop a ResourceInstance[5]
Begin monitoring a ResourceInstance[5]
Provide the status of a single ResourceInstance[5]
List all the ResourceInstance[5]s it currently has active, and their status
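As a rough illustration of the operations listed above, the sketch below models them as methods on an in-memory table. The class name, method names and status strings are hypothetical and are not taken from LocalResourceManagerInterface[7]; a real LRM would run ResourceAgent[3] scripts rather than just update a dictionary.

```python
class LrmSketch:
    """Toy model of the LRM operations; every name here is illustrative."""

    def __init__(self):
        # resource id -> {"status": str, "monitored": bool}
        self._instances = {}

    def start(self, rsc_id):
        self._instances[rsc_id] = {"status": "running", "monitored": False}

    def stop(self, rsc_id):
        entry = self._instances[rsc_id]
        entry["status"] = "stopped"
        entry["monitored"] = False

    def begin_monitoring(self, rsc_id):
        self._instances[rsc_id]["monitored"] = True

    def status(self, rsc_id):
        # Resources the LRM was never told about are reported as unknown.
        entry = self._instances.get(rsc_id)
        return entry["status"] if entry else "unknown"

    def list_active(self):
        # Only instances that are not stopped count as active.
        return {
            rsc_id: entry["status"]
            for rsc_id, entry in self._instances.items()
            if entry["status"] != "stopped"
        }
```

Note that nothing in the sketch acts on its own: every state change is the direct result of a client call, which is the PolicyFree[4] behaviour described earlier.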
For details on how to access these operations, see LocalResourceManagerInterface[7].
The current architecture drawing for this subsystem is shown below:
AlanR agrees the diagram provides a sound understanding of the job to be done, but disagrees on some details, which are explained below.
Because most ResourceAgent[3] scripts take an appreciable amount of time to perform their work, the LocalResourceManagerInterface[7] needs to allow operations to be initiated immediately and their success or failure reported later, asynchronously.
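A minimal sketch of this "initiate now, report later" pattern follows. It assumes each operation is a child process whose exit code is the result; the class and function names are hypothetical, and the stand-in agent is just a Python one-liner rather than a real ResourceAgent[3] script.

```python
import subprocess
import sys
import time


def fake_agent(exit_code):
    """Command line for a stand-in agent that exits with the given code."""
    return [sys.executable, "-c", f"raise SystemExit({exit_code})"]


class AsyncOperations:
    """Start operations immediately; collect their exit codes later."""

    def __init__(self):
        self._running = {}  # op_id -> subprocess.Popen

    def submit(self, op_id, argv):
        # Popen returns once the child is spawned; it does not wait for it.
        self._running[op_id] = subprocess.Popen(argv)

    def pending(self):
        return len(self._running)

    def reap(self):
        """Return {op_id: exit_code} for children that have finished."""
        finished = {}
        for op_id, child in list(self._running.items()):
            rc = child.poll()  # None while the child is still running
            if rc is not None:
                finished[op_id] = rc
                del self._running[op_id]
        return finished


def wait_for_all(ops, timeout=10.0):
    """Poll until every submitted operation has reported or time runs out."""
    results = {}
    deadline = time.monotonic() + timeout
    while ops.pending() and time.monotonic() < deadline:
        results.update(ops.reap())
        if ops.pending():
            time.sleep(0.05)
    return results
```

In a real server `reap()` would be driven by the event loop (for example on child exit) instead of a polling helper like `wait_for_all`, which is here only to make the sketch easy to try out.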
The LRM should be prepared to fork and manage many child processes, since it may receive several requests for resource management operations all at once. It should serialize only those operations that target the same ResourceInstance[5]; operations on different ResourceInstance[5]s can run in parallel.
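One way to picture this rule is a small scheduler that allows at most one operation in flight per resource and queues the rest. This is only a sketch under that assumption; the class and method names are made up for illustration, and the actual forking of child processes is left out.

```python
from collections import deque


class PerResourceSerializer:
    """Allow one in-flight operation per resource; queue the others."""

    def __init__(self):
        self._busy = set()   # resource ids with an operation in flight
        self._waiting = {}   # resource id -> deque of queued operations

    def request(self, rsc_id, op):
        """Return op if it may start now, or None if it was queued."""
        if rsc_id in self._busy:
            self._waiting.setdefault(rsc_id, deque()).append(op)
            return None
        self._busy.add(rsc_id)
        return op

    def finished(self, rsc_id):
        """Mark the in-flight operation done; return the next one, if any."""
        queue = self._waiting.get(rsc_id)
        if queue:
            # The resource stays busy, now with the next queued operation.
            return queue.popleft()
        self._busy.discard(rsc_id)
        self._waiting.pop(rsc_id, None)
        return None
```

Operations on two different resources are both returned from `request()` straight away, while a second operation on the same resource is held back until `finished()` is called for the first.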
Because the LocalResourceManager is a separate process, it is necessary for its clients to talk to it through some form of RemoteProcedureCall[8] type of interface. This means that passing pointers to complex objects is painful, and should be avoided when possible.
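To illustrate why flat data is easier than pointers to complex objects, the sketch below copies a request into a flat set of string key/value pairs and frames it with a length prefix. The JSON encoding, the 4-byte length prefix and the field names are assumptions made for this example; they are not the wire format used by the Heartbeat IPC code.

```python
import json
import struct


def encode_request(op, rsc_id, params):
    """Flatten a request into length-prefixed bytes (values are copied)."""
    body = {"op": op, "rsc_id": rsc_id}
    # Parameters become plain string pairs, so nothing refers back to
    # objects living in the client's address space.
    body.update({f"param.{key}": str(value) for key, value in params.items()})
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload


def decode_request(data):
    """Inverse of encode_request: read the length, then parse the payload."""
    (length,) = struct.unpack("!I", data[:4])
    return json.loads(data[4:4 + length].decode("utf-8"))
```

The receiving process gets its own copy of every value, at the cost of losing type information (the port number in a call such as `encode_request("start", "r1", {"port": 80})` arrives as the string "80").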
Notably, the LocalResourceManager does not communicate over the network. Remote requests are relayed to it via the ClusterResourceManager[9]; the only requests it sees and handles itself are local ones arriving via the IPC code.
It is suggested that the LRM use the gmainloop event handling code to receive input messages and then dispatch them via an FSM (finite state machine).
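A table-driven FSM of the kind suggested here could look roughly like the following. The states, message types and action names are invented for illustration, and the event loop that would feed messages into `dispatch()` is omitted.

```python
# (current state, message type) -> (next state, action to run)
# All states, messages and actions below are hypothetical.
TRANSITIONS = {
    ("stopped", "start"): ("starting", "run_start_agent"),
    ("starting", "op_done"): ("running", "report_success"),
    ("running", "monitor"): ("running", "run_monitor_agent"),
    ("running", "stop"): ("stopping", "run_stop_agent"),
    ("stopping", "op_done"): ("stopped", "report_success"),
}


def dispatch(state, message):
    """Return (next_state, action); unexpected messages keep the state."""
    return TRANSITIONS.get((state, message), (state, "reject"))


def run(messages, state="stopped"):
    """Feed a sequence of messages through the FSM, recording the actions."""
    actions = []
    for message in messages:
        state, action = dispatch(state, message)
        actions.append(action)
    return state, actions
```

Keeping the transitions in one table makes it easy to see which messages are legal in which state, and anything not listed is rejected without changing state.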
Clients can sign up to be notified when monitoring operations fail; in that case they receive an IPC message with the details.
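The sign-up mechanism can be sketched as a small registry of clients. Here a plain callable stands in for the IPC channel to each client, and the message fields are hypothetical.

```python
class MonitorNotifier:
    """Registry of clients that want to hear about monitor failures."""

    def __init__(self):
        self._clients = {}  # client id -> callable that delivers a message

    def subscribe(self, client_id, send):
        self._clients[client_id] = send

    def unsubscribe(self, client_id):
        self._clients.pop(client_id, None)

    def monitor_result(self, rsc_id, return_code):
        """Notify subscribers only if the monitor operation failed.

        Returns the number of clients that were notified.
        """
        if return_code == 0:
            return 0
        message = {"event": "monitor_failed", "rsc_id": rsc_id, "rc": return_code}
        recipients = list(self._clients.values())
        for send in recipients:
            send(dict(message))  # each client gets its own copy
        return len(recipients)
```

A client that passes `inbox.append` as its `send` callable would find one message in `inbox` after a failed monitor operation and none after a successful one.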
Each ResourceInstance[5] is identified to the LocalResourceManager by a unique identifier, or UuId[10]. Whenever a client requests an operation on a ResourceInstance[5], or is sent an event about a status change, the UuId[10] must be used to identify the resource.
The UuId[10] is assigned by the client when the ResourceInstance[5] is first started or instantiated. In addition to the UuId[10], each ResourceInstance[5] must also be supplied with a HumanName[11] to identify it in system logs. Whenever an operation is performed on a ResourceInstance[5], the HumanName[11] must be included in the log messages concerning it.
When the LRM starts up for the first time, it has no configured resources at all: no active, failed or inactive ones. It does not attempt auto-discovery of active ResourceInstances[12], which would be impossible because it does not have the necessary information.
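The pairing of identifier and name might be pictured as follows. This is a sketch only: the use of a random (version 4) UUID, the dictionary layout and the log line format are all assumptions, since the text does not specify them.

```python
import uuid


def new_resource(human_name):
    """Client side: choose the identifier when the instance is created."""
    return {"uuid": str(uuid.uuid4()), "human_name": human_name}


def log_line(resource, text):
    """Build a log message that names the instance by HumanName and UuId."""
    return f"{resource['human_name']} [{resource['uuid']}]: {text}"
```

A call such as `log_line(new_resource("apache-main"), "start succeeded")` yields a line that begins with the readable name and still carries the identifier for exact matching.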
If one were to eventually add the capability for a TransparentUpgrade[13], it would be necessary for the system to cache information on currently running resources in non-volatile storage, exit without stopping them, and on restart restore the information about these resources. Because of tie-ins to the CRM[14], it is unlikely that automatic resumption of monitoring would be an obviously good thing to do. Providing a TransparentUpgrade[13] capability is a task with many questions surrounding it.
Failure: Tell us when/if the monitor operation fails. This will be the normal asynchronous return code from the operation.
LastAction[15]: Tell us the last action requested of a specific resource, and its return code.
After being elected, the DesignatedCoordinator[16] will combine the result of the LastAction[15] query with the status operation to compute the current state of all resources in the cluster. This is required because things may have changed during the election process, especially if the previous DesignatedCoordinator[16] suffered a fatal error.
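How the two pieces of information might be combined is sketched below. The rules and state names are purely illustrative assumptions and do not describe the actual policy of the DesignatedCoordinator[16] or the CRM[14]; the point is only that neither input is sufficient by itself.

```python
def infer_state(last_action, last_rc, status_running):
    """Combine LastAction and a fresh status check into one verdict.

    Hypothetical rules: a non-zero return code means failure, and a
    status that contradicts the last action is flagged for attention.
    """
    if last_rc != 0:
        return "failed"
    expected_running = last_action in ("start", "monitor")
    if expected_running != status_running:
        return "inconsistent"
    return "running" if status_running else "stopped"


def cluster_view(reports):
    """reports: {rsc_id: (last_action, last_rc, status_running)}."""
    return {rsc_id: infer_state(*report) for rsc_id, report in reports.items()}
```

A resource whose last action was a successful start but whose status check now says it is not running is reported as "inconsistent", which is the kind of change during an election that the paragraph above is concerned with.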
See also: LocalResourceManagerOpenIssues[17], LocalResourceManagerResolvedIssues[18]
| [1] | http://www.linux-ha.org/NewHeartbeatDesign |
| [2] | http://www.linux-ha.org/resource |
| [3] | http://www.linux-ha.org/ResourceAgent |
| [4] | http://www.linux-ha.org/PolicyFree |
| [5] | http://www.linux-ha.org/ResourceInstance |
| [6] | http://www.linux-ha.org/ResourceType |
| [7] | http://www.linux-ha.org/LocalResourceManagerInterface |
| [8] | http://www.linux-ha.org/RemoteProcedureCall |
| [9] | http://www.linux-ha.org/ClusterResourceManager |
| [10] | http://www.linux-ha.org/UuId |
| [11] | http://www.linux-ha.org/HumanName |
| [12] | http://www.linux-ha.org/ResourceInstances |
| [13] | http://www.linux-ha.org/TransparentUpgrade |
| [14] | http://www.linux-ha.org/CRM |
| [15] | http://www.linux-ha.org/LastAction |
| [16] | http://www.linux-ha.org/DesignatedCoordinator |
| [17] | http://www.linux-ha.org/LocalResourceManagerOpenIssues |
| [18] | http://www.linux-ha.org/LocalResourceManagerResolvedIssues |
This information provided courtesy of the Linux-HA project at http://linux-ha.org/