The LocalResourceManager is the element of the NewHeartbeatDesign responsible for performing operations on resources, using ResourceAgent scripts to carry out the work.
The LocalResourceManager is deliberately simple: it does almost nothing on its own and strictly carries out the wishes of its clients. It has no policies of its own -- it's a PolicyFree server.
The end goal of everything the LocalResourceManager does is to operate on ResourceInstances and to provide information about ResourceTypes.
It does not initiate operations on its own; it does, however, generate events and notify its current clients when an operation such as monitoring a ResourceInstance fails. The operations it carries out on behalf of its clients are these (a sketch of this operation set follows the list):
Start a ResourceInstance
Stop a ResourceInstance
Begin monitoring a ResourceInstance
Provide the status of a single ResourceInstance
List all the ResourceInstances it currently has active, along with their status.
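To make the operation set concrete, here is a minimal sketch of how a request might be encoded in C. All of the names here (lrm_op_type_t, lrm_request_t, the field layout) are illustrative assumptions, not the actual LocalResourceManagerInterface.

{{{
#include <stdint.h>

/* Hypothetical encoding of the five operations listed above. */
typedef enum {
    LRM_OP_START,    /* start a ResourceInstance             */
    LRM_OP_STOP,     /* stop a ResourceInstance              */
    LRM_OP_MONITOR,  /* begin monitoring a ResourceInstance  */
    LRM_OP_STATUS,   /* report the status of one instance    */
    LRM_OP_LIST      /* list all active instances and status */
} lrm_op_type_t;

typedef struct {
    lrm_op_type_t type;
    char          rsc_id[37];   /* UuId of the target instance (text form) */
    uint32_t      interval_ms;  /* monitoring interval; 0 if not used      */
} lrm_request_t;
}}}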
For details on how to access these operations, please see LocalResourceManagerInterface.
The current architecture drawing for this subsystem is shown below:
AlanR agrees the diagram provides a sound understanding of the job to be done, but disagrees on some details, which are explained below.
Because most ResourceAgent scripts take a noticeable amount of time to perform their work, the LocalResourceManagerInterface needs to be designed so that operations can be initiated now and their success or failure reported later, asynchronously.
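As an illustration of that asynchronous shape, a client call could look something like the sketch below. The function and type names are invented for illustration; the real interface is described on LocalResourceManagerInterface.

{{{
/* Hypothetical asynchronous client call: it returns immediately with a
 * call id, and the operation's result is delivered later via callback. */
typedef void (*lrm_op_done_fn)(int call_id, int rc, void *user_data);

/* Returns a positive call id on successful submission, -1 on error.
 * 'rc' reaches the callback only when the ResourceAgent has finished. */
int lrm_op_async(const char *rsc_id,   /* UuId of the ResourceInstance */
                 const char *op,       /* "start", "stop", "monitor"   */
                 lrm_op_done_fn done,
                 void *user_data);
}}}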
The LRM should be prepared to fork and manage many child processes, since it may receive several requests for resource management operations at once. Operations need only be serialized for a given ResourceInstance; operations on different ResourceInstances may proceed concurrently.
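One plausible way to get that per-instance serialization is a FIFO queue per ResourceInstance, keyed by its id, with at most one ResourceAgent child running per queue. The following sketch assumes glib and uses invented names; it is not the actual LRM code.

{{{
#include <glib.h>

typedef struct {
    char     *rsc_id;      /* UuId of the ResourceInstance          */
    GQueue   *pending;     /* queued operations, FIFO order         */
    gboolean  op_running;  /* at most one agent child per resource  */
} rsc_queue_t;

static GHashTable *queues; /* rsc_id -> rsc_queue_t */

static void queues_init(void)
{
    queues = g_hash_table_new(g_str_hash, g_str_equal);
}

static void enqueue_op(const char *rsc_id, gpointer op)
{
    rsc_queue_t *q = g_hash_table_lookup(queues, rsc_id);

    if (q == NULL) {
        q = g_new0(rsc_queue_t, 1);
        q->rsc_id  = g_strdup(rsc_id);
        q->pending = g_queue_new();
        g_hash_table_insert(queues, q->rsc_id, q);
    }
    g_queue_push_tail(q->pending, op);
    if (!q->op_running) {
        /* fork the ResourceAgent for the head of the queue here and
         * set op_running; the child-exit handler pops the next one. */
    }
}
}}}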
Because the LocalResourceManager is a separate process, its clients must talk to it through some form of RemoteProcedureCall-style interface. This means that passing pointers to complex objects is painful and should be avoided when possible.
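Concretely, this points toward flat, self-contained messages in which everything travels by value. A hypothetical wire layout (the names and sizes are invented):

{{{
#include <stdint.h>

typedef struct {
    uint32_t msg_type;     /* which operation is requested           */
    char     rsc_id[37];   /* UuId as a NUL-terminated string        */
    char     rsc_type[64]; /* ResourceType name, copied by value     */
    uint32_t timeout_ms;   /* everything by value, nothing pointed-to */
} lrm_wire_msg_t;
}}}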
Notably, the LocalResourceManager does not communicate over the network. Remote requests are relayed to it by the ClusterResourceManager; all it sees and deals with itself are local requests coming in via the IPC code.
It is suggested that the LRM use the gmainloop event-handling code to receive input messages, dispatching them via a FSM.
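Below is a minimal sketch of that suggested structure: a glib main loop watches the IPC file descriptor, and each incoming message is run through a small state/event dispatch table. The fd, the message decoding, and all names are placeholders, not the real IPC code.

{{{
#include <glib.h>

typedef enum { S_IDLE, S_BUSY, S_MAX } lrm_state_t;
typedef enum { E_REQUEST, E_CHILD_DONE, E_MAX } lrm_event_t;

typedef lrm_state_t (*action_fn)(gpointer msg);

static lrm_state_t on_request(gpointer msg)
{
    (void)msg;                  /* queue the operation / fork the agent */
    return S_BUSY;
}

static lrm_state_t on_child_done(gpointer msg)
{
    (void)msg;                  /* deliver the asynchronous result      */
    return S_IDLE;
}

/* state x event -> action; NULL entries are ignored */
static action_fn fsm[S_MAX][E_MAX] = {
    [S_IDLE] = { [E_REQUEST] = on_request },
    [S_BUSY] = { [E_REQUEST] = on_request, [E_CHILD_DONE] = on_child_done },
};

static lrm_state_t state = S_IDLE;

static gboolean ipc_ready(GIOChannel *ch, GIOCondition cond, gpointer data)
{
    gpointer    msg = NULL;      /* read and decode one IPC message here */
    lrm_event_t ev  = E_REQUEST; /* classify the decoded message         */

    (void)ch; (void)cond; (void)data;
    if (fsm[state][ev] != NULL)
        state = fsm[state][ev](msg);
    return TRUE;                 /* keep the watch installed             */
}

int main(void)
{
    GMainLoop  *loop = g_main_loop_new(NULL, FALSE);
    GIOChannel *ch   = g_io_channel_unix_new(0); /* placeholder IPC fd */

    g_io_add_watch(ch, G_IO_IN, ipc_ready, NULL);
    g_main_loop_run(loop);
    return 0;
}
}}}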
Clients can sign up to be notified when monitoring operations fail; in that case they receive an IPC message with the details.
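Client-side, such a registration might look like the following sketch; the name and signature are assumptions, not the actual interface.

{{{
/* Illustrative client-side registration for monitor-failure events. */
typedef void (*lrm_monitor_failed_fn)(const char *rsc_id, /* UuId     */
                                      int rc,             /* agent rc */
                                      void *user_data);

/* Once registered, the callback fires each time the LRM delivers a
 * monitor-failure IPC message. Returns 0 on success, -1 on error.   */
int lrm_set_monitor_callback(lrm_monitor_failed_fn cb, void *user_data);
}}}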
Each ResourceInstance is identified to the LocalResourceManager by a unique identifier, or UuId. When clients request an operation on a ResourceInstance, or are sent an event about a status change, the UuId identifies the resource.
The UuId is assigned by the client when the ResourceInstance is first started / instantiated. In addition to the UuId, each ResourceInstance must also be supplied with a HumanName to identify it in system logs. When operations are performed on a ResourceInstance, the HumanName must be included in any log messages concerning it.
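A sketch of the identity data and the logging rule just described, with invented field names and plain syslog() standing in for whatever logging facility the LRM actually uses:

{{{
#include <syslog.h>

typedef struct {
    char uuid[37];        /* unique id assigned by the client */
    char human_name[64];  /* readable name for system logs    */
} rsc_identity_t;

static void log_op_result(const rsc_identity_t *rsc, const char *op, int rc)
{
    /* every message about the instance carries its HumanName */
    syslog(LOG_INFO, "resource %s (%s): operation %s returned %d",
           rsc->human_name, rsc->uuid, op, rc);
}
}}}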
When the LRM starts up for the first time, it has no configured resources -- neither active, failed, nor inactive ones. It does not perform auto-discovery of active ResourceInstances; this is impossible, since it does not have the necessary information.
If one were to eventually add the capability for a TransparentUpgrade, the system would need to cache information on currently running resources in non-volatile storage, exit without stopping them, and restore that information on restart. Because of tie-ins to the CRM, automatically resuming monitoring is unlikely to be the obviously right thing to do. Providing a TransparentUpgrade capability is a task with many open questions.
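Purely as a speculative illustration of the caching idea, the cached state could be as simple as one record per active ResourceInstance, written before the old LRM exits and re-read by the new one. Everything here -- the record format and the names -- is invented.

{{{
#include <stdio.h>

typedef struct {
    char uuid[37];
    char rsc_type[64];
    char human_name[64];
} cached_rsc_t;

/* write one record per active instance before exiting without a stop */
static int cache_write(FILE *f, const cached_rsc_t *r)
{
    return fprintf(f, "%s %s %s\n",
                   r->uuid, r->rsc_type, r->human_name) < 0 ? -1 : 0;
}

/* re-read one record on restart; returns 1 on success, 0 at end of file */
static int cache_read(FILE *f, cached_rsc_t *r)
{
    return fscanf(f, "%36s %63s %63s",
                  r->uuid, r->rsc_type, r->human_name) == 3;
}
}}}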
Failure: Tell us when (or if) the monitor operation fails. This is delivered as the normal asynchronous return code from the operation.
LastAction: Tell us the last action requested of a specific resource, along with its return code.
After being elected, the DesignatedCoordinator will combine the result of the LastAction with the status operation to compute the current state of all resources in the cluster. This is required because things may have changed during the election process, especially if the previous DesignatedCoordinator suffered a fatal error.
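A hedged sketch of how that combination might be computed for a single resource; the state names and return-code conventions are assumptions, not the actual CRM logic.

{{{
typedef enum { RSC_RUNNING, RSC_STOPPED, RSC_FAILED } rsc_state_t;

/* last_action_rc: return code of the last requested action (0 = ok)
 * status_rc:      result of a fresh status operation (0 = running)  */
static rsc_state_t compute_state(int last_action_rc, int status_rc)
{
    if (status_rc == 0)      /* the status operation says it is running */
        return RSC_RUNNING;
    if (last_action_rc != 0) /* not running, and the last action failed */
        return RSC_FAILED;
    return RSC_STOPPED;      /* not running, last action succeeded      */
}
}}}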
See also: LocalResourceManagerOpenIssues, LocalResourceManagerResolvedIssues