Ram Pai, who wrote ccm, has some nice presentation slides for ccm. The presentation also includes some information about EVMS. You can download it here.
Here is a diagram I (Guochun Shi) extracted from the code.
There are 9 states in the ccm state machine, as shown in the diagram. The numbers denote transitions from one state to another.
CCM_STATE_NONE, CCM_STATE_VERSION_REQUEST, CCM_STATE_JOINING, CCM_STATE_RCVD_UPDATE, CCM_STATE_SENT_MEMLISTREQ, CCM_STATE_REQ_MEMLIST, CCM_STATE_MEMLIST_RES, CCM_STATE_JOINED, CCM_STATE_WAIT_FOR_MEM_LIST, CCM_STATE_WAIT_FOR_CHANGE, CCM_STATE_NEW_NODE_WAIT_FOR_MEM_LIST, CCM_STATE_END
A node in CCM_STATE_NONE has not done anything yet. Every node is initialized to this state. If something goes wrong during a transition in another state, a node may reset itself to this state so that it can start a new round.
1 --- After sending out a CCM_TYPE_PROTOVERSION message, the state changes to CCM_STATE_VERSION_REQUEST.
A node is in CCM_STATE_VERSION_REQUEST after it sends out a message (t = CCM_TYPE_PROTOVERSION) asking for a cluster context. From here it either goes to CCM_STATE_NEW_NODE_WAIT_FOR_MEM_LIST upon receiving a response (t = CCM_TYPE_PROTOVERSION_RESP) or times out. On timeout, if it figures out that it is the only active node in the cluster, it enters CCM_STATE_JOINED; otherwise this round is a failure and it goes back to CCM_STATE_NONE.
2 --- If it times out and we still have tries left, we reset to CCM_STATE_NONE.
3 --- On receiving a CCM_TYPE_PROTOVERSION_RESP message, we send out a CCM_TYPE_ALIVE message and change to CCM_STATE_NEW_NODE_WAIT_FOR_MEM_LIST.
5 --- We tried the maximum number of times and still got no response. If we are the highest joined, we change state to CCM_STATE_JOINED.
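The outcomes of the version-request state (transitions 2, 3 and 5) can be sketched as a small decision function. This is only an illustration under stated assumptions: `version_request_next`, its parameters, and the trimmed enums are hypothetical names rather than code from the ccm sources, and the real handler also sends messages (for example CCM_TYPE_ALIVE on transition 3).

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed, illustrative copies of the names used in this section;
 * the numeric values are arbitrary. */
enum ccm_state {
    CCM_STATE_NONE,
    CCM_STATE_VERSION_REQUEST,
    CCM_STATE_NEW_NODE_WAIT_FOR_MEM_LIST,
    CCM_STATE_JOINED
};

enum ccm_type {
    CCM_TYPE_PROTOVERSION_RESP,
    CCM_TYPE_TIMEOUT
};

/* Hypothetical helper: next state for a node in CCM_STATE_VERSION_REQUEST.
 * tries is the number of attempts made so far, max_tries the limit. */
enum ccm_state version_request_next(enum ccm_type msg, int tries,
                                    int max_tries, bool highest_joined)
{
    if (msg == CCM_TYPE_PROTOVERSION_RESP) {
        /* Transition 3: the real code would also send CCM_TYPE_ALIVE. */
        return CCM_STATE_NEW_NODE_WAIT_FOR_MEM_LIST;
    }
    assert(msg == CCM_TYPE_TIMEOUT);
    if (tries < max_tries) {
        return CCM_STATE_NONE;          /* transition 2: try again */
    }
    /* Transition 5 if we may form the cluster alone, else give up. */
    return highest_joined ? CCM_STATE_JOINED : CCM_STATE_NONE;
}
```

For example, `version_request_next(CCM_TYPE_TIMEOUT, 3, 3, true)` yields CCM_STATE_JOINED, while the same call with `highest_joined` false falls back to CCM_STATE_NONE.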
A node in CCM_STATE_NEW_NODE_WAIT_FOR_MEM_LIST moves to CCM_STATE_JOINED if it receives a CCM_TYPE_MEM_LIST message. This can happen if there is only one node in the previously existing cluster. If it receives a CCM_TYPE_JOIN message or times out, it enters CCM_STATE_JOINING.
6 --- Received a CCM_TYPE_MEM_LIST message (there is only one node in the previous cluster). The node changes state to CCM_STATE_JOINED.
4 -- Timed out, received a CCM_TYPE_JOIN message from any node, or received a CCM_TYPE_LEAVE message from the DC. The node starts sending out CCM_TYPE_JOIN messages and changes its state to CCM_STATE_JOINING.
A node in CCM_STATE_JOINING broadcasts join messages until it receives responses from all nodes or times out. Depending on whether it is the cluster leader, it either sends a membership request message (t = CCM_TYPE_REQ_MEMLIST) and enters CCM_STATE_SENT_MEMLISTREQ, or replies to the membership request message and enters CCM_STATE_MEMLIST_RES.
7 --- If ccm has received a CCM_TYPE_JOIN message from all nodes, or it times out, and we are not the leader, change into CCM_STATE_MEMLIST_RES.
8 -- If ccm has received a CCM_TYPE_JOIN message from all nodes, or it times out, and we are the leader, change into CCM_STATE_SENT_MEMLISTREQ.
18 -- If ccm has exceeded the longer timeout, we reset ourselves and change into state CCM_STATE_NONE.
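The choice a joining node makes (transitions 7, 8 and 18) can be sketched as follows. The struct, its field names, and `joining_next` are hypothetical and only illustrate the description above. Checking the longer timeout first is an assumption, and transition 8 is taken to lead to CCM_STATE_SENT_MEMLISTREQ, as the description of the joining state says for the leader.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed, illustrative copy of the states involved; values are arbitrary. */
enum ccm_state {
    CCM_STATE_NONE,
    CCM_STATE_JOINING,
    CCM_STATE_SENT_MEMLISTREQ,
    CCM_STATE_MEMLIST_RES
};

/* Hypothetical snapshot of what a joining node knows. */
struct joining_view {
    int  nodes_expected;   /* nodes we expect a join message from */
    int  joins_received;   /* join messages received so far */
    bool short_timeout;    /* the normal wait time has expired */
    bool long_timeout;     /* the longer limit of transition 18 has expired */
    bool is_leader;        /* we consider ourselves the cluster leader */
};

enum ccm_state joining_next(const struct joining_view *v)
{
    assert(v->joins_received <= v->nodes_expected);

    if (v->long_timeout) {
        return CCM_STATE_NONE;                      /* transition 18 */
    }
    if (v->joins_received == v->nodes_expected || v->short_timeout) {
        return v->is_leader
            ? CCM_STATE_SENT_MEMLISTREQ             /* transition 8 */
            : CCM_STATE_MEMLIST_RES;                /* transition 7 */
    }
    return CCM_STATE_JOINING;                       /* keep broadcasting */
}
```

Keeping the inputs in one struct makes it easy to see that the leader and non-leader paths differ only in the final state, not in the condition that triggers the move.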
CCM_STATE_SENT_MEMLISTREQ is the potential leader state. A node in this state may get all responses, compute the membership, and enter CCM_STATE_JOINED, or it may go to CCM_STATE_NONE if something goes wrong.
17 -- On receiving a CCM_TYPE_TIMEOUT/CCM_TYPE_REQ_MEMLIST/CCM_TYPE_RES_MEMLIST message, the node finds that it has already exceeded the info->itf timeout and therefore changes into state CCM_STATE_NONE.
10 -- On receiving a CCM_TYPE_TIMEOUT message while it has not yet exceeded the info->itf timeout, or on receiving a CCM_TYPE_RES_MEMLIST/CCM_TYPE_LEAVE message and finding that it has received all CCM_TYPE_RES_MEMLIST messages without exceeding the info->itf timeout, the node sends out CCM_TYPE_FINAL_MEMLIST and changes into CCM_STATE_JOINED as leader.
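Transitions 17 and 10 depend on two facts: whether the info->itf timeout has been exceeded and whether every CCM_TYPE_RES_MEMLIST reply has arrived. A minimal sketch of that decision, assuming both facts are already computed and passed in as booleans, could look like this. `sent_memlistreq_next` is a hypothetical name, and staying in the same state for the combinations the text does not mention is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed, illustrative copies of the names involved; values are arbitrary. */
enum ccm_state {
    CCM_STATE_NONE,
    CCM_STATE_SENT_MEMLISTREQ,
    CCM_STATE_JOINED
};

enum ccm_type {
    CCM_TYPE_TIMEOUT,
    CCM_TYPE_REQ_MEMLIST,
    CCM_TYPE_RES_MEMLIST,
    CCM_TYPE_LEAVE
};

/* Hypothetical helper: next state for the potential leader. */
enum ccm_state sent_memlistreq_next(enum ccm_type msg, bool itf_exceeded,
                                    bool all_res_received)
{
    switch (msg) {
    case CCM_TYPE_TIMEOUT:
        /* 17 if info->itf is exceeded, otherwise 10. */
        return itf_exceeded ? CCM_STATE_NONE : CCM_STATE_JOINED;
    case CCM_TYPE_REQ_MEMLIST:
        return itf_exceeded ? CCM_STATE_NONE : CCM_STATE_SENT_MEMLISTREQ;
    case CCM_TYPE_RES_MEMLIST:
        if (itf_exceeded) {
            return CCM_STATE_NONE;                          /* 17 */
        }
        return all_res_received ? CCM_STATE_JOINED          /* 10 */
                                : CCM_STATE_SENT_MEMLISTREQ;
    case CCM_TYPE_LEAVE:
        if (!itf_exceeded && all_res_received) {
            return CCM_STATE_JOINED;                        /* 10 */
        }
        return CCM_STATE_SENT_MEMLISTREQ;   /* not covered by the text */
    }
    return CCM_STATE_SENT_MEMLISTREQ;
}

/* Usage example: a timeout before info->itf expires finalizes the
 * membership with the replies received so far. */
void sent_memlistreq_example(void)
{
    assert(sent_memlistreq_next(CCM_TYPE_TIMEOUT, false, false)
           == CCM_STATE_JOINED);
}
```

In the real code the move to CCM_STATE_JOINED is accompanied by sending CCM_TYPE_FINAL_MEMLIST; the sketch only models the state change.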
A node in CCM_STATE_MEMLIST_RES is expecting a message (t = CCM_TYPE_FINAL_MEMLIST) from the cluster leader. If it gets one, it enters CCM_STATE_JOINED. Otherwise it enters CCM_STATE_JOINING again to start a new round. That can happen if the cluster leader dies in this step.
9 -- On receiving a CCM_TYPE_FINAL_MEMLIST message, ccm changes into state CCM_STATE_JOINED.
19 -- On receiving a CCM_TYPE_REQ_MEMLIST message again whose minor transaction number does not match, we reset ourselves to state CCM_STATE_NONE.
20 -- On receiving a CCM_TYPE_JOIN message with a greater trans_minor value, on timeout, or when the node that we regard as cluster leader has left, we change into state CCM_STATE_JOINING.
CCM_STATE_JOINED is the converged state for each node if everything goes well. If some node starts a new round by sending join messages (t = CCM_TYPE_JOIN) or the cluster leader dies, this node also starts to broadcast join messages and enters CCM_STATE_JOINING. Otherwise, it records changes and goes to CCM_STATE_WAIT_FOR_CHANGE.
When a new node joins an already converged cluster, the ideal case is as follows: the new node sends out an "I am alive" message (t = CCM_TYPE_ALIVE) and enters CCM_STATE_NEW_NODE_WAIT_FOR_MEM_LIST. The non-leader nodes, upon receiving this message, inform the cluster leader about it. The cluster leader collects all messages and replies to the new node with a new membership. If anything fails in this process, either the cluster leader or the new node will, on timeout, initiate a join protocol, and everyone will be in CCM_STATE_JOINING.
16 -- On receiving a CCM_TYPE_JOIN message with a minor transaction number greater than the local one, or on receiving a CCM_TYPE_LEAVE message where the leaving node is the leader, send out a CCM_TYPE_JOIN message and change into CCM_STATE_JOINING.
12 -- (a) We are the leader, and we receive a CCM_TYPE_LEAVE message where the leaving node is not the leader and we have not yet received all change messages.
action: ccm changes its state to CCM_STATE_WAIT_FOR_CHANGE.
11 -- (a) We are not the leader, and we receive a CCM_TYPE_LEAVE message where the leaving node is not the leader and we have not yet received all change messages.
action: ccm changes its state to CCM_STATE_WAIT_FOR_MEM_LIST.
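Transitions 16, 12 and 11 out of the joined state can be summarized in a short sketch. Everything named here (`joined_event`, `joined_next`, the field names) is hypothetical, and staying in CCM_STATE_JOINED when no change messages are outstanding is an assumption, since the text only describes the case where some are still missing.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed, illustrative copies of the names involved; values are arbitrary. */
enum ccm_state {
    CCM_STATE_JOINING,
    CCM_STATE_JOINED,
    CCM_STATE_WAIT_FOR_MEM_LIST,
    CCM_STATE_WAIT_FOR_CHANGE
};

enum ccm_type {
    CCM_TYPE_JOIN,
    CCM_TYPE_LEAVE
};

/* Hypothetical, pre-digested view of an incoming message. */
struct joined_event {
    enum ccm_type type;
    bool newer_minor;           /* JOIN: sender's minor number exceeds ours */
    bool leaver_is_leader;      /* LEAVE: the leaving node is the leader */
    bool all_changes_received;  /* LEAVE: no change messages outstanding */
};

enum ccm_state joined_next(const struct joined_event *ev, bool i_am_leader)
{
    if (ev->type == CCM_TYPE_JOIN) {
        /* Transition 16 only for a newer round; older joins are ignored. */
        return ev->newer_minor ? CCM_STATE_JOINING : CCM_STATE_JOINED;
    }
    assert(ev->type == CCM_TYPE_LEAVE);
    if (ev->leaver_is_leader) {
        return CCM_STATE_JOINING;               /* transition 16 */
    }
    if (ev->all_changes_received) {
        return CCM_STATE_JOINED;                /* assumption, see above */
    }
    return i_am_leader ? CCM_STATE_WAIT_FOR_CHANGE      /* transition 12 */
                       : CCM_STATE_WAIT_FOR_MEM_LIST;   /* transition 11 */
}
```

The real handler also sends messages on these paths (CCM_TYPE_JOIN on transition 16, for instance); the sketch covers only the resulting state.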
CCM_STATE_WAIT_FOR_CHANGE is a helper state for processing requests in CCM_STATE_JOINED.
13 -- On receiving a CCM_TYPE_LEAVE/CCM_TYPE_NODE_LEAVE/CCM_TYPE_ALIVE/CCM_TYPE_NEW_NODE message, if ccm has now received all necessary messages, change into state CCM_STATE_JOINED.
15 -- On receiving a CCM_TYPE_LEAVE/CCM_TYPE_NODE_LEAVE/CCM_TYPE_ALIVE/CCM_TYPE_NEW_NODE message that was not expected, on timeout, or on receiving a CCM_TYPE_JOIN message, change into state CCM_STATE_JOINING.
A non-leader cluster node enters CCM_STATE_WAIT_FOR_MEM_LIST after it sends out a new node message in CCM_STATE_JOINED. It is expecting a CCM_TYPE_MEM_LIST message. The node enters CCM_STATE_JOINING if it does not get that membership list.
14 -- On receiving a CCM_TYPE_MEM_LIST message, we change state to CCM_STATE_JOINED.
15 -- (a) Timeout. We did not get the CCM_TYPE_MEM_LIST message we were expecting.
action: We change into CCM_STATE_JOINING and send out a CCM_TYPE_JOIN message.
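Since the diagram itself is not reproduced here, the numbered transitions listed for the nine states can be collected into one table. The following is a sketch assembled from the descriptions above, not from the ccm sources: the enum lists only the nine described states in an arbitrary order, `ccm_edge_label` is a hypothetical helper, transition 8 is taken to end in CCM_STATE_SENT_MEMLISTREQ, transition 20 is taken to end in CCM_STATE_JOINING, and label 15 appears twice because the text uses it for two edges.

```c
#include <assert.h>
#include <stddef.h>

enum ccm_state {
    CCM_STATE_NONE,
    CCM_STATE_VERSION_REQUEST,
    CCM_STATE_JOINING,
    CCM_STATE_SENT_MEMLISTREQ,
    CCM_STATE_MEMLIST_RES,
    CCM_STATE_JOINED,
    CCM_STATE_WAIT_FOR_MEM_LIST,
    CCM_STATE_WAIT_FOR_CHANGE,
    CCM_STATE_NEW_NODE_WAIT_FOR_MEM_LIST,
    CCM_STATE_END
};

struct ccm_edge {
    int label;              /* number used in the text */
    enum ccm_state from;
    enum ccm_state to;
};

static const struct ccm_edge ccm_edges[] = {
    {  1, CCM_STATE_NONE,                       CCM_STATE_VERSION_REQUEST },
    {  2, CCM_STATE_VERSION_REQUEST,            CCM_STATE_NONE },
    {  3, CCM_STATE_VERSION_REQUEST,            CCM_STATE_NEW_NODE_WAIT_FOR_MEM_LIST },
    {  4, CCM_STATE_NEW_NODE_WAIT_FOR_MEM_LIST, CCM_STATE_JOINING },
    {  5, CCM_STATE_VERSION_REQUEST,            CCM_STATE_JOINED },
    {  6, CCM_STATE_NEW_NODE_WAIT_FOR_MEM_LIST, CCM_STATE_JOINED },
    {  7, CCM_STATE_JOINING,                    CCM_STATE_MEMLIST_RES },
    {  8, CCM_STATE_JOINING,                    CCM_STATE_SENT_MEMLISTREQ },
    {  9, CCM_STATE_MEMLIST_RES,                CCM_STATE_JOINED },
    { 10, CCM_STATE_SENT_MEMLISTREQ,            CCM_STATE_JOINED },
    { 11, CCM_STATE_JOINED,                     CCM_STATE_WAIT_FOR_MEM_LIST },
    { 12, CCM_STATE_JOINED,                     CCM_STATE_WAIT_FOR_CHANGE },
    { 13, CCM_STATE_WAIT_FOR_CHANGE,            CCM_STATE_JOINED },
    { 14, CCM_STATE_WAIT_FOR_MEM_LIST,          CCM_STATE_JOINED },
    { 15, CCM_STATE_WAIT_FOR_CHANGE,            CCM_STATE_JOINING },
    { 15, CCM_STATE_WAIT_FOR_MEM_LIST,          CCM_STATE_JOINING },
    { 16, CCM_STATE_JOINED,                     CCM_STATE_JOINING },
    { 17, CCM_STATE_SENT_MEMLISTREQ,            CCM_STATE_NONE },
    { 18, CCM_STATE_JOINING,                    CCM_STATE_NONE },
    { 19, CCM_STATE_MEMLIST_RES,                CCM_STATE_NONE },
    { 20, CCM_STATE_MEMLIST_RES,                CCM_STATE_JOINING }
};

/* Returns the label of the edge from -> to, or 0 if the text describes
 * no such transition. */
int ccm_edge_label(enum ccm_state from, enum ccm_state to)
{
    size_t i;

    assert(from < CCM_STATE_END && to < CCM_STATE_END);
    for (i = 0; i < sizeof ccm_edges / sizeof ccm_edges[0]; i++) {
        if (ccm_edges[i].from == from && ccm_edges[i].to == to) {
            return ccm_edges[i].label;
        }
    }
    return 0;
}
```

A table like this is handy for checking a trace: any observed state change for which `ccm_edge_label` returns 0 is one that the descriptions above do not account for.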
CCM_TYPE_PROTOVERSION, CCM_TYPE_PROTOVERSION_RESP, CCM_TYPE_JOIN, CCM_TYPE_REQ_MEMLIST, CCM_TYPE_RES_MEMLIST, CCM_TYPE_FINAL_MEMLIST, CCM_TYPE_MEM_LIST, CCM_TYPE_ABORT, CCM_TYPE_TIMEOUT, CCM_TYPE_LEAVE, CCM_TYPE_NODE_LEAVE, CCM_TYPE_ALIVE, CCM_TYPE_NEW_NODE, CCM_TYPE_LAST
CCM_TYPE_PROTOVERSION is only sent out by a node in state CCM_STATE_NONE. CCM changes into state CCM_STATE_VERSION_REQUEST after sending out this message.
This message is only sent out by a node that is the leader, or about to become the leader, in state CCM_STATE_JOINED, to a node that wants to join. CCM stays in CCM_STATE_JOINED or changes into CCM_STATE_JOINED after sending out this message. However, sending this message is not a necessary step for staying in or changing into CCM_STATE_JOINED.
CCM_TYPE_FINAL_MEMLIST is only sent out by ccm in state CCM_STATE_SENT_MEMLISTREQ, in one of these cases: ccm times out but has not exceeded info->itf yet; ccm receives a CCM_TYPE_RES_MEMLIST message, has now received all CCM_TYPE_RES_MEMLIST messages, and has not exceeded info->itf; or ccm receives a CCM_TYPE_RES_MEMLIST message that completes the set of CCM_TYPE_RES_MEMLIST messages.
CCM_TYPE_JOIN is sent out in many cases, e.g. when ccm receives a CCM_TYPE_LEAVE or CCM_TYPE_NEW_NODE message. CCM stays in or changes into CCM_STATE_JOINING after it sends out this message.
CCM_TYPE_REQ_MEMLIST is sent out by ccm on receiving a CCM_TYPE_TIMEOUT or CCM_TYPE_JOIN message: once either our wait time has expired or we have received responses from all nodes, we decide that we are the leader and send a CCM_TYPE_REQ_MEMLIST message to the whole cluster.
CCM_TYPE_RES_MEMLIST is sent out in the following cases:
a) ccm is in state CCM_STATE_MEMLIST_RES/CCM_STATE_REQ_MEMLIST and receives a CCM_TYPE_REQ_MEMLIST message, which means some other node thinks it is the leader but we do not agree. We send out a CCM_TYPE_RES_MEMLIST message with a NULL membership.
b) On receiving a CCM_TYPE_TIMEOUT or CCM_TYPE_JOIN message, once either our wait time has expired or we have received responses from all nodes, we decide that we are not the cluster leader and send a CCM_TYPE_RES_MEMLIST message with a valid membership to the valid cluster leader and with a NULL membership to any invalid cluster leader.
This message does not seem to be necessary. It shall be removed after everything is clear.
CCM_TYPE_TIMEOUT is sent out by ccm in case of a timeout. The sending code is called by hb_timeout_dispatch().
CCM_TYPE_LEAVE is only sent out when ccm dies on some node. There are two ways to learn that ccm has died on a node: a) we receive a T_APICLISTAT message with F_STATUS != JOINSTATUS; b) we detect that some nodes are dead.
FIXME: is the ccm client status callback guaranteed? If yes, we don't need to handle case b).
CCM_TYPE_NODE_LEAVE is only sent out by a node that is not the leader and is in state CCM_STATE_JOINED, on receiving a CCM_TYPE_LEAVE message. CCM changes into state CCM_STATE_WAIT_FOR_MEM_LIST after sending out this message.
CCM_TYPE_ALIVE is only sent out by a node in state CCM_STATE_VERSION_REQUEST, on receiving a CCM_TYPE_PROTOVERSION_RESP message. ccm changes to CCM_STATE_NEW_NODE_WAIT_FOR_MEM_LIST after sending out this message.
CCM_TYPE_NEW_NODE is only sent out by a non-leader node in CCM_STATE_JOINED, on receiving a CCM_TYPE_ALIVE message. ccm changes to CCM_STATE_WAIT_FOR_MEM_LIST after sending out this message.
Invalid message, never used.