Page updated: 09/12/2004

The EGEE project officially ended on 31 March 2006.

EGEE II started on 1 April 2006 and the new EGEE website can be found at: http://www.eu-egee.org

What is a Cluster Manager?

The cluster manager (CM) is a service that runs on each node of a VRS cluster. The CM is primarily responsible for maintaining information about the nodes in the system. All nodes are equal: the CM process on each node is (for now) essentially identical, and no node or CM has special privileges or responsibilities above the others.

Each CM should maintain a list of authentication information (e.g. public keys) for the nodes that may join the cluster. This list is static during runtime. It can be hand-edited when a cluster administrator wants to allow a new node or remove (disallow) an existing one. The administrator should ensure that all nodes have the same authentication list, so that a machine that wants to sign on to the cluster can connect to and authenticate with any of the nodes currently in the cluster. Adding a new node to the list requires generating the node's authentication keys, installing the private key on the node, and adding the public key to the authentication information list.
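One way an administrator might check that every node holds the same authentication list is to compare a digest of each copy. The sketch below is only an illustration: the list is assumed to be a mapping from node name to public key text, and the function names `auth_list_fingerprint` and `nodes_out_of_sync` are hypothetical, not part of the CM.

```python
import hashlib
from collections import Counter


def auth_list_fingerprint(auth_list: dict[str, str]) -> str:
    """Return a SHA-256 digest that depends only on the list's contents.

    Entries are hashed in sorted order, so two copies with the same
    (node name, public key) pairs give the same digest regardless of
    the order in which they were edited by hand.
    """
    digest = hashlib.sha256()
    for node_name in sorted(auth_list):
        digest.update(node_name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(auth_list[node_name].encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def nodes_out_of_sync(lists_by_node: dict[str, dict[str, str]]) -> list[str]:
    """Return the nodes whose list differs from the most common copy."""
    fingerprints = {
        node: auth_list_fingerprint(auth_list)
        for node, auth_list in lists_by_node.items()
    }
    if not fingerprints:
        return []
    most_common, _count = Counter(fingerprints.values()).most_common(1)[0]
    return sorted(n for n, fp in fingerprints.items() if fp != most_common)
```

Treating the most common copy as the reference is a convenience of this sketch; in practice the administrator decides which copy is authoritative.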

The CM should also maintain a (dynamic) list of the nodes that are currently active in the cluster. This list must always be consistent across the nodes currently in the cluster, because various services (in particular, the resource manager) will use transactions that depend on *all* the nodes in the cluster agreeing on an update. If there is any ambiguity about which nodes are currently connected to the cluster and which are not, the transaction manager won't know which nodes to contact and from which to require a 'commit' signal.
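The requirement that all nodes agree on an update can be illustrated with an all-or-nothing update in the style of two-phase commit. The text does not say which protocol the transaction manager uses, so this is an assumption, and the `Node` class, its `reachable` flag, and `update_active_list` are hypothetical names for an in-memory model rather than real CM interfaces.

```python
class Node:
    """In-memory stand-in for one CM and its copy of the active node list."""

    def __init__(self, name, active):
        self.name = name
        self.active = set(active)
        self.pending = None
        self.reachable = True  # set to False to simulate a failed node

    def prepare(self, new_active):
        """Phase 1: stage the new list and vote; an unreachable node votes no."""
        if not self.reachable:
            return False
        self.pending = set(new_active)
        return True

    def commit(self):
        """Phase 2 (success): make the staged list the current one."""
        self.active = self.pending
        self.pending = None

    def abort(self):
        """Phase 2 (failure): discard the staged list."""
        self.pending = None


def update_active_list(nodes, new_active):
    """Apply new_active on every node, or on none of them."""
    votes = [node.prepare(new_active) for node in nodes]
    if all(votes):
        for node in nodes:
            node.commit()
        return True
    for node in nodes:
        node.abort()
    return False
```

The sketch shows why an ambiguous membership list is a problem: `update_active_list` needs to know exactly which nodes to ask for a vote, and a single missing vote blocks the update.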

When a machine (call it A) wants to sign in to the cluster, it must contact one of the nodes currently in the cluster (call it B) and send its authentication information. Assuming that machine A is on the cluster authentication information list and its authentication information is valid, node B will transactionally update the active node list on ALL machines currently on the active node list.
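The sign-in sequence can be sketched from node B's point of view as follows. This is a simplified model under stated assumptions: authentication is reduced to comparing the presented key with the listed one (a real CM would presumably have A prove possession of its private key), the per-node active lists are held in one dictionary, and `handle_sign_in` is a hypothetical name.

```python
import hmac


def handle_sign_in(auth_list, active_lists, node_id, presented_key):
    """Run by node B when node_id asks to join; return True if admitted.

    auth_list:    node name -> public key (the static authentication list)
    active_lists: node name -> that node's copy of the active node set
    """
    # Step 1: the joining machine must be on the authentication list
    # and its credential must match (placeholder check, see lead-in).
    expected = auth_list.get(node_id)
    if expected is None:
        return False
    if not hmac.compare_digest(expected.encode(), presented_key.encode()):
        return False

    # Step 2: every current member must already hold the same view,
    # otherwise B cannot tell which nodes the update has to reach.
    views = list(active_lists.values())
    if not views or any(view != views[0] for view in views):
        return False

    # Step 3: install the new list on all current members and on the
    # newcomer. The in-memory assignment stands in for the transaction.
    new_view = views[0] | {node_id}
    for member in list(active_lists):
        active_lists[member] = set(new_view)
    active_lists[node_id] = set(new_view)
    return True
```

For example, with B and C active and A on the authentication list, a call such as `handle_sign_in(auth, active, "A", "pubA")` leaves A, B and C all holding the same three-node list, while a machine that is not on the list is refused without any list being changed.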