keepalive with Cluster infrastructure health check.

Alexandre CASSEN alexandre.cassen at
Mon May 13 15:38:10 BST 2002

Hi Aneesh,

> Attaching below is the patch that add Cluster Infrastructure (
> ) as the health check mechanism for the
> keepalive daemons. Please comment on the patch so that i can continue
> working on it
> --[..snip patch..]--

I have spent a little time reading the CI/SSI docs available on the
project sites and I have some questions:

If I make a mistake, please correct me. According to what I have read on

  * CI provides low-level primitives like ICS & CLMS that are used in the
SSI code. CI/SSI hooks/patches are installed on all the cluster nodes. A
node is represented by its NODENUM, which is passed to the Linux kernel at
boot time. So the CI kernel hooks (ICS & CLMS) make it possible to locate a
specific node by its NODENUM.

  * SSI uses CI primitives (libcluster) to present the collection of cluster
nodes as a single system image. This is the real definition of a cluster: a
cluster (I agree) is not limited to a single virtualization but covers
multiple subsystems like FS, network, IPC, process migration, ...

=> This means that all the cluster nodes must have the same kernel/userspace
functionality enabled. This is the point that puzzles me about the CI/SSI
LVS integration. If I understand correctly, using LVS with CI/SSI
introduces LVS director virtualization based on CI CLUSTER_NODENUM
identification, so we run a pool of LVS directors. An LVS director applies
scheduler decisions to load-balance incoming traffic to a specific
realserver, and CI/SSI provides the CVIP to expose the LVS cluster. This is
my first question: using this CVIP means that every node considers this IP
a local address, so what about the LVS connection table? If traffic is
directed to one LVS director the first time and then jumps to another the
second time, what happens to connections that require persistence? We will
probably introduce a point where load balancing breaks.

=> LVS is routing software, so if we run LVS in a virtualized CI/SSI
director environment, the LVS connection table must be synced, probably
using a hook into ICS (an ICS channel), to guarantee the integrity
(consistency) of load-balancing decisions. Another point is CVIP takeover:
is the CVIP a roaming VIP on the cluster? If not, what triggers the CVIP
takeover on another cluster node?
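
To make the persistence concern concrete, here is a minimal sketch in plain
C. It is only a toy model, not LVS code: every name in it (sketch_director,
sketch_schedule, ...) is hypothetical, the scheduler is a bare round-robin,
and "sync" is simply a copy of the new entry into the peer's table, standing
in for whatever an ICS channel would carry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_TABLE_SIZE 32

/* Hypothetical per-director state: a tiny connection table plus a
 * round-robin cursor. This is NOT the real LVS data structure. */
struct sketch_director {
    uint32_t client[SKETCH_TABLE_SIZE]; /* client address per entry */
    int      rs[SKETCH_TABLE_SIZE];     /* realserver index chosen for it */
    int      used;                      /* number of valid entries */
    int      next_rs;                   /* round-robin cursor */
    int      rs_count;                  /* size of the realserver pool */
};

void sketch_director_init(struct sketch_director *d, int rs_count, int first_rs)
{
    assert(rs_count > 0);
    memset(d, 0, sizeof(*d));
    d->rs_count = rs_count;
    d->next_rs = first_rs % rs_count;
}

/* Return the realserver already bound to this client, or -1. */
static int sketch_lookup(const struct sketch_director *d, uint32_t client)
{
    for (int i = 0; i < d->used; i++)
        if (d->client[i] == client)
            return d->rs[i];
    return -1;
}

static int sketch_insert(struct sketch_director *d, uint32_t client, int rs)
{
    if (d->used >= SKETCH_TABLE_SIZE)
        return -1;                      /* table full */
    d->client[d->used] = client;
    d->rs[d->used] = rs;
    d->used++;
    return 0;
}

/* Pick a realserver for a client. An existing entry is reused; otherwise
 * the round-robin cursor decides. If peer is non-NULL, the new entry is
 * also copied into the peer's table (the "sync" step). */
int sketch_schedule(struct sketch_director *d, struct sketch_director *peer,
                    uint32_t client)
{
    int rs = sketch_lookup(d, client);
    if (rs >= 0)
        return rs;
    rs = d->next_rs;
    d->next_rs = (d->next_rs + 1) % d->rs_count;
    if (sketch_insert(d, client, rs) < 0)
        return -1;
    if (peer != NULL && sketch_lookup(peer, client) < 0)
        (void)sketch_insert(peer, client, rs);
    return rs;
}
```

With peer set to NULL, two directors whose cursors differ can hand the same
client to two different realservers; with the copy step enabled, the second
director finds the entry and repeats the first decision.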

=> Reading the patch, I see that CI node availability is checked with
"clusternode_info" on a cluster NODENUM. So node availability is reported
according to your "keepalive" tool, which informs and updates the kernel CI
cluster node map. This implies that CI/SSI must be installed on all the
cluster nodes, and LVS too. This means that a CI/SSI cluster is some kind
of global virtual cluster including (from the load-balancing point of view)
LVS directors and realservers... This is the other point that puzzles me.
To use the CI healthcheck framework, we need to patch the kernel with the
CI hooks, but we also need to patch the kernel with the LVS code to provide
the load-balancing part. So realservers and LVS directors are both part of
the same exposed cluster, and a server can be LVS director and realserver
(webserver for example) at the same time. Does this limit the realserver
pool to Linux? I don't fully understand that point ... :/

=> Concerning your patch, you have put everything in the right place. My
only comments on your contributed code are:

+void ci_get_handler(vector strvec)
+     nodemap = (nodenum_ip_map_t *)malloc(sizeof(nodenum_ip_map_t)
+                     *cluster_maxnodes()+1); /* zero is not considered as a valid node */
+     if( initialize_nodemap(nodemap) < 0 );
+     /* How do i inform the main line code ? */
+     queue_checker(NULL,dump_ci_check,ci_check_thread,NULL);

The main line code is informed by queue_checker. During bootstrap,
Keepalived parses the configuration file and queues a checker for each
known keyword. After keepalived.conf has been successfully parsed,
keepalived dequeues each checker, running the specific checker registration
function. => this is the code in check_api.c the function
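
The queue-then-register flow described above can be sketched roughly as
follows. This is a simplified stand-in, not the real check_api.c code: the
names (sketch_queue_checker, sketch_register_checkers, sketch_ci_launch),
the fixed-size array and the single function pointer per checker are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CHECKERS 16

/* Hypothetical registration callback: returns 0 on success. */
typedef int (*sketch_launch_fn)(void *data);

struct sketch_checker {
    sketch_launch_fn launch; /* registration function run after parsing */
    void *data;              /* checker-specific argument, may be NULL */
};

static struct sketch_checker sketch_queue[SKETCH_MAX_CHECKERS];
static size_t sketch_count;

/* Called by a keyword handler while the configuration is being parsed.
 * Nothing is started here; the checker is only remembered. */
int sketch_queue_checker(sketch_launch_fn launch, void *data)
{
    assert(launch != NULL);
    if (sketch_count >= SKETCH_MAX_CHECKERS)
        return -1;                       /* queue full */
    sketch_queue[sketch_count].launch = launch;
    sketch_queue[sketch_count].data = data;
    sketch_count++;
    return 0;
}

/* Called once, after the whole configuration file has been parsed.
 * Runs every queued registration function, drains the queue and
 * returns how many registrations succeeded. */
size_t sketch_register_checkers(void)
{
    size_t ok = 0;
    for (size_t i = 0; i < sketch_count; i++)
        if (sketch_queue[i].launch(sketch_queue[i].data) == 0)
            ok++;
    sketch_count = 0;
    return ok;
}

/* Example registration function standing in for a CI checker: it only
 * counts how many times it was launched. */
int sketch_ci_launch(void *data)
{
    int *started = data;
    if (started != NULL)
        (*started)++;
    return 0;
}
```

The point of the split is that a keyword handler such as ci_get_handler()
never needs to notify the main line code itself: queuing the checker is the
notification, and the launch happens only once parsing has succeeded.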

+int node_to_address(clusternode_t node, real_server *rs)
+     if( node  > cluster_maxnodes() ) {
+           syslog(LOG_ERR,"Node number greater than Max node num \n");
+           return -1;
+     }
+     rs->addr_ip = nodemap[node].addr_ip;
+     /* do I need to fill the rest of the values here ? */
+     return 1;

The real_server structure is filled during keepalived.conf parsing. In our
case the configuration file looks like:

virtual_server 80 {
    delay_loop 30
    lb_algo rr
    lb_kind NAT
    protocol TCP

    real_server 80 {
        weight 1


so the node_to_address() function should be removed, since the real_server
structure is filled in parser.c.

Is there another way to get the node num from the IP address? Some kind of
procfs entry, so that we can reduce the code further by removing the
initialize_nodemap function? Moreover, if we can get this info through
procfs, is it possible to create some kind of kernel reflection function
that reflects the nodemap status into an internal keepalived daemon
nodestatus representation?

The code I am referring to is:

+int nodestatus(real_server real)
+     int node_num;
+     clusternode_info_t ni;
+     if((node_num  = address_to_nodenum(real.addr_ip) ) == 0 )
+           return UNKNOWN_NODE;
+     if(clusternode_info(node_num, sizeof(ni), &ni) >= 0) {
+           if (ni.node_state == CLUSTERNODE_UP  )
+                 return UP;
+           else
+                 /* I  am insterested only in two state either fully up or down */
+                 return DOWN;
+        }
+        else {
+                syslog(LOG_ERR,"Error in getting the cluster information
+        }
+           return UNKNOWN_NODE;

If we can use some kind of kernel reflection, this can be done with
something like:

+int nodestatus(real_server rs)
+     int status = 0;
+     status = CI_NODE_STATE(rs);
+     if (status < 0)
+       return UNKNOWN_NODE;
+     return (status == CLUSTERNODE_UP)?UP:DOWN;

CI_NODE_STATE() would be a simple C macro returning the status flags from a
daemon-side CI structure that is updated by kernel broadcasts received
through a kernel socket connected to the CI kernel hooks? This is the
design keepalived uses with netlink.
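
A minimal sketch of that reflection idea, under stated assumptions: the
daemon keeps a small in-memory mirror of the node map, some event handler
(not shown, and hypothetical here) updates it whenever the kernel reports a
state change, and the status lookup becomes a plain memory read. All
SKETCH_* and sketch_* names, the state values and the 64-node limit are
invented for the example; they are not CI or keepalived definitions.

```c
#include <assert.h>

#define SKETCH_MAX_NODES 64

/* Raw states as the kernel side might report them (assumed values). */
enum { SKETCH_STATE_UNKNOWN = -1, SKETCH_STATE_DOWN = 0, SKETCH_STATE_UP = 1 };

/* Results handed back to the checker (assumed values). */
enum { SKETCH_UNKNOWN_NODE = -1, SKETCH_DOWN = 0, SKETCH_UP = 1 };

/* Daemon-side mirror of the node map; index 0 is unused because node
 * number zero is not a valid node. */
static signed char sketch_node_cache[SKETCH_MAX_NODES + 1];

/* Mark every node as unknown at daemon start-up. */
void sketch_cache_init(void)
{
    for (int i = 0; i <= SKETCH_MAX_NODES; i++)
        sketch_node_cache[i] = SKETCH_STATE_UNKNOWN;
}

/* Would be called from the daemon's event loop each time a state-change
 * message arrives from the kernel. Returns -1 for an invalid node. */
int sketch_cache_update(int nodenum, int state)
{
    if (nodenum < 1 || nodenum > SKETCH_MAX_NODES)
        return -1;
    sketch_node_cache[nodenum] = (signed char)state;
    return 0;
}

/* Stand-in for CI_NODE_STATE(): a memory read, no system call.
 * Note that the argument is evaluated more than once. */
#define SKETCH_NODE_STATE(n) \
    (((n) >= 1 && (n) <= SKETCH_MAX_NODES) \
        ? sketch_node_cache[(n)] : SKETCH_STATE_UNKNOWN)

/* Same shape as the shortened nodestatus() above, but keyed by node
 * number instead of a real_server structure. */
int sketch_nodestatus(int nodenum)
{
    int status = SKETCH_NODE_STATE(nodenum);
    if (status < 0)
        return SKETCH_UNKNOWN_NODE;
    return (status == SKETCH_STATE_UP) ? SKETCH_UP : SKETCH_DOWN;
}
```

Compared with calling clusternode_info() on every check, the checker here
never blocks on the kernel; the trade-off is that the answer is only as
fresh as the last state-change message the daemon received.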

Best regards, and thanks for your inputs,

More information about the lvs-users mailing list