LVS-DR keepalived problem

Paolo Perrucci p.perrucci at ludonet.it
Thu Jun 15 09:18:04 BST 2006


Hi all,

I am trying to configure LVS-DR with 2 servers (CentOS 4.3) using
keepalived 1.1.12 for an HTTP service.
Each of the 2 servers acts both as a director (one master, one backup)
and as a real server.

The problem arises when the 3rd client request arrives at the director.
From the client side, the browser waits for the connection to be
established, without success, and after a while it fails.
From the real servers' point of view, I see a LOT of network traffic
consisting only of SYN packets.
My configuration is:

VIP: 10.0.91.25
RIP1: 10.0.91.23
RIP2: 10.0.91.24
Client: 10.0.90.116

--------------------------- keepalived.conf on real server 1 (10.0.91.23)
vrrp_instance VI_1 {
        state MASTER
        interface eth0
        track_interface {
                eth0
        }
        lvs_sync_daemon_interface eth0
        virtual_router_id 25
        priority 150
        advert_int 2
        authentication {
                auth_type PASS
                auth_pass tps
        }
        virtual_ipaddress {
                10.0.91.25/24
        }
        notify_master "/etc/keepalived/ip_localhost del"
        notify_backup "/etc/keepalived/ip_localhost add"
        notify_fault "/etc/keepalived/ip_localhost add"
}

virtual_server 10.0.91.25 80  {
        delay_loop 5
        lb_algo rr
        lb_kind DR
        protocol TCP
        real_server 10.0.91.23 80 {
                weight 1
                inhibit_on_failure
                TCP_CHECK {
                        connect_port 80
                        connect_timeout 3
                        nb_get_retry 3
                        delay_before_retry 1
                }
        }
        real_server 10.0.91.24 80 {
                weight 1
                inhibit_on_failure
                TCP_CHECK {
                        connect_port 80
                        connect_timeout 3
                        nb_get_retry 3
                        delay_before_retry 1
                }
        }
}
--------------------------------------------------------------------------------------


---------------------------  keepalived.conf on real server 2 (10.0.91.24)
vrrp_instance VI_1 {
        state BACKUP
        interface eth0
        track_interface {
                eth0
        }
        lvs_sync_daemon_interface eth0
        virtual_router_id 25
        priority 100
        advert_int 2
        authentication {
                auth_type PASS
                auth_pass tps
        }
        virtual_ipaddress {
                10.0.91.25/24
        }
        notify_master "/etc/keepalived/ip_localhost del"
        notify_backup "/etc/keepalived/ip_localhost add"
        notify_fault "/etc/keepalived/ip_localhost add"
}

virtual_server 10.0.91.25 80  {
        delay_loop 5
        lb_algo rr
        lb_kind DR
        protocol TCP
        real_server 10.0.91.23 80 {
                weight 1
                inhibit_on_failure
                TCP_CHECK {
                        connect_port 80
                        connect_timeout 3
                        nb_get_retry 3
                        delay_before_retry 1
                }
        }
        real_server 10.0.91.24 80 {
                weight 1
                inhibit_on_failure
                TCP_CHECK {
                        connect_port 80
                        connect_timeout 3
                        nb_get_retry 3
                        delay_before_retry 1
                }
        }
}
--------------------------------------------------------------------------------------


--------------------------------------------------------------------------------------
/etc/keepalived/ip_localhost is the script used to set up the VIP (bound
to lo) on the real servers:

#!/bin/sh
case "$1" in
  add)
        ip addr add 10.0.91.25/32 dev lo brd + scope host
        ;;
  del)
        ip addr del 10.0.91.25/32 dev lo
        ;;
  *)
        echo "Usage: $0 {add|del}"
        exit 1
esac
exit 0
--------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------
/etc/sysctl.conf

net.ipv4.ip_forward = 1
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.accept_source_route = 1
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.eth0.arp_announce = 2
--------------------------------------------------------------------------------------
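
To confirm that the running kernel really uses these values (for example
after "sysctl -p"), the file can be compared against /proc/sys. The
following is only a minimal sketch: the function name check_sysctl_conf
and the PROC_SYS_ROOT override are hypothetical and not part of
keepalived or of the setup above.

```shell
#!/bin/sh
# check_sysctl_conf: compare "key = value" lines read from stdin against the
# live values under /proc/sys and report every mismatch.
# PROC_SYS_ROOT is a hypothetical override, used only to make testing easy.
check_sysctl_conf() {
    root=${PROC_SYS_ROOT:-/proc/sys}
    status=0
    while IFS='=' read -r key want; do
        # Strip whitespace; skip blank lines and comments.
        key=$(printf '%s' "$key" | tr -d ' \t')
        want=$(printf '%s' "$want" | tr -d ' \t')
        case "$key" in ''|'#'*) continue ;; esac
        # Dots map to path separators (this breaks for names like eth0.100).
        path="$root/$(printf '%s' "$key" | tr . /)"
        if [ ! -r "$path" ]; then
            echo "MISSING $key"; status=1; continue
        fi
        have=$(tr -d ' \t\n' < "$path")
        if [ "$have" != "$want" ]; then
            echo "MISMATCH $key: want $want, have $have"; status=1
        fi
    done
    return $status
}
```

It would be called as "check_sysctl_conf < /etc/sysctl.conf" on each
node; no output and a zero exit status mean every listed key matches.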

After starting the keepalived service on the two servers I have this
network configuration on the first real server:

1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:1a:ce:fe brd ff:ff:ff:ff:ff:ff
    inet 10.0.91.23/24 brd 10.0.91.255 scope global eth0
    inet 10.0.91.25/24 scope global secondary eth0
    inet6 fe80::20c:29ff:fe1a:cefe/64 scope link
       valid_lft forever preferred_lft forever

and this one on the 2nd real server:

1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet 10.0.91.25/32 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:7a:c2:d3 brd ff:ff:ff:ff:ff:ff
    inet 10.0.91.24/24 brd 10.0.91.255 scope global eth0
    inet6 fe80::20c:29ff:fe7a:c2d3/64 scope link
       valid_lft forever preferred_lft forever
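
The two listings above differ only in where the VIP lives (eth0 on the
master, lo on the backup). A small helper can make that check quick to
repeat after a failover. This is a sketch only; the name vip_devices is
made up for illustration and it assumes the "ip addr show" text format
shown above.

```shell
#!/bin/sh
# vip_devices: read `ip addr show` text on stdin and print the interface
# label and prefix of every IPv4 entry that matches the given address.
# The function name and calling convention are illustrative only.
vip_devices() {
    awk -v vip="$1" '
        $1 == "inet" {
            split($2, parts, "/")      # "10.0.91.25/24" -> address, prefix
            if (parts[1] == vip)
                print $NF, $2          # last field is the interface label
        }'
}
```

Run as "ip addr show | vip_devices 10.0.91.25". Given the listings
above, it should print "eth0 10.0.91.25/24" on the first node and
"lo 10.0.91.25/32" on the second.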

The ipvsadm status seems to be correct.
On the 1st server it is:

IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.91.25:http rr
  -> 10.0.91.24:http              Route   1      0          0
  -> 10.0.91.23:http              Local   1      0          0

On the 2nd server it is:

IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.91.25:http rr
  -> 10.0.91.24:http              Local   1      0          0
  -> 10.0.91.23:http              Route   1      0          0

When the 3rd client request arrives at the server, this is the tcpdump
output on the first node:

...
00:49:02.366902 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.366929 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367082 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367095 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367878 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367902 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367881 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367910 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367882 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367916 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.368584 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
...

and you can see the same in the tcpdump output from the 2nd node:

...
22:51:39.744887 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.746808 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.746843 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.746816 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.746862 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.746818 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.746884 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.747879 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.747909 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.747881 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.747949 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.748892 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.748923 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.749745 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
...

As you can see from the timestamps, this is a lot of network traffic.
It looks like there is a loop between the two servers.
The first two client requests are handled correctly: the first one goes
to the first node and the 2nd one goes to the other node.
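
To put a number on the suspected loop, a saved tcpdump text capture can
be summarised per flow. What follows is a minimal sketch, assuming the
older tcpdump line format shown above (a bare "S" flag after the
destination); the function name count_syn_dups is hypothetical.

```shell
#!/bin/sh
# count_syn_dups: read tcpdump text output on stdin and print, for each
# "src > dst" pair seen with the SYN flag, how many times it appeared.
# Assumes the line format used in this post:
#   HH:MM:SS.us IP src.port > dst.port: S seq:seq(0) win ...
# Wrapped continuation lines are ignored because their 2nd field is not "IP".
count_syn_dups() {
    awk '$2 == "IP" && $4 == ">" && $6 == "S" {
             dst = $5
             sub(/:$/, "", dst)        # drop the trailing colon
             seen[$3 " > " dst]++
         }
         END { for (flow in seen) print seen[flow], flow }' | sort -rn
}
```

For example, "count_syn_dups < capture.txt" (capture.txt being whatever
file the tcpdump output was saved to). A single client connection
normally produces only a few SYN retransmissions, so a large count for
one source port would be consistent with the same packet bouncing
between the two nodes.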

Can anyone give me some hints to debug (and hopefully solve) the problem?
Thank you
Paolo

