[lvs-users] question about non recovering

Michiel van Es mve at pcintelligence.nl
Mon Apr 26 23:04:45 BST 2010


Hello,

I am using the following setup:

- load balancer: Fedora 12, LVS with ldirectord, VIP 194.145.200.87
- server 1: real server, CentOS 5.4, 194.145.200.17
- server 2: real server, CentOS 5.4, 194.145.200.171

ldirectord.cf:

checktimeout=3
checkinterval=1
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=no
virtual=194.145.200.87:25
         fallback=127.0.0.1:25
         real=194.145.200.17:25 gate
         real=194.145.200.171:25 gate
         service=smtp
         persistent=100
         scheduler=rr
         protocol=tcp
         checktype=negotiate
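
The combination of scheduler=rr and persistent=100 above may be relevant to always reaching the same real server: with persistence, a client address that was recently assigned to a real server keeps going to that server instead of being rotated. The sketch below is only a toy model of that interaction, not how the kernel implements it; the ToyDirector class and its pick method are hypothetical names made up for illustration.

```python
import itertools


class ToyDirector:
    """Toy model: round-robin scheduling with per-client persistence."""

    def __init__(self, real_servers, persistent_secs):
        self._rr = itertools.cycle(real_servers)
        self._persistent_secs = persistent_secs
        self._templates = {}  # client address -> (real server, expiry time)

    def pick(self, client, now):
        entry = self._templates.get(client)
        if entry is not None and now < entry[1]:
            server = entry[0]          # still persistent: reuse the same server
        else:
            server = next(self._rr)    # new or expired client: take the next one
        # Assumption of this toy model: each new connection restarts the timer.
        self._templates[client] = (server, now + self._persistent_secs)
        return server


director = ToyDirector(
    ["194.145.200.17:25", "194.145.200.171:25"], persistent_secs=100
)
first = director.pick("client-a", now=0)
other = director.pick("client-b", now=5)
repeats = [director.pick("client-a", now=t) for t in (10, 50, 90)]
print(first, other, repeats)
```

In this model a single test machine sees only one real server for as long as it keeps reconnecting within the persistence window, while a second client address is sent to the other one.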



On the load balancer:

net.ipv4.ip_forward = 1
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_announce = 2


On the real server:
# Controls IP packet forwarding
net.ipv4.ip_forward = 1

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_announce = 2

And an lo:0 interface:
DEVICE=lo:0
IPADDR=194.145.200.87
NETMASK=255.255.255.255
ONBOOT=yes
NAME=loopback

When I start ldirectord on the load balancer, everything seems to work fine:

MvE-machine:~ mve$ telnet 194.145.200.87 25
Trying 194.145.200.87...
Connected to ts3-87.twistspace.com.
Escape character is '^]'.
220 PCIntelligence mailserver 2 - mx2.pcintelligence.nl ESMTP


[Mon Apr 26 23:40:19 2010|ldirectord|2507] Purged virtual server (stop): 194.145.200.87:25
[Mon Apr 26 23:40:19 2010|ldirectord|2507] Linux Director Daemon terminated on signal: TERM
[Mon Apr 26 23:40:20 2010|ldirectord|2934] Starting Linux Director v1.186-ha as daemon
[Mon Apr 26 23:40:20 2010|ldirectord|2939] Added virtual server: 194.145.200.87:25
[Mon Apr 26 23:40:20 2010|ldirectord|2939] Added fallback server: 127.0.0.1:25 (194.145.200.87:25) (Weight set to 1)
[Mon Apr 26 23:40:20 2010|ldirectord|2939] Resetting soft failure count: 194.145.200.17:25 (tcp:194.145.200.87:25)
[Mon Apr 26 23:40:20 2010|ldirectord|2939] Added real server: 194.145.200.17:25 (194.145.200.87:25) (Weight set to 1)
[Mon Apr 26 23:40:20 2010|ldirectord|2939] Deleted fallback server: 127.0.0.1:25 (194.145.200.87:25)
[Mon Apr 26 23:40:20 2010|ldirectord|2939] Resetting soft failure count: 194.145.200.171:25 (tcp:194.145.200.87:25)
[Mon Apr 26 23:40:20 2010|ldirectord|2939] Added real server: 194.145.200.171:25 (194.145.200.87:25) (Weight set to 1)

The strange thing is that I always end up on server2 (194.145.200.171:25).

When I stop Qmail on this server:

[Mon Apr 26 23:48:59 2010|ldirectord|2939] Deleted real server: 194.145.200.171:25 (194.145.200.87:25)
[Mon Apr 26 23:49:11 2010|ldirectord|2939] Resetting soft failure count: 194.145.200.171:25 (tcp:194.145.200.87:25)

MvE-machine:~ mve$ telnet 194.145.200.87 25
Trying 194.145.200.87...
telnet: connect to address 194.145.200.87: Connection refused
telnet: Unable to connect to remote host

(It does not fall back to server1, real server 194.145.200.17:25.)

When I start the Qmail service on server2 again, I keep getting:


MvE-machine:~ mve$ telnet 194.145.200.87 25
Trying 194.145.200.87...
telnet: connect to address 194.145.200.87: Connection refused
telnet: Unable to connect to remote host

In the log I can see that ldirectord detects that it is up again, but I
cannot get any traffic to it.

[Mon Apr 26 23:49:11 2010|ldirectord|2939] Added real server: 194.145.200.171:25 (194.145.200.87:25) (Weight set to 1)
^C
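
To follow which real servers ldirectord currently considers up, the Added/Deleted events in a log like the one above can be reduced to the last state per real server. This is a minimal sketch that assumes the line format shown in this message; the function name last_real_server_state is hypothetical, and fallback server events are deliberately ignored.

```python
import re

# Matches "Added real server:" / "Deleted real server:" events; \s+ also
# tolerates the address being wrapped onto the next line by the mail client.
EVENT = re.compile(
    r"\|ldirectord\|\d+\]\s+(Added|Deleted) real server:\s+(\S+)"
)


def last_real_server_state(log_text):
    """Return {real server: last action seen}, in log order."""
    state = {}
    for action, server in EVENT.findall(log_text):
        state[server] = action
    return state


sample = """\
[Mon Apr 26 23:40:20 2010|ldirectord|2939] Added fallback server:
127.0.0.1:25 (194.145.200.87:25) (Weight set to 1)
[Mon Apr 26 23:40:20 2010|ldirectord|2939] Added real server:
194.145.200.17:25 (194.145.200.87:25) (Weight set to 1)
[Mon Apr 26 23:48:59 2010|ldirectord|2939] Deleted real server:
194.145.200.171:25 (194.145.200.87:25)
[Mon Apr 26 23:49:11 2010|ldirectord|2939] Added real server:
194.145.200.171:25 (194.145.200.87:25) (Weight set to 1)
"""

state = last_real_server_state(sample)
print(state)
```

This only summarises what ldirectord logged; it says nothing about what is actually in the kernel's IPVS table on the director.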

My questions are:

a) Why doesn't it fail over to server 1, and why do I only ever see
server2 when the scheduler is rr?
b) Why is the connection still refused even after I start Qmail again
on server2?

Kind regards,

Michiel





