[lvs-users] ipvsadm problem
dimak at stalker.com
Fri Aug 24 12:33:45 BST 2012
We are facing a problem with ipvsadm.
A test system consists of 2 Linux boxes (stock CentOS 6.0), both running
The application software provides various TCP services (POP, IMAP, HTTP,
etc.), and also controls
the ipvs module via the ipvsadm utility.
Both systems have ipvsadm running. One system is the "active" load
balancer, the other is the "standby" balancer.
Both systems are used to serve TCP requests.
The iptables are used to put a "100" mark on all packets coming to the
The "active" loadbalancer has the following config:
-A -f 100 -s rr -p 1
-a -f 100 -r server1:0 -g -w 1
-a -f 100 -r server2:0 -g -w 1
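
The three rule lines above look like the format that `ipvsadm -R` (or ipvsadm-restore) reads from stdin. A minimal sketch of keeping them in a script, assuming a made-up helper name `ipvs_rules` and an `APPLY` switch that are not part of the original setup:

```shell
#!/bin/sh
# Print the rule set quoted in the message, one rule per line.
# ipvs_rules is a hypothetical helper name used only for this sketch.
ipvs_rules() {
    cat <<'EOF'
-A -f 100 -s rr -p 1
-a -f 100 -r server1:0 -g -w 1
-a -f 100 -r server2:0 -g -w 1
EOF
}

# Load the rules only when explicitly asked to (needs root and the
# ip_vs module); otherwise the script has no side effects.
if [ "${APPLY:-0}" = 1 ]; then
    ipvs_rules | ipvsadm -R
fi
```

With `APPLY` unset the script only defines the function, so `ipvs_rules` can be piped or diffed against `ipvsadm -S -n` output when comparing the two balancers.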
The "passive" load balancer config is empty (but its iptables rules still
work and still mark the VIP packets with the 100 mark).
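
The message does not show the marking rule itself. One common way to set such a mark is a MARK target in the mangle table; the sketch below assumes the PREROUTING chain, a placeholder VIP of 192.0.2.10, and hypothetical helper names `run` and `mark_vip`, none of which come from the original post:

```shell
#!/bin/sh
# Placeholder VIP (documentation address range); replace with the real one.
VIP="${VIP:-192.0.2.10}"

# Dry-run helper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Tag every packet addressed to the VIP with firewall mark 100.
# Assumption: the mark is set in mangle/PREROUTING on both balancers.
mark_vip() {
    run iptables -t mangle -A PREROUTING -d "$VIP" -j MARK --set-mark 100
}

mark_vip
```

By default this only prints the command, which makes it easy to compare against the output of `iptables -t mangle -S` on each box.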
The "active" balancer runs the sync daemon in the "master" mode, the
"passive" balancer - in the "backup" mode.
Everything works fine, all TCP services are balanced, etc.
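
The master/backup sync daemon setup described above can be expressed with `ipvsadm --start-daemon`. This is a sketch only: the interface name eth0 and the helper names `run` and `start_sync` are assumptions, and the original post does not say which options were actually used.

```shell
#!/bin/sh
# Interface used for sync multicast; eth0 is an assumption.
IFACE="${IFACE:-eth0}"

# Dry-run helper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Start the ipvs sync daemon in the given role: "master" or "backup".
start_sync() {
    run ipvsadm --start-daemon "$1" --mcast-interface "$IFACE"
}

# On the "active" balancer:  start_sync master
# On the "passive" balancer: start_sync backup
start_sync master
```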
Now, we initiate a failover. During the failover, the ipvs table on the
old "active" balancer is cleared,
and the new "active" ipvs gets the same configuration as existed on the
old one (the same lines as above).
The usual arp tricks take place to direct the VIP traffic to the new
The old balancer daemon is stopped and restarted in the "backup" mode,
the new balancer daemon is stopped
and restarted in the "master" mode.
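
To make the order of operations concrete, the failover steps above might look roughly like this. It is a hedged reconstruction from the description, not the actual application code; `run`, `demote`, `promote`, and the interface name eth0 are assumptions, and the ARP step is left out.

```shell
#!/bin/sh
IFACE="${IFACE:-eth0}"   # sync interface; an assumption

# Dry-run helper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Old "active" balancer: clear the table, then switch the daemon to backup.
demote() {
    run ipvsadm -C
    run ipvsadm --stop-daemon master
    run ipvsadm --start-daemon backup --mcast-interface "$IFACE"
}

# New "active" balancer: install the same rules, then switch to master.
# (The ARP announcement for the VIP would happen between these steps.)
promote() {
    run ipvsadm -A -f 100 -s rr -p 1
    run ipvsadm -a -f 100 -r server1:0 -g -w 1
    run ipvsadm -a -f 100 -r server2:0 -g -w 1
    run ipvsadm --stop-daemon backup
    run ipvsadm --start-daemon master --mcast-interface "$IFACE"
}

demote
promote
```

Note that in this sequence the old balancer keeps the ip_vs module loaded throughout; only the virtual service table is cleared.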
Now, strange things start to happen:
the TCP requests balanced to the new balancer are processed OK.
the TCP requests balanced via the new balancer to the old balancer work
a) the old balancer sees an incoming SYN packet (tcpdump confirms that
the incoming packets hit the new load balancer first),
opens the connection, and sends the initial prompt (for POP3, IMAP4, SMTP
protocols) to the client.
b) the client receives all SYN-ACKs and the prompt data packets, so the
client is connected and it sees the prompt.
c) when the client sends any data to the server, the data is delivered
to the new load balancer, which redirects it to the old balancer. There
the packet is just dropped on the floor: the application does not see
it. The client re-sends the packet after the TCP timeout, it is delivered
to the old balancer via the new one, and it is dropped again.
1. This problem does not appear after every failover, but it happens in
many (if not most) cases.
2. The problem does not go away even if we wait for a few hours after
the failover took place.
3. The problem shows up only for protocols like POP, IMAP, SMTP, where
the server immediately sends a prompt back to the client.
The problem does not show up when the HTTP protocol is used, i.e. when
the client is the first to send data over a newly established connection.
Finally, if we stop ipvs on the "old" (inactive) load balancer, where it
is not being used, the problem immediately goes away.
And if we now restart it (its config rule set being empty before and
after the restart), the problem does not reappear.
It looks like the "old" balancer remembers something about the VIP, and
when we remove its routing rules, it does not clean
that table, and it causes problems. Which is strange, because we are
talking about *new* connections, i.e. the connections established after
the failover is complete: ipvs should not have any info about them that
it may keep after it stopped being the "active" balancer.
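
One way to check the "remembers something" hypothesis would be to look at what ipvs still holds on the old balancer after the failover. The sketch below only lists state and changes nothing; `run` and `inspect_ipvs` are hypothetical helper names, and whether any entries actually show up there is exactly what remains to be verified.

```shell
#!/bin/sh
# Dry-run helper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Read-only inspection of ipvs state on the old ("passive") balancer.
inspect_ipvs() {
    run ipvsadm -L -n          # virtual services (expected to be empty)
    run ipvsadm -L -n -c       # connection entries ipvs currently holds
    run ipvsadm -L --daemon    # sync daemon role (master or backup)
}

inspect_ipvs
```

If the connection listing on the old balancer shows entries for connections opened after the failover, that would point at the sync daemon rather than at leftover state from before the switch.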
Of course, we can just restart ipvs when it goes from the "active" to
the "passive" state, but that would be kinda rude...