[lvs-users] ipvsadm problem
dimak at stalker.com
Fri Aug 24 12:58:22 BST 2012
A small addendum to the initial posting: we looked further into the
problem, and it appears to be caused by the sync daemon: if it is
switched off, the problem does not occur. But we need to have the
sync daemon running so that after a failover we can continue to support
existing sessions. This is important because the real setups have more
than just 2 balanced servers.
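
For anyone trying to reproduce this, the sync daemon role switch we do
during a failover could be sketched roughly as below. This is only a
sketch: it assumes the stock ipvsadm --start-daemon / --stop-daemon
options, and the interface name eth0, the function names and the DRY_RUN
wrapper are made up for illustration and are not taken from our setup.

```shell
#!/bin/sh
# Sketch only: switch the IPVS sync daemon role on one balancer.
# IFACE and DRY_RUN are illustrative assumptions, not from the real setup.
IFACE="${IFACE:-eth0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them (needs root)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# New "active" balancer: stop the backup daemon, start the master daemon.
become_master() {
    run ipvsadm --stop-daemon backup
    run ipvsadm --start-daemon master --mcast-interface "$IFACE"
}

# Old "active" balancer: stop the master daemon, start the backup daemon.
become_backup() {
    run ipvsadm --stop-daemon master
    run ipvsadm --start-daemon backup --mcast-interface "$IFACE"
}
```

With DRY_RUN left at 1, calling become_backup just prints the two ipvsadm
commands, which makes it easy to check the sequence before running it as
root with DRY_RUN=0.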
On 2012-08-24 15:33, Dmitry Akindinov wrote:
> We are facing a problem with ipvsadm.
> A test system consists of 2 Linux boxes (stock CentOS 6.0), both running
> stock ipvs.
> The application software provides various TCP services (POP, IMAP, HTTP,
> etc.), and also controls
> the ipvs module via the ipvsadm utility.
> Both systems have ipvsadm running. One system is an "active" load
> balancer, one is the 'standby' balancer.
> Both systems are used to serve TCP requests.
> iptables is used to put a "100" mark on all packets coming to the
> VIP address.
> The "active" loadbalancer has the following config:
> -A -f 100 -s rr -p 1
> -a -f 100 -r server1:0 -g -w 1
> -a -f 100 -r server2:0 -g -w 1
> The "passive" load balancer config is empty (but its iptables rules still
> work and mark the VIP packets with the 100 mark).
> The "active" balancer runs the sync daemon in the "master" mode, the
> "passive" balancer - in the "backup" mode.
> Everything works fine, all TCP services are balanced, etc.
> Now, we initiate a failover. During the failover, the ipvs table on the
> old "active" balancer is cleared,
> and the new "active" ipvs gets the same configuration as existed on the
> old one (the same lines as above).
> The usual arp tricks take place to direct the VIP traffic to the new
> The old balancer daemon is stopped and restarted in the "backup" mode,
> the new balancer daemon is stopped
> and restarted in the "master" mode.
> Now, strange things start to happen:
> the TCP requests balanced to the new balancer are processed OK.
> the TCP requests balanced via the new balancer to the old balancer work
> only half-way:
> a) the old balancer sees an incoming SYN packet (tcpdump confirms that
> the incoming packets hit the new load balancer first),
> opens the connection, and sends the initial prompt (for POP3, IMAP4, SMTP
> protocols) to the client.
> b) the client receives all SYN-ACKs and the prompt data packets: the
> client is connected and it sees the prompt.
> c) when the client sends any data to the server, the data is delivered
> to the new load balancer, which redirects it to the old balancer, and there
> the packet is simply dropped: the application does not see
> it. The client re-sends the packet after a TCP timeout, it is delivered
> to the old balancer via the new one, and it is dropped again.
> 1. This problem does not appear after every failover, but it happens in
> many (if not most) cases
> 2. The problem does not go away even if we wait for a few hours after
> the failover took place.
> 3. The problem shows up only for protocols like POP, IMAP, SMTP, where
> the server immediately sends a prompt back to the client.
> The problem does not show up when the HTTP protocol is used, i.e. when
> the client is the first to send data over a newly established connection.
> Finally, if we stop ipvs on the "old" (inactive) load balancer, where it
> is not being used, the problem immediately goes away.
> And if we now restart it (its config rule set being empty before and
> after restart), the problem does not reappear.
> It looks like the "old" balancer remembers something about the VIP, and
> when we remove its routing rules, it does not clean
> that table, and it causes problems. Which is strange, because we are
> talking about *new* connections, i.e. the connections established after
> the failover is complete: ipvs should not have any info about them that
> it may keep after it stopped being the "active" balancer.
> Of course, we can just restart ipvs when it goes from the "active" to
> the "passive" state, but that would be kinda rude...
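
To make the quoted setup easier to follow, here is a rough sketch of the
steps it describes: marking the VIP packets, loading the three ipvs rules
on the "active" balancer, and clearing them on the balancer that becomes
"passive". The mark 100, the rule lines and the server1/server2 names come
from the message above; the VIP address 192.0.2.10, the mangle/PREROUTING
placement of the iptables rule, the function names and the DRY_RUN wrapper
are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: configuration and failover steps from the quoted message.
# VIP, the mangle/PREROUTING placement and DRY_RUN are assumptions.
VIP="${VIP:-192.0.2.10}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them (needs root)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Both balancers: put the "100" mark on all packets coming to the VIP.
mark_vip() {
    run iptables -t mangle -A PREROUTING -d "$VIP" -j MARK --set-mark 100
}

# Balancer becoming "active": load the fwmark service and its real servers.
load_rules() {
    run ipvsadm -A -f 100 -s rr -p 1
    run ipvsadm -a -f 100 -r server1:0 -g -w 1
    run ipvsadm -a -f 100 -r server2:0 -g -w 1
}

# Balancer becoming "passive": clear the virtual server table.
clear_rules() {
    run ipvsadm -C
}

# Diagnostic: list the connection entries this box currently holds.
show_connections() {
    run ipvsadm -L -n -c
}
```

Running show_connections with DRY_RUN=0 on the old balancer after a
failover might show whether it still holds connection entries for the VIP.
That is only a suggested check, not something reported in the message
above.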