[lvs-users] ipvsadm problem

Dmitry Akindinov dimak at stalker.com
Fri Aug 24 12:33:45 BST 2012


We are facing a problem with ipvsadm.

A test system consists of 2 Linux boxes (stock CentOS 6.0), both running 
stock ipvs.

The application software provides various TCP services (POP, IMAP, HTTP, 
etc.), and also controls
the ipvs module via the ipvsadm utility.

Both systems have ipvsadm running. One system is the "active" load 
balancer, the other is the "standby" balancer.
Both systems are used to serve the TCP requests.

iptables is used to put a "100" mark on all packets coming to the 
VIP address.
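
For illustration, a marking rule of this kind might look like the sketch below. The actual rule is not shown in this post: the mangle/PREROUTING placement, the TCP-only match and the VIP 192.0.2.10 (a documentation address) are assumptions.

```shell
#!/bin/sh
# Sketch only. The VIP is a hypothetical placeholder and the table/chain
# choice is an assumption; the post does not show the real rule.
VIP=192.0.2.10
FWMARK=100

# Build the iptables command as a string so it can be reviewed before use.
mark_rule() {
    echo "iptables -t mangle -A PREROUTING -d $VIP -p tcp -j MARK --set-mark $FWMARK"
}

# Print the rule. To apply it as root: eval "$(mark_rule)"
mark_rule
```

The same rule would have to exist on both boxes, since the "passive" balancer is said to keep marking VIP packets as well.
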
The "active" load balancer has the following config:
-A -f 100 -s rr -p 1
-a -f 100 -r server1:0 -g -w 1
-a -f 100 -r server2:0 -g -w 1
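
These lines appear to be in the rule format that ipvsadm-restore reads from standard input. A minimal sketch of generating them follows; the function name lvs_rules is hypothetical, and the scheduler, persistence and weight values are simply copied from the lines above.

```shell
#!/bin/sh
# Emit one fwmark virtual service plus its real servers in the rule format
# quoted above. Arguments: the fwmark, then one or more real-server names.
lvs_rules() {
    mark=$1
    shift
    # Virtual service: round-robin scheduler, 1-second persistence.
    printf '%s\n' "-A -f $mark -s rr -p 1"
    for rs in "$@"; do
        # Real server: direct routing (-g), weight 1.
        printf '%s\n' "-a -f $mark -r $rs:0 -g -w 1"
    done
}

# Print the rules. To load them as root:
#   lvs_rules 100 server1 server2 | ipvsadm-restore
lvs_rules 100 server1 server2
```
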

The "passive" load balancer config is empty (but its iptables rules still 
work and mark the VIP packets with the 100 mark).
The "active" balancer runs the sync daemon in "master" mode, the 
"passive" balancer runs it in "backup" mode.

Everything works fine, all TCP services are balanced, etc.

Now, we initiate a failover. During the failover, the ipvs table on the 
old "active" balancer is cleared,
and the new "active" ipvs gets the same configuration as existed on the 
old one (the same lines as above).
The usual ARP tricks take place to direct the VIP traffic to the new 
balancer.
The old balancer daemon is stopped and restarted in the "backup" mode, 
the new balancer daemon is stopped
and restarted in the "master" mode.
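
The failover steps described above can be sketched as two command lists, one per balancer. This is not the application's actual code: the interface name eth0, the VIP 192.0.2.10 and the use of arping for the "arp tricks" are assumptions.

```shell
#!/bin/sh
# Sketch of the failover order. IFACE and VIP are hypothetical, and arping
# stands in for whatever ARP update the application really performs.
IFACE=eth0
VIP=192.0.2.10

# Commands for the balancer giving up the "active" role.
demote_plan() {
    echo "ipvsadm -C"                                   # clear the ipvs table
    echo "ipvsadm --stop-daemon master"
    echo "ipvsadm --start-daemon backup --mcast-interface $IFACE"
}

# Commands for the balancer taking over the "active" role.
promote_plan() {
    echo "ipvsadm -A -f 100 -s rr -p 1"
    echo "ipvsadm -a -f 100 -r server1:0 -g -w 1"
    echo "ipvsadm -a -f 100 -r server2:0 -g -w 1"
    echo "arping -U -c 3 -I $IFACE $VIP"                # unsolicited ARP for the VIP
    echo "ipvsadm --stop-daemon backup"
    echo "ipvsadm --start-daemon master --mcast-interface $IFACE"
}
```

Printing the commands rather than running them keeps the sketch safe to try; as root, each list could be piped to "sh -x" on the corresponding box.
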

Now, strange things start to happen:
the TCP requests balanced to the new balancer itself are processed OK, 
but the TCP requests balanced via the new balancer to the old balancer 
only half work:
a) the old balancer sees an incoming SYN packet (tcpdump confirms that 
the incoming packets hit the new load balancer first),
opens the connection, and sends the initial prompt (for POP3, IMAP4, SMTP 
protocols) to the client.
b) the client receives all SYN-ACKs and the prompt data packets; the 
client is connected and it sees the prompt.
c) when the client sends any data to the server, the data is delivered 
to the new load balancer, which redirects it to the old balancer, and there 
the packet is just dropped on the floor: the application does not see 
it. The client re-sends the packet after a TCP timeout, it is delivered 
to the old balancer via the new one, and it is dropped again.

1. This problem does not appear after every failover, but it happens in 
many (if not most) cases.
2. The problem does not go away even if we wait for a few hours after 
the failover took place.
3. The problem shows up only for protocols like POP, IMAP, SMTP, where 
the server immediately sends a prompt back to the client.
The problem does not show up when the HTTP protocol is used, i.e. when 
the client is the first to send data over a newly established connection.

Finally, if we stop ipvs on the "old" (inactive) load balancer, where it 
is not being used, the problem immediately goes away.
And if we now restart it (its config rule set being empty before and 
after restart) - the problem does not reappear.

It looks like the "old" balancer remembers something about the VIP, and 
when we remove its routing rules, it does not clean
that table, and it causes problems. Which is strange, because we are 
talking about *new* connections, i.e. the connections established after 
the failover is complete: ipvs should not have any info about them that 
it may keep after it stopped being the "active" balancer.
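
One way to check this guess might be to look at what the old balancer still holds after the failover. The sketch below counts connection-table entries that mention the VIP; the helper name count_vip_conns and the VIP 192.0.2.10 are hypothetical, and the helper only assumes that "ipvsadm -L -n -c" prints addresses as ip:port separated by spaces.

```shell
#!/bin/sh
# Count IPVS connection entries that mention a given IP address.
# Reads the output of "ipvsadm -L -n -c" on standard input.
count_vip_conns() {
    vip=$1
    # -F: match a fixed string, -c: print the number of matching lines.
    # grep exits non-zero on a zero count, so "|| true" keeps that from
    # being treated as a failure.
    grep -cF -- " $vip:" || true
}

# On the old balancer, as root (the VIP is a placeholder):
#   ipvsadm -L -n -c | count_vip_conns 192.0.2.10
#   ipvsadm -L --daemon      # shows which sync daemons are running
```

A non-zero count on the "passive" box after its table was cleared would at least be consistent with the guess above, although it would not by itself explain the dropped packets.
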

Of course, we can just restart ipvs when it goes from the "active" to 
the "passive" state, but that would be kinda rude...

Best regards,
Dmitry Akindinov
