DR Load balancing active/inactive connections
RU Admin
lvs-user at camden.rutgers.edu
Thu Feb 22 17:49:27 GMT 2007
Horms:
Just wanted to say thank you for your suggestions back in November. I
(finally) upgraded the kernel on the two directors to a custom 2.6.20
kernel about 2-3 weeks ago and that seems to have done the trick with the
connection count problems. I am no longer seeing large numbers in my
active or inactive connections; they are now timing out properly, which is
great.
Thanks!!!
Craig
On Wed, 29 Nov 2006, Horms wrote:
> On Tue, Nov 28, 2006 at 08:37:28AM -0500, RU Admin wrote:
>
> [snip]
>
>>>> When running "ipvsadm -lcn", I can
>>>> see connections with the CLOSE state going from 00:59 to 00:01, and
>>>> then magically going back to 00:59 again for no reason. The same
>>>> holds true for ESTABLISHED connections, I see them go from 29:59 to
>>>> 00:01 and then back to 29:59, and I know for a fact that the
>>>> connection from the client has ended.
>>>
>>> I seem to recall a bug relating to connection entries having
>>> the behaviour you describe above due to a race in reference counting.
>>> Which version of the kernel do you have? Is there any chance of updating
>>> it to something like 2.6.18?
>>
>> I'm using a stock Debian Sarge kernel (2.6.8-2-686-smp). I can
>> definitely build the latest kernel, and if you feel that it will help
>> then I'll do that. It's always risky making a major kernel change on
>> a production machine, which is why I wanted to hold off on making
>> that change until someone else familiar with IPVS felt that it might
>> help.
>
> I think that it would be worth trying. Can you reproduce the problem
> on a non-production machine?
>
> [snip]
>
>>> I am wondering if the problem is that for some reason the
>>> linux-directors are not seeing the part of the close sequence
>>> that is sent by the end-user (it won't see the portion sent by
>>> the real-servers). Supposing for a minute that this is the case,
>>> it would explain the strange numbers, and those strange numbers
>>> will be affecting how wlc allocates connections.
>>
>> But shouldn't IPVS time out? I thought that was the purpose of the timeouts...
>> So that when the director doesn't see a close event after a specified period of
>> time, it simply times out.
>
> I actually think my close theory is wrong and that as you point out the
> problem is timeouts. I think that you are correct in thinking that they
> should time out. So that seems to leave us with two main possibilities:
> 1) there is a bug (which may have already been fixed) or 2) we are
> reading the data wrong.
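
One way to check the second possibility is to compare two "ipvsadm -lcn"
listings taken a few seconds apart and list the entries whose expire timer
went up while the state stayed the same. The following is only a sketch: the
function names are hypothetical, and it assumes the usual column layout
(pro, expire as mm:ss, state, source, virtual, destination). A timer that
grows is not proof of a bug, since legitimate traffic on a connection also
refreshes it, so the result is a list of candidates to look at.

```python
import re

# Assumed row layout of `ipvsadm -lcn`, for example:
# TCP 00:59  CLOSE       192.0.2.10:40112   198.51.100.5:25    10.0.0.11:25
ROW = re.compile(
    r"^(?P<pro>\S+)\s+(?P<mm>\d+):(?P<ss>\d\d)\s+(?P<state>\S+)\s+"
    r"(?P<src>\S+)\s+(?P<vip>\S+)\s+(?P<dst>\S+)\s*$"
)


def parse_snapshot(text):
    """Map (pro, source, virtual, destination) -> (state, seconds left)."""
    entries = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # header lines and anything that is not a data row
        key = (m["pro"], m["src"], m["vip"], m["dst"])
        entries[key] = (m["state"], int(m["mm"]) * 60 + int(m["ss"]))
    return entries


def timers_that_grew(older, newer):
    """Return entries present in both snapshots, in the same state,
    whose remaining time is larger in the newer snapshot."""
    grew = []
    for key, (state, secs) in newer.items():
        if key in older:
            old_state, old_secs = older[key]
            if state == old_state and secs > old_secs:
                grew.append((key, state, old_secs, secs))
    return grew
```

Feeding it the text of two listings saved to files (or captured with
subprocess on the director) gives the connections worth cross-checking
against what the client actually did.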
>
> [snip]
>
>>> How exactly did you deal with ARP? There are several methods.
>>
>> On the real servers, I'm first bringing up the dummy0 interface with the VIP,
>> then I use "sysctl" and set the following:
>> net.ipv4.conf.dummy0.rp_filter=0
>> net.ipv4.conf.dummy0.arp_ignore=1
>> net.ipv4.conf.dummy0.arp_announce=2
>> Then I bring up eth0 with the real server's regular IP address, and with
>> "sysctl", I set the following (includes a repeat of the above options):
>> net.ipv4.conf.default.rp_filter=0
>> net.ipv4.conf.all.rp_filter=0
>> net.ipv4.conf.lo.rp_filter=0
>> net.ipv4.conf.dummy0.rp_filter=0
>> net.ipv4.conf.eth0.rp_filter=0
>>
>> net.ipv4.conf.default.arp_ignore=1
>> net.ipv4.conf.all.arp_ignore=1
>> net.ipv4.conf.lo.arp_ignore=1
>> net.ipv4.conf.dummy0.arp_ignore=1
>> net.ipv4.conf.eth0.arp_ignore=1
>>
>> net.ipv4.conf.default.arp_announce=2
>> net.ipv4.conf.all.arp_announce=2
>> net.ipv4.conf.lo.arp_announce=2
>> net.ipv4.conf.dummy0.arp_announce=2
>> net.ipv4.conf.eth0.arp_announce=2
>>
>> The ARP problem was the one thing that kept me from moving to LVS-DR
>> for a long time. I finally started playing with all of the
>> net.ipv4.conf options and bringing up the interfaces in a specific
>> order, and finally stumbled across a method that actually worked. I'm
>> sure some of the above options don't need to be set, but it finally
>> works, and I'm a little afraid to touch it.
>
> What you have above is the preferred method these days.
>
> You shouldn't need to bother with lo and dummy0 as these are non-arping
> interfaces (right?). Though setting them is harmless.
>
> In any case, I agree with your analysis that ARP does not seem to be
> a problem in your setup, as the connections are being forwarded by
> the linux-director.
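
If lo and dummy0 really are non-arping, the real-server settings quoted
above would reduce to something like the fragment below. This is an untested
sketch, not something anyone in the thread ran: it simply drops the lo and
dummy0 lines from the original list and assumes eth0 is the only interface
that answers ARP. The complete list is the configuration reported to work.

```
# real server, LVS-DR: reduced sketch (assumes eth0 is the only ARPing interface)
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.eth0.rp_filter=0

net.ipv4.conf.default.arp_ignore=1
net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.eth0.arp_ignore=1

net.ipv4.conf.default.arp_announce=2
net.ipv4.conf.all.arp_announce=2
net.ipv4.conf.eth0.arp_announce=2
```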
>
>> I'm going to try and build the latest 2.6.18 now, and hopefully
>> sometime later this week I can install the new kernel and reboot our
>> director. Unfortunately I've never been able to get keepalived to
>> handle a MASTER/SLAVE director properly, so I only have one director
>> in front of the real servers, so if I make a mistake, our main
>> university email server will be down.
>
> ew. Good luck :)
>
> --
> Horms
> H: http://www.vergenet.net/~horms/
> W: http://www.valinux.co.jp/en/
>
>