New system, higher active connections?

Roberto Nibali ratz at drugphish.ch
Tue Oct 17 15:39:46 BST 2006


>> Was 7.3 with 2.4.x kernel?
> 
> Yes - 2.4.18-18.7.x

OK, so my assumption holds as a working basis. Would you be able to
check whether switching to the WLC scheduler changes your observed
active connection rate?
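
Untested sketch, assuming the VIP from your config below; either flip
the stanza in ldirectord.cf and reload, or edit the live services
directly with ipvsadm:

In ldirectord.cf:   scheduler=wlc    (instead of scheduler=wrr)

Or live:
ipvsadm -E -t 128.109.135.22:80 -s wlc
ipvsadm -E -t 128.109.135.22:443 -s wlc -p 30   # keep -p, or -E drops persistence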

>>> Where before, we were seeing Active Connections in the 1-4 range even 
>>> during normal usage, we're now seeing them in the 12-16 range on 
>>> average.  We've got the same weighting on the new server as we did on 
>>> the old.
>>
>> Different server system and, most importantly, different software 
>> configuration. IPVS between 2.4 and 2.6 (provided my assumption above 
>> holds) has changed significantly with regard to the ratio of 
>> active/inactive connections. We've seen that in our rrdtool/MRTG 
>> graphs as well.
> 
> It's not so much that we've seen the numbers change, we've seen an 
> actual load impact on our application.  Last night, for example, the 
> load-balanced Apache instances behind LVS were slowing to a crawl and 
> hitting MaxClients even under a fairly light load.  So I think it's more 
> than just us having to readjust our expectations as to the number of 
> active connections.

What's your connections-per-second rate? One theory: while netfilter in
2.4 maxed out on connection tracking and so naturally didn't let too
many connections through LVS, you now have a much more powerful box
with improved TCP stack handling with regard to netfilter in 2.6.
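
If you don't have those numbers handy, a quick way to sample them
(sketch; the proc paths are from memory for a 2.6.17-era kernel and may
differ on your build):

ipvsadm -L -n --rate                      # CPS column = connections/s
cat /proc/sys/net/ipv4/ip_conntrack_max   # conntrack ceiling
wc -l /proc/net/ip_conntrack              # entries currently tracked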

>> We'd need more information if you want to dig into this phenomenon.
> 
> Sure.  This is a very vanilla FC5 system running the 2.6.17-1.2174_FC5 
> kernel.  I'm certainly open to suggestions for tuning that we should do 
> in order to get decent performance - I'd hate to have to drop back to 

Well, the performance bottleneck doesn't seem to be LVS, does it? Do you
collect statistics anywhere? I'd be interested in the server-status
output of the Apache instances. Do you get more throughput through the LVS?
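
If mod_status isn't already enabled on the RS, a minimal httpd.conf
sketch (Apache 2.x syntax; adjust the allowed network to taste):

ExtendedStatus On
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 192.168.0.0/255.255.255.0
</Location>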

> the RH7.3 box.  We pretty much ported the LVS configuration over 
> verbatim from the 7.3 to the FC5, so the only thing that's changed is 
> the OS and hardware.

There's of course always the option to cap the number of connections
sent to each RS using the threshold limitation feature of LVS; a sketch
follows below. But I reckon you'd rather find out what exactly is
limiting your RS now.
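
For reference, with made-up values; -x is the upper and -y the lower
connection threshold per real server. ldirectord may re-add entries
without the thresholds on its next check, so treat this as a test only:

ipvsadm -e -t 128.109.135.22:80 -r 192.168.0.13:80 -m -x 50 -y 40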

> One thing we did notice is that by removing the persistent= line in our 
> https balancer (we have two stanzas - one for port 80 and one for port 
> 443), the active connection numbers for the port 80 stanza dropped 
> dramatically.  Of course, that broke one part of our application, so we 
> had to reenable it (though at 30 seconds as opposed to 300 seconds), but 
> hopefully that's a clue.

At first thought, not so much.

> ######
> # Global Directives
> checktimeout=10
> checkinterval=5

This alone will generate a hell of a lot of checks, IMHO, which could
have been too much for the old box to handle. It may well be that all
the checks now complete in time, putting increased load on the RS. The
server-status page should give more input on this, as should the Apache
access log files.
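
Back-of-envelope from the stanzas below:

18 real servers x 2 virtual services = 36 negotiate checks
36 checks / 5s checkinterval        ~= 7 full HTTP(S) requests/s,
the HTTPS ones each with a complete SSL handshake on top.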

> autoreload=no
> logfile="/var/log/ldirectord.log"
> quiescent=yes
> 
> virtual=128.109.135.22:80
>         fallback=127.0.0.1:80
>                 real=192.168.0.13:80 masq 2
>                 real=192.168.0.15:80 masq 2
>                 real=192.168.0.16:80 masq 2
>                 real=192.168.0.17:80 masq 2
>                 real=192.168.0.18:80 masq 2
>                 real=192.168.0.19:80 masq 2
>                 real=192.168.0.20:80 masq 2
>                 real=192.168.0.21:80 masq 3
>                 real=192.168.0.22:80 masq 3
>                 real=192.168.0.23:80 masq 3
>                 real=192.168.0.24:80 masq 3
>                 real=192.168.0.25:80 masq 3
>                 real=192.168.0.26:80 masq 3
>                 real=192.168.0.27:80 masq 4
>                 real=192.168.0.28:80 masq 4
>                 real=192.168.0.29:80 masq 4
>                 real=192.168.0.30:80 masq 4
>                 real=192.168.0.31:80 masq 4
>         service=http
>         request="lvs/lvs_donotremove"
>         receive="lvs up"
>         scheduler=wrr
>         protocol=tcp
>         checktype=negotiate
> 
> virtual=128.109.135.22:443
>         fallback=127.0.0.1:443
>                 real=192.168.0.13:443 masq 2
>                 real=192.168.0.15:443 masq 2
>                 real=192.168.0.16:443 masq 2
>                 real=192.168.0.17:443 masq 2
>                 real=192.168.0.18:443 masq 2
>                 real=192.168.0.19:443 masq 2
>                 real=192.168.0.20:443 masq 2
>                 real=192.168.0.21:443 masq 3
>                 real=192.168.0.22:443 masq 3
>                 real=192.168.0.23:443 masq 3
>                 real=192.168.0.24:443 masq 3
>                 real=192.168.0.25:443 masq 3
>                 real=192.168.0.26:443 masq 3
>                 real=192.168.0.27:443 masq 4
>                 real=192.168.0.28:443 masq 4
>                 real=192.168.0.29:443 masq 4
>                 real=192.168.0.30:443 masq 4
>                 real=192.168.0.31:443 masq 4
>         service=https
>         request="lvs/slvs_donotremove"
>         receive="slvs up"
>         scheduler=wrr
>         persistent=30
>         protocol=tcp
>         checktype=negotiate

Could we get some numbers, please?

ipvsadm -L -n -c
ipvsadm -L -n --stats
ipvsadm -L -n --rate
ipvsadm -L -n --timeout
ipvsadm -L -n --persistent-conn
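
If you want to watch the rates evolve over time, something as simple as
this does it:

watch -n 5 'ipvsadm -L -n --rate'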

Thanks and best regards,
Roberto Nibali, ratz
-- 
echo 
'[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc

