FW: new question - iptables on LB and connection limit?

Roberto Nibali ratz at drugphish.ch
Wed Nov 22 14:34:47 GMT 2006


Larry Ludwig wrote:
>> Sorry, but I simply don't understand this. iptables is a user space
>> command which cannot be started or stopped. It's a command line tool and
>> has little to do with your problem. Is the connection tracking still
>> running in the kernel? What does your lsmod show?
> 
> Sure, it gets unloaded via the 'service' command

Ahh, now I get it.

> [root@loadb1 ha.d]# service iptables stop
> Flushing firewall rules:                                   [  OK  ]
> Setting chains to policy ACCEPT: filter                    [  OK  ]
> Unloading iptables modules:                                [  OK  ]
> 
> lsmod doesn't show it running.

So if you stop your iptables service, there is no /proc/net/ip_conntrack
anymore, right?
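
A rough way to check both points, as a sketch only: the function name is
made up for illustration, and the module names it looks for (ip_conntrack,
nf_conntrack) are my assumption for 2.6 kernels, so adjust them to what
your lsmod actually lists.

```shell
# Return 0 if a connection tracking module appears in the module list.
# Assumed module names: ip_conntrack or nf_conntrack (2.6-era kernels).
# The optional argument is the file to scan, defaulting to /proc/modules.
conntrack_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -Eq '^(ip_conntrack|nf_conntrack) ' "$modules_file"
}
```

After 'service iptables stop' you could run
'conntrack_loaded && echo "conntrack still loaded"' and
'ls -l /proc/net/ip_conntrack' on the LB.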

>> What kind of page do you fetch with this? Static or dynamic? 
> 
> Simple static page.

Ok.

>> What's its size? 
> 
> Under 5k for the testing.  Page is much bigger for the real content now,
> still static.  See below

So this does not fit into one TCP packet for one PSH. This will have an 
impact on what you measure.
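
To put a number on it, here is a back-of-the-envelope sketch. It assumes
an MSS of 1460 bytes (Ethernet MTU of 1500, no TCP options) and ignores
the HTTP response headers; the function name is only illustrative.

```shell
# Number of TCP segments needed to carry a payload of the given size.
# mss defaults to 1460 bytes (assumption: MTU 1500, no TCP options).
segments_needed() {
    payload=$1
    mss=${2:-1460}
    echo $(( (payload + mss - 1) / mss ))
}

segments_needed 5000   # a page just under 5k: prints 4
segments_needed 7327   # the Document Length ab reports: prints 6
```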

>> BTW, with 2.6 kernel test clients spawning 1000 threads sometimes
>> lead to stalls due to the local_port_range and gc cleanups. What's your
>> local port range settings on your client? Also please show the ulimit -a
>> command output right before you start your test runs.
> 
> [root@zeus ~]# ab  -n 300000 -c 1000 http://67.72.106.71/
> This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
> Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Copyright (c) 1998-2002 The Apache Software Foundation,
> http://www.apache.org/
> 
> Benchmarking 67.72.106.71 (be patient)
> 
> [root@zeus ~]# ab  -n 100000 -c 1000 http://67.72.106.71/
> This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
> Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Copyright (c) 1998-2002 The Apache Software Foundation,
> http://www.apache.org/
> 
> Benchmarking 67.72.106.71 (be patient)
> 
> [root@zeus ~]# ab  -n 10000 -c 1000 http://67.72.106.71/
> This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
> Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Copyright (c) 1998-2002 The Apache Software Foundation,
> http://www.apache.org/
> 
> Benchmarking 67.72.106.71 (be patient)
> Completed 1000 requests
> Completed 2000 requests
> Completed 3000 requests
> Completed 4000 requests
> Completed 5000 requests
> Completed 6000 requests
> Completed 7000 requests
> Completed 8000 requests
> Completed 9000 requests
> Finished 10000 requests
> 
> Server Software:        lighttpd
> Server Hostname:        67.72.106.71
> Server Port:            80
> 
> Document Path:          /
> Document Length:        7327 bytes
> 
> Concurrency Level:      1000
> Time taken for tests:   10.679202 seconds
> Complete requests:      10000
> Failed requests:        5694
>    (Connect: 0, Length: 5694, Exceptions: 0)

This is a massive number of failed requests!

> Write errors:           0
> Total transferred:      122363820 bytes
> HTML transferred:       119753282 bytes
> Requests per second:    936.40 [#/sec] (mean)
> Time per request:       1067.920 [ms] (mean)
> Time per request:       1.068 [ms] (mean, across all concurrent requests)
> Transfer rate:          11189.51 [Kbytes/sec] received

Wire speed: your throughput capped the test. Try to recreate the test
using a Gbit network. The retransmits due to TCP timeouts probably won't
even get through this net-pipe anymore.
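
The arithmetic behind that, as a small sketch: ab counts a Kbyte as 1024
bytes, and the helper name is only for illustration. The result is
payload as seen by ab, before Ethernet/IP/TCP header overhead, so on the
100Mb/s link that ethtool reports for eth0 there is next to nothing left.

```shell
# Convert ab's "Transfer rate" (Kbytes/sec, 1 Kbyte = 1024 bytes)
# into Mbit/s (1 Mbit = 1000000 bits).
kbytes_to_mbit() {
    LC_ALL=C awk -v k="$1" 'BEGIN { printf "%.1f\n", k * 1024 * 8 / 1000000 }'
}

kbytes_to_mbit 11189.51   # prints 91.7
```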

> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        8  168 666.8     20    9020
> Processing:    21  544 1336.0    102   10609
> Waiting:        8  279 1072.8     21    9032
> Total:         32  713 1476.9    127   10647

The difference between min and max is so big that one of the servers is 
probably 0% idle and therefore the drops happen. Netfilter only makes 
matters worse.

> Percentage of the requests served within a certain time (ms)
>   50%    127
>   66%    341
>   75%    379
>   80%    760
>   90%   3069
>   95%   3308
>   98%   4716
>   99%   9149
>  100%  10647 (longest request)

This percentile distribution does not explain the timed-out requests 
directly unless an RTT of over 127ms is already deadly for one side of 
the test setup.

> [root@zeus ~]# ulimit -a
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 1024
> max locked memory       (kbytes, -l) 32
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024

This might be another source of your problems, since this limit covers 
all files ab can open (per process), including the fds that are already 
open. For 1000 concurrent connections this is a low number.
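
A sketch of how little room that leaves. It assumes ab holds exactly one
socket per concurrent connection plus stdin/stdout/stderr, which is a
simplification, and the function name is made up.

```shell
# Spare file descriptors left for a client that holds one socket per
# concurrent connection plus stdin/stdout/stderr (simplifying assumption).
fd_headroom() {
    limit=$1
    concurrency=$2
    echo $(( limit - concurrency - 3 ))
}

fd_headroom 1024 1000   # prints 21
```

Raising the limit in the same shell before starting ab, for example with
'ulimit -n 8192' as root, takes this variable out of the test.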

> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 16383
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> [root@zeus ~]# sysctl -a | grep local_
> net.ipv4.ip_local_port_range = 32768    61000

This is enough.
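
For reference, the size of that range can be computed as below; the
helper name is only illustrative. Every connection from the client to
the same destination address and port needs its own local port, and 1000
concurrent connections sit well below this figure.

```shell
# Count the ports in a range given as "low high", the way sysctl and
# /proc/sys/net/ipv4/ip_local_port_range print it.
port_range_size() {
    echo "$1" | awk '{ print $2 - $1 + 1 }'
}

port_range_size "32768 61000"   # prints 28233
```

On the client this could be fed directly with
'port_range_size "$(cat /proc/sys/net/ipv4/ip_local_port_range)"'.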

>> ??? In both traces you have the LB enabled? Or did you mean netfilter?
> 
> Iptables was disabled in the second case
> 
>> I see now :). What are ab's conclusions when you run those tests? How
>> many dropped connections, how many packets ... and so on.
> 
> With iptables enabled the IP address stops responding on the test client
> server (zeus)

How long does this usually take?

>> Could you send along the ethtool $intf and ethtool -k $intf output?
> 
> [root@loadb1 ha.d]# ethtool eth0
> Settings for eth0:
>         Supported ports: [ MII ]
>         Supported link modes:   10baseT/Half 10baseT/Full 
>                                 100baseT/Half 100baseT/Full 
>                                 1000baseT/Half 1000baseT/Full 
>         Supports auto-negotiation: Yes
>         Advertised link modes:  10baseT/Half 10baseT/Full 
>                                 100baseT/Half 100baseT/Full 
>                                 1000baseT/Half 1000baseT/Full 
>         Advertised auto-negotiation: Yes
>         Speed: 100Mb/s
>         Duplex: Full
>         Port: Twisted Pair
>         PHYAD: 1
>         Transceiver: internal
>         Auto-negotiation: on
>         Supports Wake-on: g
>         Wake-on: d
>         Current message level: 0x000000ff (255)
>         Link detected: yes
> [root@loadb1 ha.d]# ethtool -k eth0
> Offload parameters for eth0:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp segmentation offload: off

Ok, no TSO.

>>  Please show cat /proc/interrupts and /proc/slabinfo
> 
> [root@loadb1 ha.d]# cat /proc/interrupts
>            CPU0       CPU1       
>   0:   26109146   26134774    IO-APIC-edge  timer
>   4:     822532     821228    IO-APIC-edge  serial
>   8:          0          1    IO-APIC-edge  rtc
>   9:          0          0   IO-APIC-level  acpi
>  10:          0          2   IO-APIC-level  ehci_hcd, ohci_hcd, ohci_hcd
>  11:          0          0   IO-APIC-level  libata
>  14:     234713     234473    IO-APIC-edge  ide0
> 177:      23675      32438   IO-APIC-level  3ware Storage Controller
> 185:          0    1001131   IO-APIC-level  eth0
> 193:     309782        257   IO-APIC-level  eth1
> NMI:          0          0 
> LOC:   52246993   52246992 
> ERR:          0
> MIS:          0
> [root@loadb1 ha.d]# cat /proc/slabinfo
> slabinfo - version: 2.0
> # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> ip_vs_conn             2     20    192   20    1 : tunables  120   60    8 : slabdata      1      1      0
> fib6_nodes             7    119     32  119    1 : tunables  120   60    8 : slabdata      1      1      0
> ip6_dst_cache          7     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0
> ndisc_cache            1     20    192   20    1 : tunables  120   60    8 : slabdata      1      1      0
> rawv6_sock             4     11    704   11    2 : tunables   54   27    8 : slabdata      1      1      0
> udpv6_sock             1     11    704   11    2 : tunables   54   27    8 : slabdata      1      1      0
> tcpv6_sock             2      3   1216    3    1 : tunables   24   12    8 : slabdata      1      1      0
> ip_fib_alias          16    226     16  226    1 : tunables  120   60    8 : slabdata      1      1      0
> ip_fib_hash           16    119     32  119    1 : tunables  120   60    8 :

Nothing special here.

>> Care to show your lighttpd configuration?
> 
> Very basic... The site we are prepping for is mostly static too, with fastcgi
> for PHP.  I'll show the info that's important for performance:
> 
> server.max-fds = 2048
> server.max-keep-alive-requests = 32
> server.max-keep-alive-idle=5

Fine.

>>> If it's something with the connection tracking overflow you'll see it in
>>> your kernel logs.
>> No message on the LB when this happens.
> 
>> Could you share the socket states on the RS during both runs? Also the
>> ipvsadm -L -n -c output in the middle of the run?
> 
> With iptables enabled.
> 
> [root@loadb1 ha.d]# ipvsadm -L -n -c | wc
>   27724  166341 2162413
> 
> [root@loadb1 ha.d]# ipvsadm -L -n -c | grep "ESTABLISHED" | wc
>   27719  166314 2162082

I'm also interested in the socket states on the RS, and in how they 
compare to a run without netfilter.
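
Something along these lines would do for collecting them; take it as a
sketch. The helper name is made up, and the column positions given in
the note after it are my assumption about the usual net-tools and
ipvsadm output layout, so check them against your versions.

```shell
# Histogram of one whitespace-separated column, skipping header lines.
# usage: count_by_column COLUMN HEADER_LINES < input
count_by_column() {
    awk -v col="$1" -v skip="$2" '
        NR > skip && NF >= col { count[$col]++ }
        END { for (s in count) print count[s], s }
    ' | sort -rn
}
```

On the RS: 'netstat -ant | count_by_column 6 2'. On the LB:
'ipvsadm -L -n -c | count_by_column 3 2'. Run each once with the
iptables service started and once with it stopped, in the middle of the
ab run, and compare the counts per state.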

> I'm not sure the firewall is the issue and could be the client machine. As I
> just ran ab with iptables disabled and it still gave me the error.  Iptables
> is enabled on the client test machine.

Well, if you disable netfilter (iptables service) completely on all 
systems, and it still exhibits the problems, we need to debug this further.

Best regards,
Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc