[lvs-users] IPVS/NAT - no connection after real server down
graeme at graemef.net
Fri Sep 5 14:15:47 BST 2008
On Fri, 2008-09-05 at 14:47 +0200, Pitscheider, Oswald wrote:
> I’ve tried the LVS with these changes with some success, but there is still the problem that if I remove a real server, requests to the server are answered very slowly.
> From the moment the real server is removed from the pool, some requests have to wait seconds for an answer.
> After a minute, the LVS works as it should.
This is fairly predictable, from your configuration and from the way TCP
works.
Each realserver is checked every 20 seconds (delay_loop 20). If you stop
Apache just as the check is done successfully, requests will stall for
20 seconds until the next check (because the server isn't responding).
If a request arrives fractionally after the successful check and the
server isn't responding, the client will retry at the following intervals:
-0.002  RS1 Check succeeds
-0.001  RS1 Apache stopped
 0.000  Request arrives at RS1
 3.000  retry 1 to RS1
 9.000  retry 2 to RS1
19.998  RS1 Check fails
        keepalived removes RS1 from pool
21.000  retry 3 sent to RS2
Note however that keepalived may take a short period to do the server
removal. If the removal has not finished when retry 3 is sent, that retry
fails too, and the next delay before retry 4 is another 24 seconds
(3, 6, 12, 24 and so on), which takes you to about 45 seconds altogether.
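The arithmetic in that timeline can be sketched in a few lines of Python. This is only a rough model, assuming a 3-second initial retry timeout that doubles each time (as in the timeline above); the function names are made up for illustration and are not part of keepalived or LVS.

```python
def retry_times(initial_rto=3.0, count=4):
    """Return the cumulative times (seconds) at which retries are sent,
    assuming the timeout doubles after every attempt."""
    times, elapsed, rto = [], 0.0, initial_rto
    for _ in range(count):
        elapsed += rto
        times.append(elapsed)
        rto *= 2
    return times


def first_retry_after_removal(removed_at, initial_rto=3.0, max_retries=10):
    """Return the time of the first retry sent strictly after the dead
    server has left the pool, or None if no retry qualifies."""
    for t in retry_times(initial_rto, max_retries):
        if t > removed_at:
            return t
    return None


print(retry_times())                      # [3.0, 9.0, 21.0, 45.0]
print(first_retry_after_removal(19.998))  # 21.0
print(first_retry_after_removal(21.5))    # 45.0
```

The last call shows the overlap case: if the removal only completes at 21.5 seconds, retry 3 has already gone to the dead server and the client waits until 45 seconds.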
> I’ve tested the LVS using jmeter with 25 threads.
And depending on the way jmeter is configured, alongside your webserver
config, this will mean a minimum of 20 seconds (and likely much longer)
delay between you dropping the webserver and the clients recovering.
It is perfectly permissible to bring down the delay_loop as much as you
or your app server can tolerate. For fast failover you need a short
delay. I would argue that for most web clients, 20 seconds is perfectly
acceptable but that can depend entirely on what you're trying to
achieve.
Try "delay_loop 1" and see what you get. What you will get, possibly,
are a lot of log entries - but you should get very fast recovery.
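To get a feel for the trade-off, here is a small Python sketch comparing a few delay_loop values. It assumes the worst case described earlier (the server dies just after a successful check, so the failure is noticed roughly delay_loop seconds later), instantaneous removal from the pool, and the same doubling 3-second client retry. The function name is hypothetical and the numbers are a model, not measurements.

```python
def worst_case_recovery(delay_loop, initial_rto=3.0):
    """Time (seconds) of the first client retry sent after the failed
    check, assuming the failure is detected at t = delay_loop."""
    elapsed, rto = 0.0, initial_rto
    while True:
        elapsed += rto
        if elapsed > delay_loop:
            return elapsed
        rto *= 2


for delay_loop in (20, 5, 1):
    checks_per_minute = 60 // delay_loop  # per real server
    print(delay_loop, worst_case_recovery(delay_loop), checks_per_minute)
# 20 21.0 3
# 5 9.0 12
# 1 3.0 60
```

Under these assumptions a delay_loop of 1 means the first retry already reaches a healthy server, at the cost of twenty times as many checks (and potentially log lines) per real server.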