[lvs-users] ldirectord does not transfer connections when a real server dies

Anders Henke anders.henke at 1und1.de
Tue Apr 30 13:20:29 BST 2013


On 30.04.2013, Konstantin Boyanov wrote:
> 1. I open some connections from a client browser (IE 8) to the sites that
> are hosted on the real servers
> 2. I change the weight of the real server which serves the above connections
> to 0 and leave only the other real server alive
> 3. I reload the pages to regenerate the connections
> 
> What I am seeing with ipvsadm -Ln is that the connections are still on the
> "dead" server. I have to wait up to one minute (I suppose some TCP timeout
> from the browser-side) for them to transfer to the "living" server. And If
> in this one minute I continue pressing the reload button the connections
> stay at the "dead" server and their TCP timeout counter gets restarted.
> 
> So my question is: Is there a way to tell the load balancer in NAT mode to
> terminate / redirect existing connections to a dead server *immediately*
> (or close to immediately)?

The man page for ipvsadm (the --weight parameter) helps in understanding this :)

To remove a dead server "immediately", you need to remove it from the ipvs table
(ipvsadm --delete-server). This also breaks any open tcp connections,
probably resulting in "connection refused" error messages or "broken images"
on the client.
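As a sketch, assuming a virtual service on 192.0.2.1:80 and the dead
realserver at 10.0.0.1:80 (placeholder addresses, not from this thread;
needs root on the director), the immediate removal looks like this:

```shell
# Remove the realserver from the virtual service's table right away;
# any connections still bound to it are dropped immediately.
ipvsadm --delete-server --tcp-service 192.0.2.1:80 --real-server 10.0.0.1:80

# Short form of the same command:
# ipvsadm -d -t 192.0.2.1:80 -r 10.0.0.1:80
```

This only edits the kernel's IPVS configuration, so there is no output
to check; the effect shows up in the next `ipvsadm -Ln` listing.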

A weight of zero essentially means "don't put new connections on this realserver,
but continue serving existing connections". This is what you usually
use for maintenance: let the server fulfill any pending requests, and after
a few minutes, all new connections will usually have shifted to other realservers.
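Draining a realserver this way can be sketched as follows (192.0.2.1
and 10.0.0.1 are placeholder addresses; adjust to your own setup):

```shell
# Set the realserver's weight to 0: no new connections are scheduled
# to it, but existing connections keep being served until they close.
ipvsadm --edit-server --tcp-service 192.0.2.1:80 --real-server 10.0.0.1:80 --weight 0

# Short form of the same command:
# ipvsadm -e -t 192.0.2.1:80 -r 10.0.0.1:80 -w 0
```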

Of course, as long as an existing connection is being used, IPVS
recognizes that connection as "in use". If you don't like this, you'll
have to actively tell your webserver to shut down any open connections (e.g.
by shutting down IIS), forcing the client to reconnect, or remove
your realserver from the IPVS table (which will result in timeouts for
any open connections, but ultimately forces the client to
reconnect as well).

> It seems to me a blunder that a reload on the client-side can make a
> connection become a "zombie", i.e. be bound to a dead real server although
> persistence is not used and the other server is ready and available.

The maintenance usecase is usually quite easy to understand, and
that's the "zero weight" usecase. It's for "draining" connections: new
connections won't be sent to this box, but any current connections will be
served (until they're closed).
 
> The only thing that I found affecting this timeout is changing the
> keepAliveTimeout in the Windows machine running the IE8 which I use for the
> tests. When I changed it from the default value of 60 seconds to 30 seconds
> the connections could be transferred after 30 seconds. It seems to me very
> odd that a client setting can affect the operation of a network component
> such as the load balancer.

The server also has similar options to configure:
-you can turn off HTTP keepalive completely
-IIS 7 by default uses 2 minutes as a keepalive timeout; after this
 idle time, IIS will close the connection.
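Assuming IIS 7's appcmd tool (run in an elevated command prompt on the
realserver; a sketch, not a tested recipe), turning keepalive off
entirely looks like this:

```shell
:: Disable HTTP keepalive for all sites (IIS httpProtocol section):
%windir%\system32\inetsrv\appcmd.exe set config /section:httpProtocol /allowKeepAlive:false
```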

Of course, these options may have some (small) performance impact.

If you turn off HTTP keepalive completely, the browser is forced to set up a
new tcp connection for every object it retrieves, including the full tcp
handshake and the packets for tearing down the connection.
If your website e.g. contains 40 images, you'll open and close at least 41
connections: one for retrieving the html part, and another 40
for the images (that's over-simplified; almost every browser
opens a few connections in parallel, yet far fewer than those 40).

If the network latency between client and server is high, opening that
many connections one after another may result in a noticeable delay.

By using HTTP keepalive, your browser opens one connection and requests
one image after the other over this connection; without the constant
setting up and tearing down of connections, the website loads somewhat faster.
Additionally, long-lived tcp connections automatically increase
their tcp receive window size, which may give you a little bit of extra
performance as well.

If you decrease your IIS keepalive timeout below the limit of your
clients, your server will likely be the one initiating the close of idle
tcp connections. If your server does this frequently, you'll end up
collecting lots of connections in the TIME_WAIT state (well, usually
fewer than without keepalive at all, but depending on the exact setup,
this may cause some trouble when opening outgoing connections).

> And another thing - what is the column named "Inactive Connections" in the
> output from ipvsadm used for? Which connections are considered inactive?
>
> And also in the output of ipvsadm i see a couple of connections with the
> state TIME_WAIT. What are these for?

TCP connections have a lifecycle with many states.

-setting up a connection works by sending a few packets back and forth
 before the connection is known to be established and ready for data transfer.
 Search for "TCP 3-way handshake" if you'd like to know more about this.
-tearing down a connection involves sending a few packets back and
 forth as well. The host initiating the close tracks the connection as
 "TIME_WAIT", while the other host tracks it as "CLOSE_WAIT". In case some
 late packets arrive for this connection, the host can handle them
 more appropriately (discard them and not mistake them for being part of
 a subsequent connection). And in case the final FIN packet didn't
 arrive, the WAIT state still tells the host to temporarily
 avoid re-using this connection.
 Read: if you have sockets in the TIME_WAIT state, your host tries not to
 reuse them immediately for something else, but waits a few minutes before
 reusing them. If you close too many connections and then try to open new
 outgoing ones, you may run out of usable local ports. That problem
 occurs rarely, and mostly on systems with either poor software or a
 poor connection access pattern.
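The states above can be watched on any Linux host; as a sketch
(unprivileged, using iproute2's ss):

```shell
# Count sockets per TCP state; a pile of TIME-WAIT entries after heavy
# connection churn is exactly the situation described above.
ss -tan | awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }'

# Or list only the TIME-WAIT sockets:
ss -tan state time-wait
```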

In IPVS, only established connections are counted as "active", while "inactive"
refers to connections in any other state. A few minutes after TIME_WAIT, the
connection becomes "closed": its state is forgotten and it will no longer show
up in either ipvsadm or netstat.
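For illustration, a hypothetical `ipvsadm -Ln` listing (addresses,
weights and counters are made up) maps onto those columns like this:

```shell
# ipvsadm -Ln          (sample output; values are illustrative)
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.0.2.1:80 rr
  -> 10.0.0.1:80                  Masq    1      12         3
  -> 10.0.0.2:80                  Masq    0      2          7
```

Here ActiveConn counts ESTABLISHED connections, while InActConn covers
everything else IPVS still tracks (SYN_RECV, TIME_WAIT and so on).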


Anders
-- 
1&1 Internet AG              Expert Systems Architect (IT Operations)
Brauerstrasse 50             v://49.721.91374.0
D-76135 Karlsruhe            f://49.721.91374.225

Amtsgericht Montabaur HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, 
Robert Hoffmann, Andreas Hofmann, Markus Huhn, Hans-Henning Kettler,
Dr. Oliver Mauss, Jan Oetjen, Martin Witt, Christian Würst
Aufsichtsratsvorsitzender: Michael Scheeren


