[lvs-users] SYN spiraling between master and slave IPVS balancers

Jan-Aage Frydenbø-Bruvoll jan at frydenbo-bruvoll.com
Wed Feb 6 10:03:54 GMT 2013


Dear Dmitry,

On 5 February 2013 16:45, Dmitry Akindinov <dimak at stalker.com> wrote:

> Hello,
>
> We have met a quite troublesome situation which causes an internal SYN
> storm.
>

So do we. We see intermittent traffic loops at distinct "levels": about
15 Mbps if only one incoming packet gets duplicated, 30 Mbps if two are,
and so on. In our case the storms die down on their own; we do not yet know
why they appear or why they disappear again.


> The simplified version of the configuration consists of 2 servers - A
> and B, both running Linux kernel 3.7.4-20.
>

Our kernels are currently 3.3.18 and 3.6.11, in a bridged LXC container
set-up. All Gentoo (hosts and containers).


> Both have the IPVS software enabled, A is acting as the active load
> balancer, B as a backup.
> Both servers act as real servers also.
>

Same here. The set-ups where we have pure load balancers do not exhibit
this problem at all.


> At some point, there is an incoming TCP connection from IPpair
> (address:port) I.
> The load balancer A decides to process it locally. Connection is
> established, and the balancer status is distributed to server B via
> syncing broadcast.
>
> The client closes connection, and again the status is updated on B via
> the broadcast - the connection is now in the "TCP_WAIT" state.
>
> Pretty soon (within 10 seconds) the client opens the new TCP connection
> using the same IP pair I.
> It is not a good TCP practice, but nevertheless, some clients work this
> way.
>
> This time the load balancer A decides that the connection is to be
> handled on the server B (persistence is switched off).
> The SYN packet is relayed to the server B, which finds an existing
> routing record for that pair I.
> And that record (in the CLOSE state) - points to the server A, and the
> SYN packet is relayed there.
>
> The server A processes it again, directs it to the server B again, and
> the loop spirals, since the server B does not have the new connection
> table element I synced.
>
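
If I read the sequence correctly, it can be reproduced with a toy model.
The following is only a sketch in Python, not IPVS code: the node names,
the client address and the trace_syn helper are made up for illustration,
and each connection table is reduced to a map from client pair to the
server chosen for it.

```python
# Toy model of the SYN loop described above. This is NOT how IPVS is
# implemented; it only mirrors the forwarding decisions in the description.

def trace_syn(tables, entry_node, client, max_hops=8):
    """Follow one SYN between nodes.

    Returns (path, delivered): the nodes visited, and whether some node
    finally handled the packet locally within max_hops relays.
    """
    path = [entry_node]
    node = entry_node
    for _ in range(max_hops):
        # A node without an entry for this client handles the SYN itself.
        dest = tables[node].get(client, node)
        if dest == node:
            return path, True
        path.append(dest)
        node = dest
    return path, False


client = ("192.0.2.10", 40000)  # hypothetical client address:port pair

tables = {
    "A": {client: "B"},  # A's fresh scheduling decision for the new SYN
    "B": {client: "A"},  # B's stale synced entry from the old connection
}

# With the stale entry on B, the packet bounces between A and B.
looping_path, looping_delivered = trace_syn(tables, "A", client)

# Once B learns about the new connection, the SYN is handled on B.
tables["B"][client] = "B"
fixed_path, fixed_delivered = trace_syn(tables, "A", client)

print(looping_path, looping_delivered)
print(fixed_path, fixed_delivered)
```

In this model the loop only ends when B's entry is replaced, which matches
your observation that B never receives the new connection table element.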

Incredibly interesting information. Have you tweaked any TCP settings on
the servers at all (in desperation perhaps, settings that are now forgotten
but still active)?

Based on your description, my guess is that a pair of timeouts clash, i.e.
that IPVS forgets a connection that the routing layer still holds on to,
and that these timeouts may need to be kept in sync (or set so that IPVS
keeps a connection longer than the routing layer does). To be honest, I
have no idea what I'm talking about here, though.

We'd be happy to share information and findings so that we can get rid of
this problem - it is annoying at best.

Best regards
Jan Frydenbo-Bruvoll


