[lvs-users] IPVS adding a 1s delay on connection establishment under moderately high number of TCP req/s

Julian Anastasov ja at ssi.bg
Wed May 23 20:33:22 BST 2018


On Wed, 23 May 2018, Toni Martí wrote:

> We detected a problem with IPVS module. Here's a quick summary of what
> triggers the problem:
> - IPVS has a hardcoded TIME_WAIT timeout of 120s
> - TCP/IP layer on the kernel has a hardcoded TIME_WAIT timeout of 60s
> - the connection rescheduling mechanism on IPVS acts by dropping the
> first received SYN message and letting the client retransmit the SYN
> message after (also hardcoded) RTO timeout, which in practice seems to
> be 1s
> Here is a scenario that triggers this problem:
> - we have some backend server balanced by IPVS
> - we have an external load balancer that balances requests from real
> clients to IPVS and does SNAT
> Here is what happens in the previous scenario under high throughput:
> - the external load balancer is behaving (due to SNAT) as a single
> origin IP for requests forwarded to IPVS
> - IPVS receives connections and forwards them to internal servers, but
> once served, the connections remain in TIME_WAIT in the IPVS
> connection table for 120s
> - the external load balancer has a TIME_WAIT of 60s, so after this
> time (or before if reusing connections in TIME_WAIT) it recycles the
> same ephemeral ports to send requests to IPVS
> - in-between those 60s (where the external LB starts reusing ports)
> and those 120s (where IPVS still has the connection in TIME_WAIT), the
> re-scheduling mechanism on IPVS has the result of adding a 1s delay
> (due to SYN-drop and the RTO timeout on the LB) to the connection
> establishment
> And this implies that when the external LB is under mid load, approx
> 250 req/s (calculated from [net.ipv4.ip_local_port_range on the LB]
> divided by [TW timeout on the LB = 60s]), the rescheduling mechanism
> at IPVS adds a delay of 1s to the establishment of TCP connections to
> internal servers.
> This 1s delay seems to be caused by either:
> - a mismatch between hardcoded TW-timeout on: IPVS = 120s, standard
> kernel TCP driver = 60s
> - the rescheduling algorithm on IPVS that forces the client (the LB)
> to wait an entire RTO before retransmitting the SYN packet
> I'm not saying that IPVS is badly parametrized or that the
> rescheduling algorithm is badly designed. You guys are awesome and have
> done really great work with IPVS.
> The question is then: what can we do to avoid that 1s delay when
> rescheduling connections?
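
The 250 req/s estimate quoted above can be reproduced with a small shell
sketch. The helper name port_reuse_rate and the 45000-59999 port range are
assumptions, chosen only so that the result matches the quoted figure; the
actual range on the LB is not given in this thread.

```shell
#!/bin/sh
# Rough estimate of the request rate at which a client that always uses the
# same source IP starts reusing ephemeral ports.
# port_reuse_rate is a hypothetical helper, not part of any tool.
port_reuse_rate() {
    first_port=$1      # low end of net.ipv4.ip_local_port_range
    last_port=$2       # high end of net.ipv4.ip_local_port_range
    tw_seconds=$3      # TIME_WAIT timeout on the client side, in seconds
    # Number of usable ports divided by how long each one stays blocked.
    echo $(( (last_port - first_port + 1) / tw_seconds ))
}

# Assumed range of 15000 ports with a 60s TIME_WAIT: prints 250.
port_reuse_rate 45000 59999 60
```

On a live host the two port values could be read with
"read first last < /proc/sys/net/ipv4/ip_local_port_range" and passed to the
helper instead of the assumed numbers.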

	There was a recent discussion about this 1-second delay.
Maybe you will find the needed answers here:


	Basically, you have 3 options:

- echo 0 > conn_reuse_mode: do not attempt to reschedule on
port reuse (a new SYN hits an unexpired conn), just use the same real
server. This can be bad: we do not select an alive server if the
server used by the old connection is not available anymore (weight=0
or removed).

	The next two options apply if you do not want to
use the first one:

- echo 0 > conntrack: if you do not use rules to match
conntrack state for the IPVS packets. This is the slowest option:
conntracks are created and destroyed for every packet.

- use NOTRACK for IPVS packets: the fastest option, conntracks are
not created and less memory is used
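
The three options could be applied roughly as in the sketch below. It rests
on assumptions: the function names, the IPVS_SYSCTL_DIR variable and the
VIP address and port are hypothetical, the sysctls are assumed to live under
/proc/sys/net/ipv4/vs, and the NOTRACK rule is written with the iptables CT
target in the raw table. Real use needs root and a loaded ip_vs module.

```shell
#!/bin/sh
# Directory with the IPVS sysctls. Overridable, so the helpers can be tried
# against a scratch directory first (IPVS_SYSCTL_DIR is a hypothetical name).
IPVS_SYSCTL_DIR="${IPVS_SYSCTL_DIR:-/proc/sys/net/ipv4/vs}"

# Hypothetical virtual service; replace with the real VIP and port.
VIP="${VIP:-192.0.2.10}"
VPORT="${VPORT:-80}"

# Option 1: on port reuse keep the same real server, no rescheduling.
option_no_reschedule() {
    echo 0 > "$IPVS_SYSCTL_DIR/conn_reuse_mode"
}

# Option 2: set the IPVS conntrack sysctl to 0 (only if no rules match
# conntrack state for the IPVS packets).
option_no_ipvs_conntrack() {
    echo 0 > "$IPVS_SYSCTL_DIR/conntrack"
}

# Option 3: print (not run) a raw-table rule that exempts packets sent to
# the virtual service from connection tracking.
option_notrack_rule() {
    echo "iptables -t raw -A PREROUTING -p tcp -d $VIP --dport $VPORT -j CT --notrack"
}
```

Nothing is changed until one of the functions is called, for example
option_no_reschedule as root. option_notrack_rule only prints the rule so it
can be reviewed and adapted before being applied.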

> If you need it, I can elaborate on all the previous details, even
> provide a link of a github issue (for the docker project) with the
> details on how we arrived at sending an email to this list.
> Thanks in advance,
>     Toni


Julian Anastasov <ja at ssi.bg>
