[lvs-users] source hashing sometimes lands on wrong server (with FTP)

Julian Anastasov ja at ssi.bg
Mon Nov 4 20:21:39 GMT 2019


	Hello,

On Fri, 1 Nov 2019, Phillip Moore wrote:

> Hello!
> We have FTP set up on its own VIP and just map all ports (:0) and use
> source hashing. Sometimes when the FTP client opens the data channel it
> lands on the wrong real server, causing a reset. I stress "sometimes"
> because FTP mostly seems to work, but we do see this behavior of requests
> landing on the wrong server.
> 
> The FTP client connects to VIP:0 on the FTP port and is asked to open the
> data channel on VIP:0 on another port. The client sends a SYN, but that
> packet doesn't land on the correct real FTP server, so the connection is
> reset. That SYN likely came through a different IPVS server, but that
> server should have synced the connection state by this time.
> 
> Example of our config:
> 
> -A -t x.y.z.220:0 -s sh -p 600 -b sh-fallback
> -a -t x.y.z.220:0 -r a.b.c.4:0 -i -w 1
> -a -t x.y.z.220:0 -r a.b.c.5:0 -i -w 1
> -a -t x.y.z.220:0 -r a.b.c.6:0 -i -w 1
> -a -t x.y.z.220:0 -r a.b.c.7:0 -i -w 1
> 
> 3.10.0-1062.1.1.el7.x86_64
> 
> We have this config running on multiple active IPVS servers, all running
> active/backup sync processes.
> 
> We've also tried a weight other than 1 (1000) to see if the overload logic
> was kicking in and sending requests to an alternate server, but that did
> not seem to be it.
> 
> Is there any reason why subsequent connections from the same source IP
> would land on a different server?

	Try setting the backup_only sysctl var to 1 on all directors
that are backup servers and that can also be used as real servers.
The flag can stay at 1 even while the director runs as master. For the
rare setups that run both master and backup functions at the same time,
this flag should not be used.

	As a result, when the backup function is active, any traffic received
on backup servers will be delivered locally; it will not be rescheduled to
other real servers. The backup_only flag is useful for DR/TUN setups to
avoid packet loops or, as in your case, to avoid rescheduling to a different
real server. Why does this happen? Maybe because sync messages are delayed,
sometimes by up to 2 seconds.

	Also, make sure the real servers are listed (added) in the same order
on all directors that use the SH scheduler. If not, the directors can select
different real servers for the same client. There is no such requirement for
the MH scheduler, which is more advanced, but it is only available in more
recent kernels (4.18+).
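To see why the add order matters for SH, here is a simplified Python sketch. It assumes the scheduler fills a fixed table of buckets (256 by default) by walking the real-server list in the order the servers were added, then picks a bucket from a hash of the client address. It ignores weights and the sh-fallback and sh-port flags, and it uses a multiplicative hash in the style of the kernel's hash_32(); the exact constant differs between kernel versions, so the bucket numbers are illustrative only. The server names reuse the redacted a.b.c.* addresses from the config above, and the client addresses are hypothetical, taken from the 198.51.100.0/24 documentation range.

```python
import ipaddress

TAB_BITS = 8                  # kernel default table size: 2**8 = 256 buckets
TAB_SIZE = 1 << TAB_BITS
GOLDEN_RATIO_32 = 0x61C88647  # multiplicative-hash constant (varies by kernel)


def hash_32(val: int, bits: int = TAB_BITS) -> int:
    """Keep the top `bits` bits of val * constant, modulo 2**32."""
    return ((val * GOLDEN_RATIO_32) & 0xFFFFFFFF) >> (32 - bits)


def build_table(servers: list[str]) -> list[str]:
    """Assign buckets to real servers round-robin, in list (add) order."""
    return [servers[i % len(servers)] for i in range(TAB_SIZE)]


def pick(table: list[str], client_ip: str) -> str:
    """Return the real server whose bucket the client address hashes to."""
    return table[hash_32(int(ipaddress.IPv4Address(client_ip)))]


SERVERS = ["a.b.c.4", "a.b.c.5", "a.b.c.6", "a.b.c.7"]

director1 = build_table(SERVERS)  # servers added in config order
director2 = build_table(["a.b.c.5", "a.b.c.4", "a.b.c.6", "a.b.c.7"])  # .4/.5 swapped

clients = [f"198.51.100.{n}" for n in range(1, 255)]
mismatches = [c for c in clients if pick(director1, c) != pick(director2, c)]
print(f"{len(mismatches)} of {len(clients)} clients map to different servers")
```

Building both tables from the same list makes every client land on the same server whichever director sees the packet; swapping just two entries is enough to split a client's control and data connections across servers.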

Regards

--
Julian Anastasov <ja at ssi.bg>
