[lvs-users] FTP data port connection not closing?

Julian Anastasov ja at ssi.bg
Wed Aug 23 16:32:48 BST 2017


	Hello,

On Wed, 23 Aug 2017, Owain Jones wrote:

> Hi,
> 
> I've set up LVS-DR - via ldirectord, fired up by heartbeat - for my 
> cluster and I've got it serving up HTTP / HTTPS just fine. So I moved 
> onto FTP and it seems to work, except that when uploading files, it gets 
> stuck at 100% on FileZilla (but also the same on other FTP clients and 
> via the command line), and then times out.
> 
> The interesting thing is that if I abort and then check the remote 
> directory, the file is actually there.
> 
> Either in its entirety or it's a few bytes / kilobytes short (and, 
> often, that appears to be a power of two short - that is, if the file 
> was 36,000 bytes then the file on the server is 32,769 bytes). This 
> looks to me like a disk caching thing - as the connection isn't closed, 
> the file isn't closed and it's not always fully writing back all the data.
> 
> The file on the server is also, up to the point where it cuts off, 
> perfectly correct when I test with a text file to be able to see that, 
> yes, the data is being transferred and transferred correctly.
> 
> With FTP, if I understand things correctly, the data connection itself 
> is used as end-of-file. When the file has been transferred, the client 
> just closes the connection and this signals that all the data has been 
> transferred. And, from what I see happening, this is what's seemingly 
> not happening here.
> 
> The data connection is remaining open, but no more data is being sent 
> and the server sits just there waiting for an EOF that never comes. 
> Until it times out. But, other than that, everything else appears to be 
> functioning correctly - including the file transfers themselves, as data 
> is reaching the server because the files are there.
> 
> In ldirectord, I've got this for the FTP:
> 
> --- >8 ---
> 
> virtual=21
>      service = ftp
>      scheduler = wlc
>      protocol = fwm
>      checktype = connect
>      real = [hostname of FTP server] gate
> 
> --- 8< ---
> 
> I'm using firewall marks, simply because passive FTP uses a lot of 
> ports. Currently, I've only got a single FTP server in the cluster, so 
> really all LVS is doing is passing the packets on (I might well up the 
> number of FTP servers in the cluster later, but currently I'm focusing 
> on just getting it working first, then I'll expand later). This is 

	When you add more real servers you will need to use -p to
enable persistence; the ipvsadm man page explains this for FTP.
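
	As a rough sketch of what a persistent fwmark service
could look like when set up by hand with ipvsadm: the script below
only prints the commands, it does not run them. The mark 21 comes
from your mangle rules; the real server addresses and the 600 second
persistence timeout are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: prints the ipvsadm commands instead of running them.
# FWMARK=21 matches the mark set in the quoted mangle rules.
# The persistence timeout and the real server addresses are assumptions.
FWMARK=21
PERSIST_SECS=600

print_persistent_fwm_service() {
    # -A add virtual service, -f firewall mark, -s scheduler,
    # -p persistence with a timeout in seconds
    echo "ipvsadm -A -f $FWMARK -s wlc -p $PERSIST_SECS"
    for rs in "$@"; do
        # -a add real server, -r its address, -g direct routing (LVS-DR)
        echo "ipvsadm -a -f $FWMARK -r $rs -g"
    done
}

# Hypothetical real servers
print_persistent_fwm_service 192.168.0.11 192.168.0.12
```

	With ldirectord the equivalent is probably a "persistent"
line in the virtual section rather than running ipvsadm directly.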

> LVS-DR, so the realserver replies directly to the client. But the client 
> only knows the external VIP of the cluster and sends replies back, so 
> the passive ports are firewall marked to be sent to the FTP server too.
> 
> I'm adding the firewall marks to the packets by adding these rules to 
> "before.rules" in UFW:
> 
> --- >8 ---
> 
> *mangle
> :PREROUTING - [0:0]
> 
> -A PREROUTING -p tcp -d 192.168.0.99/32  --dport 21 -j MARK --set-mark 21
> -A PREROUTING -p tcp -d 192.168.0.99/32  --dport 20000:21000 -j MARK --set-mark 21
> 
> COMMIT
> 
> --- 8< ---
> 
> Where "192.168.0.99" is the VIP of the cluster.
> 
> I've configured my FTP server to report the external IP address of the 
> cluster and to restrict itself to the passive ports 20000-21000. I know 
> this works correctly, as I can see that the "entering PASV 
> (x,x,x,x,p,p)" response has the right IP and is always within the 
> passive port range.
> 
> Indeed, everything otherwise seems to function correctly. The data 
> connection is being made and the files are being sent (and in terms of 
> downloads, directory listing, deleting files, TLS authentication and all 
> the rest of it, this all works 100%). But as the data connection appears 
> not to close, then it just gets stuck at 100% until it times out with an 
> error message.
> 
> I've also, of course, tried connecting directly to the FTP server on the 
> LAN - without LVS being involved - and everything works 100%.
> 
> What seems to be happening is that LVS isn't passing on to the realserver 
> the fact that the data connection has closed. But with FTP, it is 
> necessary for this to be sent, as that's how EOF is signalled to the server.

	What does 'ipvsadm -Lnc' show when the connection gets stuck?
What is shown for the command (:21) and data (20xxx port) connections?
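
	To pick out the entries for one client, a small filter
like the one below may help. It is only a sketch and assumes the
usual column order of 'ipvsadm -Lnc' (pro, expire, state, source,
virtual, destination); the function name and the client address
are made up for illustration.

```shell
#!/bin/sh
# Sketch: filter 'ipvsadm -Lnc' style output for one client.
# Assumed columns: pro expire state source virtual destination.
conns_for_client() {
    client_ip=$1
    awk -v ip="$client_ip" '
        # Field 4 is the source as IP:PORT; split off the IP part.
        { split($4, src, ":") }
        # Print state, remaining expire time and virtual IP:PORT.
        src[1] == ip { print $3, $2, $5 }
    '
}

# On the director, as root (203.0.113.5 is a placeholder client):
#   ipvsadm -Lnc | conns_for_client 203.0.113.5
```

	The interesting part is the state and the remaining expire
time of the :21 entry and of the 20xxx entry at the moment the
upload hangs.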

	IPVS sees only the packets from the client, so any
FIN bit we see is the client's wish to half-close the TCP
connection. While a connection entry exists, IPVS
should not stop the traffic. But if the transfer is very long
and a short TCP established timeout is used (ipvsadm --set TCP ...),
the command connection can expire. If persistence is used
this cannot happen, because the data connections bump a
reference count that keeps the command connection alive.
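
	A quick way to check whether the timeout could matter is
to compare the TCP value with the longest transfer you expect,
since the command connection sits idle for that long. The sketch
below assumes the 'ipvsadm -L --timeout' output is a single line
of the form "Timeout (tcp tcpfin udp): 900 120 300"; the function
name and the durations are only examples.

```shell
#!/bin/sh
# Sketch: does the IPVS TCP timeout cover an idle period of N seconds?
# stdin: output of 'ipvsadm -L --timeout', assumed to look like
#   Timeout (tcp tcpfin udp): 900 120 300
tcp_timeout_covers() {
    transfer_secs=$1
    awk -v need="$transfer_secs" '
        # The last three fields are tcp, tcpfin and udp, in seconds.
        /^Timeout/ { found = 1; ok = ($(NF-2) + 0 >= need + 0) }
        END { exit ((found && ok) ? 0 : 1) }
    '
}

# On the director, as root (3600 is an example duration):
#   ipvsadm -L --timeout | tcp_timeout_covers 3600 || echo "TCP timeout too short"
```

	If it is too short, something like 'ipvsadm --set 3600 0 0'
raises only the TCP value; a 0 leaves the corresponding timeout
unchanged.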

	In any case, tcpdump -lnnnv 'host CLIENT_IP' output
on the director would be useful, even if only for the last packets
before the connection gets stuck. You can run it on both the
incoming (from client) and outgoing (to real server)
interfaces, so that we can see if some packets are not
forwarded.
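
	For example, the two captures could be started as printed
by the sketch below. The interface names eth0 and eth1 and the
client address are assumptions, so replace them with the real
ones. With LVS-DR the forwarded packets keep the client address
as source, so the same 'host' filter matches on both sides.

```shell
#!/bin/sh
# Sketch only: prints one tcpdump command per director interface.
# Arguments: client IP, incoming interface, outgoing interface.
print_capture_cmds() {
    client_ip=$1
    in_if=$2
    out_if=$3
    for ifc in "$in_if" "$out_if"; do
        # -l line-buffered output, -nnn no name resolution, -v verbose
        echo "tcpdump -lnnnv -i $ifc 'host $client_ip'"
    done
}

# Placeholder client and interface names
print_capture_cmds 203.0.113.5 eth0 eth1
```

	Run each printed command in its own terminal and compare
whether the client's FIN shows up on both interfaces.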

	Also, what is the kernel version (uname -a)?

> Please help, if you can. Everything else with the server is good, so 
> it's just this little glitch holding everything up.
> 
> Thanks.

Regards

--
Julian Anastasov <ja at ssi.bg>


