[lvs-users] FTP data port connection not closing?

Owain Jones Owain at 4ColourDigital.com
Wed Aug 23 13:06:43 BST 2017


I've set up LVS-DR - via ldirectord, fired up by heartbeat - for my 
cluster and I've got it serving up HTTP / HTTPS just fine. So I moved 
onto FTP and it seems to work, except that when uploading files, it gets 
stuck at 100% in FileZilla (the same happens with other FTP clients and 
on the command line), and then times out.

The interesting thing is that if I abort and then check the remote 
directory, the file is actually there, either in its entirety or a few 
bytes / kilobytes short. Often the truncated size appears to be close to 
a power of two: if the file was 36,000 bytes, then the file on the 
server is 32,769 bytes. This looks to me like a disk caching thing: as 
the connection isn't closed, the file isn't closed, and not all of the 
data is always written back.

Up to the point where it cuts off, the file on the server is also 
perfectly correct: testing with a text file, I can see that the data is 
being transferred, and transferred correctly.

With FTP, if I understand things correctly, closing the data connection 
is itself the end-of-file marker. When the file has been transferred, 
the client just closes the connection and this signals that all the data 
has been transferred. And, from what I see, this is what's seemingly not 
happening here.
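
To make the end-of-file point concrete, here is a minimal sketch in Python of how a receiver on a stream socket only learns that an upload is finished when the sender closes its side. It uses a local socket pair rather than a real FTP server, and the names `receive_until_eof` and `simulate_upload` are made up for illustration. It only models the socket behaviour, not the FTP server's file handling.

```python
import socket
import threading
from typing import Tuple


def receive_until_eof(conn: socket.socket, chunk_size: int = 4096) -> Tuple[bytes, bool]:
    """Read until the peer closes the connection or the socket times out.

    Returns (data, saw_eof). saw_eof is True only if recv() returned b"",
    which is how a closed connection shows up on the receiving side.
    """
    received = bytearray()
    try:
        while True:
            chunk = conn.recv(chunk_size)
            if not chunk:  # empty read: the sender closed, i.e. end of file
                return bytes(received), True
            received.extend(chunk)
    except socket.timeout:
        # No close ever arrived; all we can do is give up waiting.
        return bytes(received), False


def simulate_upload(payload: bytes, close_after_send: bool,
                    timeout: float = 0.5) -> Tuple[bytes, bool]:
    """Send payload over a local socket pair, optionally without closing."""
    client, server = socket.socketpair()
    server.settimeout(timeout)

    def send_file() -> None:
        client.sendall(payload)
        if close_after_send:
            client.close()  # this close is the only EOF signal the receiver gets

    sender = threading.Thread(target=send_file)
    sender.start()
    try:
        return receive_until_eof(server)
    finally:
        sender.join()
        client.close()
        server.close()
```

Calling `simulate_upload(data, close_after_send=False)` returns all of the data but with `saw_eof` set to `False` after the timeout, which matches the symptom described here: the bytes arrive, but the receiver never sees the close.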

The data connection remains open, but no more data is being sent, and 
the server just sits there waiting for an EOF that never comes, until it 
times out. Other than that, everything else appears to be functioning 
correctly - including the file transfers themselves, as data is reaching 
the server because the files are there.

In ldirectord, I've got this for FTP:

--- >8 ---

     service = ftp
     scheduler = wlc
     protocol = fwm
     checktype = connect
     real = [hostname of FTP server] gate

--- 8< ---

I'm using firewall marks, simply because passive FTP uses a lot of 
ports. Currently, I've only got a single FTP server in the cluster, so 
really all LVS is doing is passing the packets on (I might well up the 
number of FTP servers in the cluster later, but currently I'm focusing 
on just getting it working first, then I'll expand later). This is 
LVS-DR, so the realserver replies directly to the client. But the client 
only knows the external VIP of the cluster and sends its packets there, so 
the passive ports are firewall marked to be sent on to the FTP server too.

I'm adding the firewall marks to the packets by adding these rules to 
"before.rules" in UFW:

--- >8 ---


-A PREROUTING -p tcp -d <VIP> --dport 21 -j MARK --set-mark 21
-A PREROUTING -p tcp -d <VIP> --dport 20000:21000 -j MARK --set-mark 21

--- 8< ---

Where "<VIP>" is the VIP of the cluster.
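
As a sanity check on what the two rules above are meant to match, here is a small Python model of the classification. It is only a sketch of the intent, not of how netfilter works internally. The function name `fwmark_for` and the example address `192.0.2.10` (a documentation address standing in for the real VIP) are assumptions for illustration.

```python
from typing import Optional

FTP_MARK = 21                        # value given to --set-mark in both rules
CONTROL_PORT = 21                    # FTP control connection
PASSIVE_PORTS = range(20000, 21001)  # --dport 20000:21000 is inclusive at both ends


def fwmark_for(dst_ip: str, dst_port: int, vip: str,
               protocol: str = "tcp") -> Optional[int]:
    """Return the mark the two rules would set, or None if neither matches."""
    if protocol != "tcp" or dst_ip != vip:
        return None
    if dst_port == CONTROL_PORT or dst_port in PASSIVE_PORTS:
        return FTP_MARK
    return None


if __name__ == "__main__":
    vip = "192.0.2.10"  # hypothetical VIP, for illustration only
    for port in (21, 20000, 21000, 21001, 80):
        print(port, fwmark_for(vip, port, vip))
```

Both the control port and every passive data port end up with the same mark, so LVS treats them as one virtual service.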

I've configured my FTP server to report the external IP address of the 
cluster and to restrict itself to the passive ports 20000-21000. I know 
this works correctly, as I can see that the "entering PASV 
(x,x,x,x,p,p)" response has the right IP and is always within the 
passive port range.
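
For anyone wanting to repeat that check, this is a rough Python sketch of decoding the PASV reply and testing the port against the configured range. The six numbers are the four address octets followed by the port as two bytes (port = p1 * 256 + p2). The helper names and the sample reply text are hypothetical, and `192.0.2.10` is a documentation address, not my real VIP.

```python
import re
from typing import Tuple

# Six comma-separated numbers in parentheses: h1,h2,h3,h4,p1,p2
PASV_PATTERN = re.compile(r"\((\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})\)")

PASSIVE_MIN = 20000
PASSIVE_MAX = 21000


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Extract (host, port) from a 227 passive-mode reply line."""
    match = PASV_PATTERN.search(reply)
    if match is None:
        raise ValueError("no (h1,h2,h3,h4,p1,p2) group in reply: %r" % reply)
    numbers = [int(group) for group in match.groups()]
    if any(number > 255 for number in numbers):
        raise ValueError("field out of range in reply: %r" % reply)
    host = ".".join(str(number) for number in numbers[:4])
    port = numbers[4] * 256 + numbers[5]  # high byte first, then low byte
    return host, port


def in_passive_range(port: int) -> bool:
    """True if the port lies within the configured passive range."""
    return PASSIVE_MIN <= port <= PASSIVE_MAX


if __name__ == "__main__":
    host, port = parse_pasv_reply("227 Entering Passive Mode (192,0,2,10,78,52)")
    print(host, port, in_passive_range(port))
```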

Indeed, everything otherwise seems to function correctly. The data 
connection is being made and the files are being sent (and in terms of 
downloads, directory listing, deleting files, TLS authentication and all 
the rest of it, this all works 100%). But as the data connection appears 
not to close, it just gets stuck at 100% until it times out with an 
error message.

I've also, of course, tried connecting directly to the FTP server on the 
LAN - without LVS being involved - and everything works 100%.

What seems to be happening is that LVS isn't passing on to the realserver 
the fact that the data connection has closed. But with FTP, it is 
necessary for this to be sent, as that's how EOF is signalled to the server.

Please help, if you can. Everything else with the server is good, so 
it's just this little glitch holding everything up.

