[lvs-users] FTP data port connection not closing?
Owain at 4ColourDigital.com
Wed Aug 23 13:06:43 BST 2017
I've set up LVS-DR - via ldirectord, fired up by heartbeat - for my
cluster and I've got it serving up HTTP / HTTPS just fine. So I moved
onto FTP and it seems to work, except that when uploading files, the
transfer gets stuck at 100% in FileZilla (and likewise in other FTP
clients and on the command line), and then times out.
The interesting thing is that if I abort and then check the remote
directory, the file is actually there.
Either in its entirety or a few bytes / kilobytes short (and, often, it
appears to be cut off at a power of two - that is, if the file was
36,000 bytes then the file on the server is 32,768 bytes). This looks
to me like a disk caching thing - as the connection isn't closed, the
file isn't closed and not all of the data always gets written back.
The file on the server is also, up to the point where it cuts off,
perfectly correct when I test with a text file to be able to see that,
yes, the data is being transferred and transferred correctly.
With FTP, if I understand things correctly, closing the data connection
is itself the end-of-file marker. When the file has been transferred, the client
just closes the connection and this signals that all the data has been
transferred. And, from what I see happening, this is what's seemingly
not happening here.
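To illustrate what I mean by "the close is the EOF", here is a minimal
sketch in Python. It has nothing to do with LVS or any real FTP server:
the names (receive_until_eof, demo) and the loopback setup are made up
for the example. The receiver only knows the transfer is finished when
recv() returns an empty read, which happens once the sender closes.

```python
import socket
import threading


def receive_until_eof(listener, result):
    """Accept one connection and read until the peer closes it."""
    conn, _ = listener.accept()
    chunks = []
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                # Empty read: the peer closed its side. In FTP stream
                # mode this is the only end-of-file signal there is.
                break
            chunks.append(data)
    result.append(b"".join(chunks))


def demo(payload):
    """Send payload over a loopback connection and return what arrived."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free port on loopback
    listener.listen(1)
    result = []
    worker = threading.Thread(target=receive_until_eof,
                              args=(listener, result))
    worker.start()
    client = socket.create_connection(listener.getsockname())
    client.sendall(payload)
    # Without this close the receiver would sit in recv() forever,
    # which is what my FTP server appears to be doing.
    client.close()
    worker.join(timeout=5)
    listener.close()
    return result[0] if result else None
```

If the close (the FIN) never reaches the receiver, all the data can
still have arrived while the receiver keeps waiting.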
The data connection remains open, but no more data is being sent
and the server just sits there waiting for an EOF that never comes.
Until it times out. But, other than that, everything else appears to be
functioning correctly - including the file transfers themselves, as data
is reaching the server because the files are there.
In ldirectord, I've got this for the FTP:
--- >8 ---
service = ftp
scheduler = wlc
protocol = fwm
checktype = connect
real = [hostname of FTP server] gate
--- 8< ---
I'm using firewall marks, simply because passive FTP uses a lot of
ports. Currently, I've only got a single FTP server in the cluster, so
really all LVS is doing is passing the packets on (I might well up the
number of FTP servers in the cluster later, but currently I'm focusing
on just getting it working first, then I'll expand later). This is
LVS-DR, so the realserver replies directly to the client. But the client
only knows the external VIP of the cluster and sends its packets there,
so the passive ports are firewall marked to be sent to the FTP server too.
I'm adding the firewall marks to the packets by adding these rules to
"before.rules" in UFW:
--- >8 ---
:PREROUTING - [0:0]
-A PREROUTING -p tcp -d 192.168.0.99/32 --dport 21 -j MARK --set-mark 21
-A PREROUTING -p tcp -d 192.168.0.99/32 --dport 20000:21000 -j MARK
--- 8< ---
Where "192.168.0.99" is the VIP of the cluster.
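As a rough model of what those two rules are meant to do, here is a
small Python sketch. It is not iptables and the function name is made
up. It also ASSUMES the passive-port rule sets the same mark (21) as the
control-port rule, so that both kinds of connection land on the same
fwm virtual service; the mark value for that rule is not shown above.

```python
VIP = "192.168.0.99"          # the cluster VIP from the rules above
CONTROL_PORT = 21
PASSIVE_PORTS = range(20000, 21001)   # 20000:21000 inclusive
CONTROL_MARK = 21             # from the first rule
PASSIVE_MARK = 21             # ASSUMPTION: same mark as the control port


def fwmark_for(dst_ip, dport, proto="tcp"):
    """Return the mark a packet would get, or None if no rule matches."""
    if proto != "tcp" or dst_ip != VIP:
        return None
    if dport == CONTROL_PORT:
        return CONTROL_MARK
    if dport in PASSIVE_PORTS:
        return PASSIVE_MARK
    return None
```

Under that assumption, a packet to the VIP on port 21 and one on port
20500 get the same mark, and anything else is left unmarked.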
I've configured my FTP server to report the external IP address of the
cluster and to restrict itself to the passive ports 20000-21000. I know
this works correctly, as I can see that the "entering PASV
(x,x,x,x,p,p)" response has the right IP and is always within the
passive port range.
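For reference, this is roughly how I check the PASV reply, sketched in
Python. The helper names are made up; the port arithmetic (p1 * 256 +
p2) is the standard encoding of the six numbers in the reply, and the
20000-21000 defaults are just my configured passive range.

```python
import re

# Six comma-separated decimal numbers in parentheses: h1,h2,h3,h4,p1,p2
PASV_RE = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv(reply):
    """Return (host, port) from a '227 Entering Passive Mode' reply."""
    match = PASV_RE.search(reply)
    if match is None:
        raise ValueError("no (h1,h2,h3,h4,p1,p2) group in reply")
    numbers = [int(group) for group in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError("each field must fit in one byte")
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]   # high byte, then low byte
    return host, port


def in_passive_range(port, low=20000, high=21000):
    """True if port lies in the configured passive range (inclusive)."""
    return low <= port <= high
```

For example, a reply ending in (192,168,0,99,78,52) decodes to host
192.168.0.99 and port 78 * 256 + 52 = 20020, which is inside the range.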
Indeed, everything otherwise seems to function correctly. The data
connection is being made and the files are being sent (and in terms of
downloads, directory listing, deleting files, TLS authentication and all
the rest of it, this all works 100%). But as the data connection appears
not to close, then it just gets stuck at 100% until it times out with an
I've also, of course, tried connecting directly to the FTP server on the
LAN - without LVS being involved - and everything works 100%.
What seems to be happening is that LVS isn't passing on to the realserver
the fact that the data connection has closed. But with FTP, this has to
get through, as that's how EOF is signalled to the server.
Please help, if you can. Everything else with the server is good, so
it's just this little glitch holding everything up.