LVS breaking ip_nat_ftp (??)
Antonio Forster
aforster at gmail.com
Wed Nov 8 17:08:14 GMT 2006
On 11/8/06, Graeme Fowler <graeme at graemef.net> wrote:
> Antonio Forster wrote:
> > We had loaded ip_conntrack_ftp in both situations, with the modular and
> > the static kernel. Thanks for the comment anyway!
>
> Are you also load-balancing *inbound* FTP sessions in this LVS?
Not at all. The only FTP sessions are initiated by the servers in the
LVS cluster to hosts outside the cluster.
> Humour me for a moment. On the face of it, from here, it seems highly
> likely that the N:1 SNAT rule for outbound initiated connections is
> incorrect - not that I'm accusing you of anything here, I am trying to
> simplify the conditions.
>
The SNAT rules are the following:
iptables -t nat -I POSTROUTING -o eth0 -s inst11 -j SNAT --to-source VIP1
iptables -t nat -I POSTROUTING -o eth0 -s inst12 -j SNAT --to-source VIP1
iptables -t nat -I POSTROUTING -o eth0 -s inst13 -j SNAT --to-source VIP1
iptables -t nat -I POSTROUTING -o eth0 -s inst14 -j SNAT --to-source VIP1
iptables -t nat -I POSTROUTING -o eth0 -s inst21 -j SNAT --to-source VIP2
iptables -t nat -I POSTROUTING -o eth0 -s inst22 -j SNAT --to-source VIP2
iptables -t nat -I POSTROUTING -o eth0 -s inst23 -j SNAT --to-source VIP2
iptables -t nat -I POSTROUTING -o eth0 -s inst24 -j SNAT --to-source VIP2
iptables -t nat -I POSTROUTING -o eth0 -s inst31 -j SNAT --to-source VIP3
iptables -t nat -I POSTROUTING -o eth0 -s inst32 -j SNAT --to-source VIP3
iptables -t nat -I POSTROUTING -o eth0 -s inst33 -j SNAT --to-source VIP3
iptables -t nat -I POSTROUTING -o eth0 -s inst34 -j SNAT --to-source VIP3
iptables -t nat -I POSTROUTING -o eth0 -s inst41 -j SNAT --to-source VIP4
iptables -t nat -I POSTROUTING -o eth0 -s inst42 -j SNAT --to-source VIP4
iptables -t nat -I POSTROUTING -o eth0 -s inst43 -j SNAT --to-source VIP4
iptables -t nat -I POSTROUTING -o eth0 -s inst44 -j SNAT --to-source VIP4
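
For reference, the sixteen rules above follow one pattern (instNM is SNATed to VIPN), so they could be generated with a loop. This is only a minimal sketch: it prints the commands instead of running them, gen_snat_rules is a name made up for the illustration, and it assumes the instNM and VIPN names resolve exactly as in the rules above.

```shell
#!/bin/sh
# Dry run: print the SNAT rules instead of applying them.
# gen_snat_rules is a hypothetical helper; instNM and VIPN are the
# host names used in the rules listed above.
gen_snat_rules() {
    for group in 1 2 3 4; do
        for member in 1 2 3 4; do
            echo "iptables -t nat -I POSTROUTING -o eth0 -s inst${group}${member} -j SNAT --to-source VIP${group}"
        done
    done
}

gen_snat_rules
```

Since -I inserts each rule at the top of the chain, the final order in the chain is the reverse of the order printed. That should not matter here because each rule matches a different source.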
> Can you do a sequence of tests? Below, the word "active" indicates that
> *either*:
>
> A: The "active" server has all services up, the others are down, the LVS
> remains configured on the director for all four; or
>
> B: The "active" server is the *only* server configured for LVS service
> on the director.
>
> 1. Attempt an FTP connection from server1 (each time) with server1,
> server2, server3, server4 active in the LVS on their own (four tests).
>
> 2. Do the same sequence but with the FTP connection coming from server2,
> server3, server4 in turn (with the other servers active in turn as in 1).
>
> 3. Test from server(1,2,3,4) with pairs of servers active.
>
> 4. Test from server(1,2,3,4) with triplets of servers active.
>
> 5. Finally, test from server(1,2,3,4) with all servers active.
>
> This way, although a bit long-winded, should at least throw some light
> on the problem - bear in mind that we can only see what you're telling
> us, so any additional info will help!
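
The sequence above amounts to testing every source server against every non-empty set of active servers. A rough sketch of how that matrix could be enumerated follows. It only prints the combinations and does not touch the director; list_tests and the server1..server4 labels are placeholders for this illustration.

```shell
#!/bin/sh
# Enumerate the test matrix: each source server against every
# non-empty set of active servers (singles, pairs, triplets, all four).
# list_tests is a hypothetical helper; nothing is changed on the director.
list_tests() {
    for src in 1 2 3 4; do
        mask=1
        while [ "$mask" -le 15 ]; do
            active=""
            for n in 1 2 3 4; do
                # bit (n-1) of mask set means server n is active
                if [ $(( (mask >> (n - 1)) & 1 )) -eq 1 ]; then
                    active="$active server$n"
                fi
            done
            echo "from server$src, active:$active"
            mask=$((mask + 1))
        done
    done
}

list_tests
```

The combinations come out in bitmask order rather than grouped by singles, pairs and triplets, and whether "active" means option A or option B still has to be chosen by hand.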
We ran all the tests you mentioned and found that outbound FTP fails
whenever more than one instance is up and the LVS health checkers are
monitoring them and see them as up.
The strange part is:
- During the test, there was one virtual server group with only one
active instance, and that instance had about 20 sessions. When I activated
another instance on the same virtual server, FTP worked fine until
the number of connections on the second instance reached the number
on the first instance. At that point, FTP stopped working again.
Given this behavior, I suspected that the problem was caused by the load
balancing itself. The scheduler in use is wlc, so FTP kept working
until LVS had to start balancing between the two instances again. With
that in mind, I changed the keepalived configs to include persistence
for the sessions, and since then it seems to be working in all
situations.
Does that make sense? I'll go on with further testing anyway.
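
For anyone wanting to try the same change: in keepalived, persistence is set with persistence_timeout inside the virtual_server block. The equivalent on the command line would look roughly like the sketch below. It only prints the ipvsadm commands, persist_cmds is a name made up for the illustration, and the port (80) and timeout (300 seconds) are placeholders, not the values used in this setup.

```shell
#!/bin/sh
# Dry run: print ipvsadm commands that would enable persistence on
# each virtual service. persist_cmds is a hypothetical helper; the
# port and timeout passed below are placeholders, not real values.
persist_cmds() {
    port=$1
    timeout=$2
    for vip in VIP1 VIP2 VIP3 VIP4; do
        # -E edits an existing virtual service, -s keeps the wlc
        # scheduler, -p enables persistence with a timeout in seconds
        echo "ipvsadm -E -t ${vip}:${port} -s wlc -p ${timeout}"
    done
}

persist_cmds 80 300
```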
Thanks and best regards,
Antonio