[lvs-users] Packet loops in two-node set-up

Jan-Aage Frydenbø-Bruvoll jan at frydenbo-bruvoll.com
Sun Dec 2 20:58:41 GMT 2012


Dear List,

I have been struggling badly with spurious packet loops in one of my
two-node set-ups, and I am reaching out to the list for help, as I have not
been able to solve this even after weeks of refining the config. I am also
unsure whether I fully grasp what exactly is needed for this to work
correctly.

My set-up:

Two LXC containers, each with an eth0 and an eth1, where eth0 is the
outward-facing interface.

I use keepalived, which has been set up to respond to fwmark 1. I also let
keepalived handle the transition of the VIP between nodes. On the slave
node, I set up the VIP address as an alias on lo.

My firewall rules in the mangle table look like this (structurally
identical on both nodes):

-N IPVS
-N clear_fwmark
-A PREROUTING -j IPVS
-A IPVS -m mac --mac-source <eth0 on other node> -j clear_fwmark
-A IPVS -m mac --mac-source <eth1 on other node> -j clear_fwmark
-A IPVS -d <VIP> -i eth0 -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0xffffffff
-A clear_fwmark -j MARK --set-xmark 0x0/0xffffffff
-A clear_fwmark -j ACCEPT

The intention here is to set the fwmark only in those cases where the
traffic comes in directly from the outside world. My logic is that if the
sender MAC is one of the interfaces on the other node, I should not set the
fwmark, which then should bypass IPVS.

The extra setting of fwmark 0 is due to my desperation to get this to work
and is entirely my own (and possibly pointless) idea.
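
The decision these rules are meant to implement can be modelled as a small
Python sketch. It only illustrates the rule order listed above and is not
a reproduction of netfilter's actual behaviour. The MAC addresses, the VIP
and the names Packet and mangle_prerouting are placeholders invented for
the example, since the real values are elided in the rules.

```python
from dataclasses import dataclass

# Placeholder values (assumptions): the real MACs and VIP are elided above.
PEER_MACS = {"02:00:00:00:00:10", "02:00:00:00:00:11"}  # eth0/eth1 of the other node
VIP = "192.0.2.10"


@dataclass
class Packet:
    src_mac: str
    dst_ip: str
    in_iface: str
    proto: str
    dport: int
    mark: int = 0  # packets start out unmarked


def mangle_prerouting(pkt: Packet) -> int:
    """Return the fwmark the packet carries after the IPVS chain."""
    # First two rules: a frame sent by the other node jumps to clear_fwmark,
    # which zeroes the mark and then ACCEPTs, ending traversal of the chain.
    if pkt.src_mac.lower() in PEER_MACS:
        return 0
    # Third rule: only VIP traffic arriving on eth0 for tcp/80 gets fwmark 1.
    if (pkt.dst_ip == VIP and pkt.in_iface == "eth0"
            and pkt.proto == "tcp" and pkt.dport == 80):
        return 1
    # No rule matched: the mark is left untouched.
    return pkt.mark


# A client packet is marked; the same packet relayed by the peer is not.
client = Packet("02:00:00:00:00:99", VIP, "eth0", "tcp", 80)
relayed = Packet("02:00:00:00:00:10", VIP, "eth0", "tcp", 80)
print(mangle_prerouting(client), mangle_prerouting(relayed))
```

The model makes one dependency visible: the bypass relies entirely on the
source MAC of the frame, as seen inside the container, matching one of the
two listed addresses.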

The odd thing is that something seems to trigger a packet loop, and the
loop it sets off only carries about 20 Mbps. The loop eventually dies off
of its own accord (after a few days), and the traffic never escalates out
of control.

Another thing I find strange is that I would expect this configuration to
trigger the "local" mode in IPVS, which it does not:

proxy01# ipvsadm -Ln
...
FWM  1 wrr
  -> proxy01:80             Route   100    154        266
  -> proxy02:80             Route   100    139        273

Has anyone got any hints on this one? Could this be related to a particular
kernel version on the host (3.3.8 and 3.1.6, respectively)? Could it be LXC
related?

I would appreciate anybody's time and effort greatly. Thank you in advance
for your kind assistance.

Best regards
Jan Frydenbo-Bruvoll


