Problems getting LVS to work

Mark Wadham mark.wadham at areti.net
Thu Mar 29 15:57:34 BST 2007


Roberto Nibali wrote:
> Hi Mark,
>
>>> Excellent problem report!
>>>
>> *takes a bow*
>
>
> I hereby dub thee once ... I dub thee twice ... I dub thee Sir LVS
> Bug Reporter; you may rise and go forth. Will you accept from Us this
> honor, and will you swear fealty to this, Our order of LVS?
>
Yes yes, thanks :D
>>>> # ipvsadm --list -n
>>>> IP Virtual Server version 1.2.1 (size=4096)
>>>> Prot LocalAddress:Port Scheduler Flags
>>>>  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
>>>> TCP  100.1.1.2:25 wlc
>>>>  -> 120.1.1.1:25            Tunnel  1      0          0
>>>>  -> 120.1.1.2:25            Tunnel  1      0          0
>>>>
>>>> iptables has no rules and is default-to-accept.  There is no 
>>>> firewall in front of the box.
>>>>
>>>> Mail server 1 (120.1.1.1)
>>>> =================
>>>>
>>>> relevant iptables rules:
>>>>
>>>> $IPTABLES -A INPUT -i eth0 -s 100.1.1.2 -p ipencap -j ACCEPT
>>>> $IPTABLES -A INPUT -i tunl0 -p tcp --dport smtp -j ACCEPT
>>>
>>> Why do you need those rules if you don't have any netfilter rules
>>> and an ACCEPT policy?
>>>
>> The mailservers _do_ have firewall rules; it's just the new load
>> balancer that does not.  However, I don't think this is a firewall
>> issue, as dropped packets still show up in tcpdump, and I am also able
>> to telnet directly to port 25 on both mailservers from the new
>> (broken) load balancer.
>
> Not necessarily, but hopefully this is not what is hitting you. Depending
> on the kernel, netfilter's PREROUTING chain handling could drop the
> skb before tcpdump gets an skb->clone() of it.
>
>>> I'm a bit confused by your obfuscation technique :), what's the 
>>> designation for the servers regarding the obfuscated IP ranges in 
>>> 100.x.x.x, the 120.x.x.x, the 130.x.x.x and the 140.x.x.x?
>>>
>>> 140: your test machine
>>> 130: working LVS tunnel
>>> 120: RS (mail server)
>>> 100: new (non-functional) LVS tunnel
>>>
>>> Is my observation correct?
>>>
>> Yes, sorry for the obfuscation - I was all for just pasting the real 
>> IPs but my manager refused to let me ;)
>
> That's very noble of him.
>
>>> So this works perfectly, as shown above, which actually indicates 
>>> that you have at one point got LVS to work. Sidenote: Your LVS seems 
>>> to be a bit out of sync regarding time; otherwise your trace looks odd.
>>>
>> Yes, it was actually someone else who got it working before, and he 
>> is far too busy to assist me with the new one :)
>
> This is the part where your manager should probably call him back :).
>
It was actually the manager himself who set up the first one :)
>>>> Now, if I try the same thing but telnet to 100.1.1.2:25 (the new 
>>>> load balancer), the connection times out.  tcpdumps show:
>>>
>>> Care to show the whole ipvsadm -L -n output? Or is the one above 
>>> representative enough to display the problem?
>>>
>> Didn't I paste this above?  I believe --list is the same as -L; at
>> least the output is no different.
>
> Sure, but there was no indication of which stage of your testing
> your quoted output pertained to. When you say "the new load balancer"
> above, you do not mean a physically different machine from the "old load
> balancer", do you?
>
There are two load balancers, the 'old' one which works and the 'new' 
one which doesn't.  Here is the ipvsadm output for the new, broken load 
balancer:

# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  100.1.1.2:25 wlc
  -> 120.1.1.1:25            Tunnel  1      0          0
  -> 120.1.1.2:25            Tunnel  1      0          0
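To compare the two balancers' IPVS tables mechanically rather than by eye, the listing can be parsed into a small structure. A minimal Python sketch, assuming the column layout printed by ipvsadm for IPVS 1.2.1 as shown above; the function name parse_ipvsadm is hypothetical and other ipvsadm versions may format lines differently:

```python
import re

def parse_ipvsadm(text):
    """Parse `ipvsadm -L -n` output into {(proto, vip:port, scheduler): [(rs, fwd, weight)]}."""
    services = {}
    current = None
    for line in text.splitlines():
        # Virtual service line, e.g. "TCP  100.1.1.2:25 wlc"
        m = re.match(r"^(TCP|UDP|FWM)\s+(\S+)\s+(\S+)", line)
        if m:
            current = m.groups()
            services[current] = []
            continue
        # Real server line, e.g. "  -> 120.1.1.1:25   Tunnel  1  0  0";
        # the column header ("-> RemoteAddress:Port ...") does not match.
        m = re.match(r"^\s*->\s+(\d+\.\d+\.\d+\.\d+:\d+)\s+(\w+)\s+(\d+)", line)
        if m and current is not None:
            services[current].append((m.group(1), m.group(2), int(m.group(3))))
    return services

# Listing from the new (broken) load balancer, copied from above.
new_lb = """\
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  100.1.1.2:25 wlc
  -> 120.1.1.1:25            Tunnel  1      0          0
  -> 120.1.1.2:25            Tunnel  1      0          0
"""

print(parse_ipvsadm(new_lb))
```

Running the same function on the old balancer's listing and comparing the two dictionaries would point out any difference in virtual service, scheduler, forwarding method or weight.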

>>>> Mar 29 11:01:48 dev1 kernel: IPVS: lookup/in TCP 140.1.1.1:4042->100.1.1.2:25 not hit
>>>> Mar 29 11:01:48 dev1 kernel: IPVS: lookup service: fwm 0 TCP 100.1.1.2:25 hit
>>>
>>> Now this is very very weird. The normal TCP service lookup did not 
>>> succeed, although it should have, but the FWM TCP service lookup 
>>> did. Are you sure that:
>>>
>>> a) You have cleanly shut down (rmmod ip_vs if necessary) IPVS between
>>>    the functional and the non-functional test runs?
>> IPVS is compiled statically into the kernel, so how would I shut it
>> down?  I had no idea it was necessary to shut it down and bring it
>> back up, although I have rebooted the server a couple of times, which
>> I am sure would accomplish the same effect.
>
> Absolutely. The point is that the template entries are not flushed 
> when you simply remove the destination servers from the kernel, only 
> detached.
>
>>> b) You have no iptables or iproute2 rules indicating firewall marks?
>>
>> # iptables --list
>> Chain INPUT (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain FORWARD (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain OUTPUT (policy ACCEPT)
>> target     prot opt source               destination
>
> That's not all :). You've only shown the filter table, but I'm also 
> interested in the mangle table.
>
# iptables -t mangle --list
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination

# iptables -t nat --list
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
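Since the question was whether anything sets firewall marks, the check for non-empty chains can be scripted for each table (filter, nat, mangle). A minimal Python sketch that works on the text printed by `iptables -t TABLE --list`, assuming the layout shown above (a "Chain ..." line, a "target ..." column header, then one line per rule); the function name nonempty_chains is hypothetical:

```python
def nonempty_chains(listing):
    """Return chain names in an `iptables -t TABLE --list` dump that hold any rule."""
    found = []
    chain = None
    for line in listing.splitlines():
        if line.startswith("Chain "):
            chain = line.split()[1]  # e.g. "Chain PREROUTING (policy ACCEPT)"
        elif not line.strip() or line.startswith("target"):
            continue  # blank separator or column header
        elif chain is not None and chain not in found:
            found.append(chain)  # any other line is a rule in this chain
    return found

# Excerpt of the mangle table listing from above.
mangle = """\
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
"""

print(nonempty_chains(mangle))  # empty list: no rule in any chain
```

An empty result for every table matches what the listings above show: nothing on this box sets a mark.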

>> # iproute2
>> bash: iproute2: command not found
>
> It's the ip command output from the iproute2 framework I was looking for.
>
> This is the successor to ifconfig and route and netstat and whatnot. 
> The Linux world decided at one point in its history (around 1999) that 
> ifconfig/route/other networking setup tools are not appropriate 
> anymore and replaced them with the iproute2 framework. Unfortunately 
> the guy who started all this is a bloody genius and as such did two 
> things: a) completely forgot to document it, b) never told anyone 
> outside the kernel community about this, for years. So, if you find 
> time, invoke "man ip" on a recent enough Linux distribution of your 
> choice.
>
LOL
>> I built this server myself and never did anything with iproute2, so
>> I'm guessing the answer is no.  Although I do believe Debian is evil,
>> so I guess it could possibly have done this itself behind my back.
>
> Debian people hopefully do not have evil intentions; however, could you
> pass along the output of:
>
> ip rule show
> ip route show
> ip link show
> ip addr show
> grep -r . /proc/sys/net/ipv4/conf/*
>
# ip rule show
0:      from all lookup 255
32766:  from all lookup main
32767:  from all lookup default
# ip route show
100.1.1.0/24 dev eth0  proto kernel  scope link  src 100.1.1.1
default via 85.158.56.1 dev eth0
# ip link show
1: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
2: plip0: <POINTOPOINT,NOARP> mtu 1500 qdisc noop qlen 10
    link/ether fc:fc:fc:fc:fc:fc peer ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:04:76:16:12:a5 brd ff:ff:ff:ff:ff:ff
4: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:b0:d0:68:7f:2b brd ff:ff:ff:ff:ff:ff
5: lo: <LOOPBACK,UP,10000> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
6: shaper0: <> mtu 1500 qdisc noop qlen 10
    link/ether
7: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop
    link/ether b6:e6:25:ed:c6:2d brd ff:ff:ff:ff:ff:ff
8: eql: <MASTER> mtu 576 qdisc noop qlen 5
    link/slip
9: teql0: <NOARP> mtu 1500 qdisc noop qlen 100
    link/void
10: tunl0: <NOARP> mtu 1480 qdisc noop
    link/ipip 0.0.0.0 brd 0.0.0.0
11: gre0: <NOARP> mtu 1476 qdisc noop
    link/gre 0.0.0.0 brd 0.0.0.0
# ip addr show
1: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
2: plip0: <POINTOPOINT,NOARP> mtu 1500 qdisc noop qlen 10
    link/ether fc:fc:fc:fc:fc:fc peer ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:04:76:16:12:a5 brd ff:ff:ff:ff:ff:ff
4: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:b0:d0:68:7f:2b brd ff:ff:ff:ff:ff:ff
    inet 100.1.1.1/24 brd 100.1.1.255 scope global eth0
    inet 100.1.1.2/24 brd 100.1.1.255 scope global secondary eth0:0
5: lo: <LOOPBACK,UP,10000> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
6: shaper0: <> mtu 1500 qdisc noop qlen 10
    link/ether
7: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop
    link/ether b6:e6:25:ed:c6:2d brd ff:ff:ff:ff:ff:ff
8: eql: <MASTER> mtu 576 qdisc noop qlen 5
    link/slip
9: teql0: <NOARP> mtu 1500 qdisc noop qlen 100
    link/void
10: tunl0: <NOARP> mtu 1480 qdisc noop
    link/ipip 0.0.0.0 brd 0.0.0.0
11: gre0: <NOARP> mtu 1476 qdisc noop
    link/gre 0.0.0.0 brd 0.0.0.0
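The flags between the angle brackets are what matter for the tunl0 question further down: an interface without UP in that list is not activated. A minimal Python sketch that pulls the flags and IPv4 addresses out of `ip addr show` output, assuming the output format shown above; the function name parse_ip_addr is hypothetical:

```python
import re

def parse_ip_addr(text):
    """Map interface name -> {"flags": set, "inet": [addresses]} from `ip addr show`."""
    ifaces = {}
    name = None
    for line in text.splitlines():
        # Header line, e.g. "4: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 ..."
        m = re.match(r"^\d+:\s+([^:]+):\s+<([^>]*)>", line)
        if m:
            name = m.group(1)
            flags = set(m.group(2).split(",")) if m.group(2) else set()
            ifaces[name] = {"flags": flags, "inet": []}
            continue
        # IPv4 address line, e.g. "    inet 100.1.1.2/24 brd ... eth0:0"
        m = re.match(r"^\s+inet\s+(\S+)", line)
        if m and name is not None:
            ifaces[name]["inet"].append(m.group(1))
    return ifaces

# Excerpt of the output above.
sample = """\
4: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:b0:d0:68:7f:2b brd ff:ff:ff:ff:ff:ff
    inet 100.1.1.1/24 brd 100.1.1.255 scope global eth0
    inet 100.1.1.2/24 brd 100.1.1.255 scope global secondary eth0:0
10: tunl0: <NOARP> mtu 1480 qdisc noop
    link/ipip 0.0.0.0 brd 0.0.0.0
"""

ifaces = parse_ip_addr(sample)
print("UP" in ifaces["tunl0"]["flags"])  # False: tunl0 is not activated
print(ifaces["eth0"]["inet"])            # the VIP 100.1.1.2 is a secondary on eth0
```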
# grep -r . /proc/sys/net/ipv4/conf/*
/proc/sys/net/ipv4/conf/all/promote_secondaries:0
/proc/sys/net/ipv4/conf/all/force_igmp_version:0
/proc/sys/net/ipv4/conf/all/disable_policy:0
/proc/sys/net/ipv4/conf/all/disable_xfrm:0
/proc/sys/net/ipv4/conf/all/arp_accept:0
/proc/sys/net/ipv4/conf/all/arp_ignore:0
/proc/sys/net/ipv4/conf/all/arp_announce:0
/proc/sys/net/ipv4/conf/all/arp_filter:0
/proc/sys/net/ipv4/conf/all/tag:0
/proc/sys/net/ipv4/conf/all/log_martians:0
/proc/sys/net/ipv4/conf/all/bootp_relay:0
/proc/sys/net/ipv4/conf/all/medium_id:0
/proc/sys/net/ipv4/conf/all/proxy_arp:0
/proc/sys/net/ipv4/conf/all/accept_source_route:0
/proc/sys/net/ipv4/conf/all/send_redirects:1
/proc/sys/net/ipv4/conf/all/rp_filter:0
/proc/sys/net/ipv4/conf/all/shared_media:1
/proc/sys/net/ipv4/conf/all/secure_redirects:1
/proc/sys/net/ipv4/conf/all/accept_redirects:0
/proc/sys/net/ipv4/conf/all/mc_forwarding:0
/proc/sys/net/ipv4/conf/all/forwarding:1
/proc/sys/net/ipv4/conf/default/promote_secondaries:0
/proc/sys/net/ipv4/conf/default/force_igmp_version:0
/proc/sys/net/ipv4/conf/default/disable_policy:0
/proc/sys/net/ipv4/conf/default/disable_xfrm:0
/proc/sys/net/ipv4/conf/default/arp_accept:0
/proc/sys/net/ipv4/conf/default/arp_ignore:0
/proc/sys/net/ipv4/conf/default/arp_announce:0
/proc/sys/net/ipv4/conf/default/arp_filter:0
/proc/sys/net/ipv4/conf/default/tag:0
/proc/sys/net/ipv4/conf/default/log_martians:0
/proc/sys/net/ipv4/conf/default/bootp_relay:0
/proc/sys/net/ipv4/conf/default/medium_id:0
/proc/sys/net/ipv4/conf/default/proxy_arp:0
/proc/sys/net/ipv4/conf/default/accept_source_route:1
/proc/sys/net/ipv4/conf/default/send_redirects:1
/proc/sys/net/ipv4/conf/default/rp_filter:0
/proc/sys/net/ipv4/conf/default/shared_media:1
/proc/sys/net/ipv4/conf/default/secure_redirects:1
/proc/sys/net/ipv4/conf/default/accept_redirects:1
/proc/sys/net/ipv4/conf/default/mc_forwarding:0
/proc/sys/net/ipv4/conf/default/forwarding:1
/proc/sys/net/ipv4/conf/eth0/promote_secondaries:0
/proc/sys/net/ipv4/conf/eth0/force_igmp_version:0
/proc/sys/net/ipv4/conf/eth0/disable_policy:0
/proc/sys/net/ipv4/conf/eth0/disable_xfrm:0
/proc/sys/net/ipv4/conf/eth0/arp_accept:0
/proc/sys/net/ipv4/conf/eth0/arp_ignore:0
/proc/sys/net/ipv4/conf/eth0/arp_announce:0
/proc/sys/net/ipv4/conf/eth0/arp_filter:0
/proc/sys/net/ipv4/conf/eth0/tag:0
/proc/sys/net/ipv4/conf/eth0/log_martians:0
/proc/sys/net/ipv4/conf/eth0/bootp_relay:0
/proc/sys/net/ipv4/conf/eth0/medium_id:0
/proc/sys/net/ipv4/conf/eth0/proxy_arp:0
/proc/sys/net/ipv4/conf/eth0/accept_source_route:1
/proc/sys/net/ipv4/conf/eth0/send_redirects:1
/proc/sys/net/ipv4/conf/eth0/rp_filter:0
/proc/sys/net/ipv4/conf/eth0/shared_media:1
/proc/sys/net/ipv4/conf/eth0/secure_redirects:1
/proc/sys/net/ipv4/conf/eth0/accept_redirects:1
/proc/sys/net/ipv4/conf/eth0/mc_forwarding:0
/proc/sys/net/ipv4/conf/eth0/forwarding:1
/proc/sys/net/ipv4/conf/lo/promote_secondaries:0
/proc/sys/net/ipv4/conf/lo/force_igmp_version:0
/proc/sys/net/ipv4/conf/lo/disable_policy:1
/proc/sys/net/ipv4/conf/lo/disable_xfrm:1
/proc/sys/net/ipv4/conf/lo/arp_accept:0
/proc/sys/net/ipv4/conf/lo/arp_ignore:0
/proc/sys/net/ipv4/conf/lo/arp_announce:0
/proc/sys/net/ipv4/conf/lo/arp_filter:0
/proc/sys/net/ipv4/conf/lo/tag:0
/proc/sys/net/ipv4/conf/lo/log_martians:0
/proc/sys/net/ipv4/conf/lo/bootp_relay:0
/proc/sys/net/ipv4/conf/lo/medium_id:0
/proc/sys/net/ipv4/conf/lo/proxy_arp:0
/proc/sys/net/ipv4/conf/lo/accept_source_route:1
/proc/sys/net/ipv4/conf/lo/send_redirects:1
/proc/sys/net/ipv4/conf/lo/rp_filter:0
/proc/sys/net/ipv4/conf/lo/shared_media:1
/proc/sys/net/ipv4/conf/lo/secure_redirects:1
/proc/sys/net/ipv4/conf/lo/accept_redirects:1
/proc/sys/net/ipv4/conf/lo/mc_forwarding:0
/proc/sys/net/ipv4/conf/lo/forwarding:1
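With this many entries it is easy to miss a value that differs between all, default and a specific interface. A minimal Python sketch that turns the `grep -r .` dump into a nested dictionary and reports keys whose values are not identical everywhere; the function names and the choice of keys to check are hypothetical examples, not a list of settings known to cause this problem:

```python
def parse_conf_dump(text):
    """Turn `grep -r . /proc/sys/net/ipv4/conf/*` output into {iface: {key: value}}."""
    conf = {}
    for line in text.splitlines():
        path, sep, value = line.partition(":")
        if not sep:
            continue  # not a "path:value" line
        parts = path.split("/")  # /proc/sys/net/ipv4/conf/<iface>/<key>
        if len(parts) < 2:
            continue
        iface, key = parts[-2], parts[-1]
        conf.setdefault(iface, {})[key] = value.strip()
    return conf

def differing(conf, keys):
    """Report, for each requested key, per-interface values when they are not all equal."""
    out = {}
    for key in keys:
        values = {iface: settings.get(key) for iface, settings in conf.items()}
        if len(set(values.values())) > 1:
            out[key] = values
    return out

# Excerpt of the dump above.
sample = """\
/proc/sys/net/ipv4/conf/all/rp_filter:0
/proc/sys/net/ipv4/conf/all/accept_source_route:0
/proc/sys/net/ipv4/conf/default/rp_filter:0
/proc/sys/net/ipv4/conf/default/accept_source_route:1
/proc/sys/net/ipv4/conf/eth0/rp_filter:0
/proc/sys/net/ipv4/conf/eth0/accept_source_route:1
"""

conf = parse_conf_dump(sample)
print(differing(conf, ["rp_filter", "accept_source_route"]))
```

Feeding it the same dump taken on the working balancer and diffing the two dictionaries would be the next step.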

>>> c) You have no port 0 service set up?
>> Definitely not
>
> I see. Not! :)
>
>>>> Mar 29 11:01:48 dev1 kernel: IPVS: ip_vs_wlc_schedule(): Scheduling...
>>>> Mar 29 11:01:48 dev1 kernel: IPVS: WLC: server 120.1.1.1:25 activeconns 0 refcnt 1 weight 1 overhead 0
>>>> Mar 29 11:01:48 dev1 kernel: IPVS: Bind-dest TCP c:140.1.1.1:4042 v:100.1.1.2:25 d:120.1.1.1:25 fwd:T s:0 conn->flags:182 conn->refcnt:1 dest->refcnt:2
>>>> Mar 29 11:01:48 dev1 kernel: IPVS: Schedule fwd:T c:140.1.1.1:4042 v:100.1.1.2:25 d:120.1.1.1:25 conn->flags:1C2 conn->refcnt:2
>>>
>>> This looks like it would happily send it.
>>>
>>>> Mar 29 11:01:48 dev1 kernel: IPVS: TCP input  [S...] 120.1.1.1:25->140.1.1.1:4042 state: NONE->SYN_RECV conn->refcnt:2
>>>
>>> Ok, we do the state transition indicating that we've allocated the 
>>> connection structure for the hash table entry.
>>>
>>>> Mar 29 11:01:51 dev1 kernel: IPVS: lookup/in TCP 140.1.1.1:4042->100.1.1.2:25 hit
>>>
>>> Second SYN as seen in your non-functional tcpdump trace.
>>>
>>>> Mar 29 11:01:57 dev1 kernel: IPVS: lookup/in TCP 140.1.1.1:4042->100.1.1.2:25 hit
>>>
>>> Third SYN as seen in your non-functional tcpdump trace.
>>>
>>>> Mar 29 11:02:04 dev1 kernel: IPVS: Unbind-dest TCP c:140.1.1.1:4039 v:100.1.1.2:25 d:120.1.1.2:25 fwd:T s:3 conn->flags:182 conn->refcnt:1 dest->refcnt:2
>>>
>>> This does not belong to the trace above, since it's port 4039, which
>>> must have been a test performed before you took the trace. Most
>>> likely this one ran into the normal 60 sec timeout.
>>>
>>>> I really am at a loss as to why this doesn't work, the debug log 
>>>> seems to show IPVS passing traffic to mail 1 (120.1.1.1) however 
>>>> the tcpdump for that server shows absolutely nothing.  If anyone 
>>>> can point me in the right direction here I would be very grateful.
>>>
>>> Can you show the routing information on your LVS, as well as the
>>> tun* device configuration in the proc-fs?
>>>
>> Sure; by LVS I'm going to assume you mean the broken load balancer.
>>
>> # route -n
>> Kernel IP routing table
>> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
>> 100.1.1.0       0.0.0.0         255.255.255.0   U     0      0        0 eth0
>> 0.0.0.0         100.1.1.254     0.0.0.0         UG    0      0        0 eth0
>
> Could you please send me the iproute2 related output, as indicated 
> above? route -n does not show all the routing entries on a Linux box.
>
By this point you should have already skimmed the output ;)
>> # find /proc |grep tun
>
> Sidenote: You might not want to run that command too often on your
> production server. I've seen nasty kernel OOPSes more than once after
> such a stat()-intensive command.
>
Exxxxxcellent.
>> This is odd, tunl0 does exist:
>>
>> # ifconfig tunl0
>> tunl0     Link encap:IPIP Tunnel  HWaddr
>>          NOARP  MTU:1480  Metric:1
>>          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>          collisions:0 txqueuelen:0
>>          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>
> Sure, but it's not activated. Could you by any chance run the following
> command on your box?
>
> ip link set dev tunl0 up
>
Mhmm, this has been done; however, I notice that on the working load
balancer, the tunl0 device is not visible in the ifconfig output (i.e. it
is not activated).  Excuse me while I stay with my vintage ifconfig
friends for a little while longer :)
>> Don't know why it's absent from /proc.
>
> Since there are no IFF_RUNNING|IFF_UP flags set, there's no point in 
> setting any entries for this virtual device in the proc-fs.
>
>> Thanks again for your assistance,
>
> Always when receiving such nice bug reports,
> Roberto Nibali, ratz

Kind Regards,

-- 
Mark Wadham
e: mark.wadham at areti.net t: +44 (0)20 8315 5800 f: +44 (0)20 8315 5801
Areti Internet Ltd., http://www.areti.net/ 

===================================================================
Areti Internet Ltd: BS EN ISO 9001:2000
Providing corporate Internet solutions for more than 10 years.
===================================================================


More information about the lvs-users mailing list