Problems with IPVS
ratz at drugphish.ch
Tue Oct 17 16:35:42 BST 2006
> The dumps attached to the previous e-mail were taken on the bond0 interface,
> which faces the proxy. tcpdumps taken on the proxy confirm the problem.
Hehe, you definitely want to use all possible features of Linux
networking. How is your bonding configured, ALB? There is an outstanding
issue with regard to packet reassembly on bond devices using ALB. It's
highly unlikely that you're experiencing it, though. But it could
explain your less-than-perfect-looking Ethereal capture :).
> tcpdump.cap - DNAT case
> tcpdump2.cap - LVS case
> tcpdump3.cap - LVS case and Nokia phone
Still no data at my end.
>>> 1. phone sends SYN packet to proxy;
>> Means (from previous email context):
>> Phone --> GRE tunnel --> netwap --> fwmark --> LVS --> proxy
> Yes. netwap is interface on the same server running LVS.
>> How many devices are we talking about including Phone and proxy?
> Phone, SGSN/GGSN, PIX firewall (one end of GRE is there), server, proxy.
Excellent, thanks. Does the PIX belong to the carrier? I presume the IP
addresses after the PIX are still not publicly routable?
@Joe: In case you want to update the LVS-Howto:
>>> 2. proxy responds with SYN,ACK;
>>> 3. phone sends ACK;
>> Beautiful, if this goes through LVS, it's already a big step towards a
>> correctly working LVS.
> Nokia phones work through LVS without problems.
Hmm, since you talk about retransmission, I wonder whether one of the
following contexts applies (http://tools.ietf.org/html/rfc3344#page-83):
C.1. TCP Timers
When high-delay (e.g. SATCOM) or low-bandwidth (e.g. High-Frequency
Radio) links are in use, some TCP stacks may have insufficiently
adaptive (non-standard) retransmission timeouts. There may be
spurious retransmission timeouts, even when the link and network
are actually operating properly, but just with a high delay because
of the medium in use. This can cause an inability to create or
maintain TCP connections over such links, and can also cause unneeded
retransmissions which consume already scarce bandwidth. Vendors
are encouraged to follow the algorithms in RFC 2988 when
implementing TCP retransmission timers. Vendors of systems designed
for low-bandwidth, high-delay links should consult RFCs 2757 and
2488 [28, 1]. Designers of applications targeted to operate on
mobile nodes should be sensitive to the possibility of timer-related
difficulties.
C.2. TCP Congestion Management
Mobile nodes often use media which are more likely to introduce
errors, effectively causing more packets to be dropped. This
introduces a conflict with the mechanisms for congestion management
found in modern versions of TCP. Now, when a packet is dropped,
the correspondent node's TCP implementation is likely to react as
if there were a source of network congestion, and initiate the
slow-start mechanisms designed for controlling that problem.
However, those mechanisms are inappropriate for overcoming errors
introduced by the links themselves, and have the effect of magnifying
the discontinuity introduced by the dropped packet. This problem has
been analyzed by Caceres, et al. TCP approaches to the problem
of handling errors that might interfere with congestion management
are discussed in documents from the [pilc] working group [3, 9].
While such approaches are beyond the scope of this document,
they illustrate that providing performance transparency to mobile
nodes involves understanding mechanisms outside the network layer.
Problems introduced by higher media error rates also indicate the
need to avoid designs which systematically drop packets; such designs
might otherwise be considered favorably when making engineering
tradeoffs.
But then we'd definitely have a problem with IPVS. However, let's not
jump to conclusions.
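
For reference, the retransmission timer that the RFC 3344 excerpt above
points to (RFC 2988) can be sketched as follows. This is only an
illustration of the published algorithm, not what any of the stacks in
this thread actually run; the class name and the default clock
granularity of 0.1 s are assumptions made for the example.

```python
class RtoEstimator:
    """Sketch of the RFC 2988 retransmission timeout (RTO) computation."""

    ALPHA = 1 / 8  # smoothing gain for SRTT (RFC 2988, section 2.3)
    BETA = 1 / 4   # smoothing gain for RTTVAR
    K = 4          # variance multiplier

    def __init__(self, clock_granularity=0.1, min_rto=1.0, max_rto=60.0):
        # clock_granularity (G) is an assumed value; it is stack specific.
        self.g = clock_granularity
        self.min_rto = min_rto  # RFC 2988: round RTO up to at least 1 s
        self.max_rto = max_rto  # optional upper bound, at least 60 s
        self.srtt = None
        self.rttvar = None
        self.rto = 3.0          # initial RTO before any RTT measurement

    def on_rtt_sample(self, r):
        """Feed one round-trip time measurement (seconds), return new RTO."""
        if self.srtt is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR must be updated with the old SRTT, then SRTT itself.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)
        return self.rto

    def on_timeout(self):
        """Back off the timer after a retransmission timeout."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto
```

On a high-delay GPRS link the point is that a stack with a less
adaptive timer than this may retransmit while the original segment is
still in flight, which would look like the duplicate ACKs and repeated
data packets described in this thread.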
>>> 4. phone sends HTTP GET request;
>>> 5. proxy ACKs packet 4;
>> Only ACK? No data?
Window size? adv size?
>>> 6. proxy sends HTTP data packet;
>>> 7. proxy sends another HTTP data packet;
>>> 8. proxy sends FIN packet;
>>> weird things start here
>>> 9. phone once more sends ACK packet acknowledging packet 2
>>> (duplicate of packet 3);
>> Does the proxy have SACK/FACK support enabled?
> Proxy is CentOS4 Linux server running Squid.
And you see nothing unusual in your squid logs when connecting with SE
> # sysctl net.ipv4.tcp_fack net.ipv4.tcp_sack
> net.ipv4.tcp_fack = 1
> net.ipv4.tcp_sack = 1
Does disabling (just for a test) SACK change anything?
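
For the test itself, `sysctl -w net.ipv4.tcp_sack=0` on the proxy
should be enough. To compare the settings on the two boxes, a small
helper along these lines may be handy. It is only a sketch: the
function names are made up, and the values shown for the LVS server
are hypothetical, since only the proxy output was posted.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines as printed by sysctl into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, shell prompts and anything without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def differing_keys(a, b):
    """Return the sorted keys whose values differ between two hosts."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# Output from the proxy as quoted in this thread.
proxy = parse_sysctl("""
# sysctl net.ipv4.tcp_fack net.ipv4.tcp_sack
net.ipv4.tcp_fack = 1
net.ipv4.tcp_sack = 1
""")

# Hypothetical values for the LVS server, for illustration only.
lvs = parse_sysctl("""
net.ipv4.tcp_fack = 1
net.ipv4.tcp_sack = 0
""")

print(differing_keys(proxy, lvs))
```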
>>> 10. and one more dupe of packet 3;
>>> 11.-14. proxy repeats packet 6. 4 times.
>> It has to. Is ECN enabled?
> Once again, sysctl says no. Both on the LVS server and on the proxy.
What are the kernel versions? (Sorry, if this is a dupe.)
>>> The problem is that LVS does not pass packets 11. to 14. to phone. Why?
>> Because packet 8 was FIN and LVS is not stateful with regard to TCP
>> sessions and retransmits.
> But the phone has not acknowledged that FIN yet?
Sure, but we act on first seen FIN regarding template expiration, IIRC:
But I'd need to check the code again. Take this with a grain of salt.
>>> In case of DNAT packets 11.-14. are passed to phone which at the end
>>> acknowledges packets 6. and 7. and then acknowledges packet 8. thus
>>> closing TCP connection.
>> Here I don't follow your statements, sorry.
> If I setup DNAT instead of LVS then packets 11.-14. are sent to phone.
> In case of LVS they are not.
So you get to see packets 11-14 on the outbound interface of LVS from
Squid, but never on the inbound interface (direction of PIX)? This is
> And after phone receives those packets it sends ACK to packets 6. and
> 7. and then to 8.
But only for DNAT.
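
To pin down where packets 11-14 disappear, one could compare what is
seen on the proxy-facing interface with what leaves towards the PIX.
The sketch below does this on already-extracted segment data, without
any pcap library. The `Segment` type, the function name and all
sequence numbers are hypothetical. Segments are matched on sequence
number, length and flags only, on the assumption that both captures
are taken on the LVS box, where addresses may be rewritten but
sequence numbers are not.

```python
from collections import Counter
from typing import NamedTuple


class Segment(NamedTuple):
    seq: int     # TCP sequence number
    length: int  # payload length in bytes
    flags: str   # e.g. "PA" for PSH+ACK, "FA" for FIN+ACK


def not_forwarded(proxy_side, phone_side):
    """Count segments seen on the proxy side but missing on the phone side.

    Counter subtraction keeps retransmissions apart: a segment seen five
    times on one side and once on the other is reported four times.
    """
    return Counter(proxy_side) - Counter(phone_side)


# Hypothetical values mirroring the trace described in this thread.
data1 = Segment(seq=1000, length=500, flags="PA")  # packet 6
data2 = Segment(seq=1500, length=300, flags="PA")  # packet 7
fin = Segment(seq=1800, length=0, flags="FA")      # packet 8

proxy_side = [data1, data2, fin] + [data1] * 4     # packets 11-14 repeat 6
phone_side = [data1, data2, fin]                   # what LVS passed on

for segment, count in not_forwarded(proxy_side, phone_side).items():
    print(f"{count} x {segment} never left towards the phone")
```

With DNAT the same comparison should come back empty, since there the
retransmissions are forwarded.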
Roberto Nibali, ratz
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc