Page cannot be displayed
Roberto Nibali
ratz at drugphish.ch
Thu Mar 29 14:52:50 BST 2007
> We recently set up an Ultra Monkey load balancer with 2 real servers
> and 95% of the time it seems to be working perfectly, but every now
> and then our customers are getting "Page cannot be displayed" errors.
What's the average/peak request rate and size?
> It happens at different stages on our websites and we can't seem to
> reproduce the problem here. Our customers are very large Fortune 500
> companies so we assume that their networking etc is top of the line,
> and the fact that it is occurring with multiple customers we assume it
> is our architecture. Our LB environment is as follows :
>
> Ultra monkey box :
> CentOS 4.4
> ldirectord.cf :
>
> # Global Directives
> checktimeout=5
> checkinterval=5
> #fallback=127.0.0.1:80
> autoreload=yes
> #logfile="/var/log/ldirectord.log"
> logfile="local0"
Can you correlate any log messages from ldirectord with the 5% page
display problems? Since you seem to have a very high timeout value for
your persistency and no indication of expire_nodest_conn it's not easy
to pinpoint the problem. What kind of application is running behind the
services? Does the application logic span over both services within the
lifetime context? Does the fallback work?
> # Controls IP packet forwarding
> net.ipv4.ip_forward = 1
> # Controls source route verification
> net.ipv4.conf.default.rp_filter = 1
> # Do not accept source routing
> net.ipv4.conf.default.accept_source_route = 0
> # Controls the System Request debugging functionality of the kernel
> kernel.sysrq = 0
Besides the strange comment, enabling this can be helpful at times.
> 1: I assume that because we are using masq(NAT) that we don't need to
> worry about the noarp problem with DR or TUN?
Correct.
> 2: Is there any ip tuning that we should do on the Ultra Monkey box as
> not only is it acting as the load balancer but it is also a router
> too?
Only if you experience performance problems. So I'd like to ask back if
you've previously seen any indication of such problems in your log files
(including kernel log: dmesg -s 100000).
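A quick way to go through a large kernel log is to filter it for messages that usually point to resource exhaustion. This is a minimal sketch: the two patterns are examples I picked for illustration, not a complete list, and the sample text is invented.

```python
import re

# Illustrative patterns only; extend them to whatever your kernel logs.
PATTERNS = [
    re.compile(r"ip_conntrack: table full, dropping packet"),
    re.compile(r"Neighbour table overflow", re.IGNORECASE),
]


def suspicious_lines(kernel_log_text):
    """Return the log lines that match any of the patterns above."""
    return [
        line
        for line in kernel_log_text.splitlines()
        if any(p.search(line) for p in PATTERNS)
    ]


# Invented sample input standing in for real kernel log output.
sample = (
    "eth0: link up\n"
    "ip_conntrack: table full, dropping packet.\n"
    "IPVS: ipvs loaded.\n"
)
hits = suspicious_lines(sample)
```

On the director the input text would be the saved output of the dmesg command mentioned above.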
> 3: Has anybody else seen this intermittent "Page cannot be displayed"
> error with UM?
Sure, but there are tons of possibilities for this to happen. I can
envision that ldirectord takes one of the RS out and due to the high
service template timeout and the missing expire_nodest_conn setting and
probably other issues, client requests are still being forwarded to the
non-functional RS, which will definitely cause such a message to be
displayed on the client's browser.
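To see whether expire_nodest_conn is actually set on the director, you can read it from the IPVS sysctl directory under /proc. The helper below is a small sketch, with a function name of my own choosing; it returns None for an entry that is missing or unreadable, for instance when the ip_vs module is not loaded.

```python
import os


def read_ipvs_settings(root="/proc/sys/net/ipv4/vs",
                       names=("expire_nodest_conn",)):
    """Read integer IPVS sysctl values from root.

    A missing or unparsable entry is reported as None.
    """
    settings = {}
    for name in names:
        path = os.path.join(root, name)
        try:
            with open(path) as fh:
                settings[name] = int(fh.read().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


if __name__ == "__main__":
    for name, value in read_ipvs_settings().items():
        print(name, "=", value)
```

A value of 0 means connections to a removed real server are not expired right away, which matches the scenario described above.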
For your own amusement, I've allowed myself to quote the KB241344
article from Microsoft:
http://support.microsoft.com/kb/241344/EN-US/
This is maybe a wonderful example of why Microsoft is so much more
successful than others: No mention of tcpdump/windump to their users
and of course it's always the fault of the user :).
Regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc