heartbeat node taking over resources upon reboot
Roberto Nibali
ratz at drugphish.ch
Fri Nov 10 18:55:00 GMT 2006
Hello,
> Every time i reboot the active node, it comes back as the backup as normal,
> but then it suddenly declares itself dead and says it has no local
> heartbeat (???) and restarts. While it's restarting it happily declares the
> other node dead as well and (i guess) starts taking over the resources.
> Resulting in every connected client to disconnect.
Sounds like a timing issue. This is also a typical question for the
linux-ha mailing list, where people can normally give you appropriate
answers faster than here.
> I also see that it says somewhere "Deadtime value may be too small", but in
> normal production i don't see any 'late heartbeats' or such, which made me
> not change them. My ha.cf :
>
> udpport 694
> logfacility local0
> keepalive 75ms
> deadtime 300ms
> warntime 200ms
Your timings are absolutely crazy. This will only work in the lab. Also,
there's no point in having such a snappy system when deploying LVS,
especially if you configure template synchronisation.
http://www.linux-ha.org/ha.cf/DeadtimeDirective
http://www.linux-ha.org/FAQ#heavy_load
> initdead 60
> mcast eth1 224.1.2.3 694 1 0
> auto_failback off
> node rpzlvs05 rpzlvs06
>
> My question is, should i really go experiment with the *time values again,
> or is it something else?
In my opinion you should set those values to something more sane.
Also note that even though the kernel ticks at between 100Hz and 1000Hz,
there is no guarantee user-space gets assigned 10ms-1ms slices,
especially during boot up, where we have a fork-bomb situation with all
the daemons starting and writing their shit on the platter. Unless you
run a hard RT-enabled kernel, you get blocking I/O peaks in the high
hundreds of milliseconds, which can easily exceed your 300ms deadtime.
I would be surprised if setting a higher deadtime did not fix your
issues. Then again, the experts are next door on the linux-ha mailing list.
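
As a rough sketch only, assuming a failover delay of some tens of seconds
is acceptable for you, a more conservative set of timings could look like
the fragment below. The numbers are illustrative, not tested against your
setup. Values without a unit suffix are in seconds.

```
# Illustrative values only (an assumption, not tuned for this cluster).
# Send a heartbeat every 2 seconds.
keepalive 2
# Log a "late heartbeat" warning after 10 seconds of silence.
warntime 10
# Declare the peer dead after 30 seconds of silence.
deadtime 30
# Allowance right after boot; usually kept at least twice the deadtime.
initdead 60
```

If late-heartbeat warnings still show up in the logs with values like
these, that points at something other than plain timing.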
HTH and best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc