heartbeat node taking over resources upon reboot

Roberto Nibali ratz at drugphish.ch
Fri Nov 10 18:55:00 GMT 2006


Hello,

> Every time i reboot the active node, it comes back as the backup as normal,
> but then it suddenly declares itself dead and says it has no local heartbeat
> (???) and restarts. While it's restarting it happily declares the other node
> dead as well and (i guess) starts taking over the resources. Resulting in
> every connected client to disconnect.

Sounds like a timing issue. This is also a typical question for the 
linux-ha mailing list, where people can usually give you appropriate 
answers faster than here.

> I also see that it says somewhere "Deadtime value may be too small", but in
> normal production i don't see any 'late heartbeats' or such, which made me
> not change them. My ha.cf :
> 
> udpport 694
> logfacility local0
> keepalive 75ms
> deadtime 300ms
> warntime 200ms

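Your timings are absolutely crazy. With a keepalive of 75ms and a 
deadtime of 300ms, four missed heartbeats are enough to declare a node 
dead. This will only work in the lab. Also, there's no point in having 
such a snappy system when deploying LVS, especially if you configure 
template synchronisation.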

http://www.linux-ha.org/ha.cf/DeadtimeDirective
http://www.linux-ha.org/FAQ#heavy_load

> initdead 60
> mcast eth1 224.1.2.3 694 1 0
> auto_failback off
> node rpzlvs05 rpzlvs06
> 
> My question is, should i really go experiment with the *time values again,
> or is it something else?

In my opinion you should set those values to something saner. Also note 
that even though the kernel tick runs at between 100Hz and 1000Hz, there 
is no guarantee that user space gets assigned 10ms to 1ms slices, 
especially during boot-up, where we have a fork-bomb situation with all 
the daemons starting and writing their shit on the platter. Unless you 
run a hard-RT-enabled kernel, you get blocking I/O peaks in the hundreds 
of milliseconds.

I would be surprised if setting a higher deadtime does not fix your 
issues. Then again, the experts are next door on the linux-ha mailing 
list.
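
To make that concrete, here is a rough sketch of what I would try first. 
The four timing values are only illustrative starting points I am 
assuming, not measured ones, so tune them against your own logs. 
Everything else is copied from your ha.cf. Values without a unit suffix 
are read as seconds, and initdead should be at least twice deadtime.

```
# Sketch only: the timing values below are assumed starting points.
udpport 694
logfacility local0

# One heartbeat every 2 seconds instead of every 75ms.
keepalive 2
# Log a "late heartbeat" warning after 10 seconds of silence.
warntime 10
# Declare the peer dead only after 30 seconds of silence.
deadtime 30
# Extra grace period right after boot, at least 2 x deadtime.
initdead 120

mcast eth1 224.1.2.3 694 1 0
auto_failback off
node rpzlvs05 rpzlvs06
```

Run with something like this for a while and watch for late heartbeat 
warnings. The longest delay you see under real load, including a reboot 
of the peer, tells you how far you can safely lower deadtime again.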

HTH and best regards,
Roberto Nibali, ratz
-- 
echo 
'[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
