[lvs-users] [Keepalived-devel] Keepalived communication with kernel failing after some time

Rodrigo Severo rodrigo at fabricadeideias.com
Wed Nov 9 16:54:44 GMT 2011


Hi,


First of all, let me thank you for your help and attention, and for pointing
me to the LVS users mailing list. I wasn't aware of it.


On Wed, Nov 9, 2011 at 12:58 PM, Graeme Fowler <graeme at graemef.net> wrote:

> [copying in the LVS users list]
>
> On Wed, 2011-11-09 at 12:04 -0200, Rodrigo Severo wrote:
> > I have been using keepalived for some years now.
> >
> > For some time now, keepalived has been failing when updating VSs in
> > the kernel. This happens after a period in which keepalived works
> > perfectly, i.e., failed servers are successfully removed and returned
> > servers are successfully added back to the VSs. Right after keepalived
> > is started everything works fine; after some time it starts to fail to
> > update the VSs in the kernel.
> >
> > To make it work again I just have to restart keepalived.
> >
> > The error messages I get on these failures look like:
> >
> > [Keepalived_healthcheckers] IPVS: Invalid operation.  Possibly wrong
> > module version, address not unicast, ...
> >
> >
> > It's important to note that the exact same operation that works
> > fine just after keepalived is started fails with the above error
> > after some time (one or two hours), so the causes suggested by the
> > error message (wrong module version, wrong kind of address) can be
> > safely ruled out.
> >
> > I'm using Gentoo with kernel 3.0.6 and keepalived 1.2.2.
> >
> > Any suggestions on how I can further debug this issue?
>
> Yes. Please grab the log lines which indicate keepalived starting, doing
> stuff to servers, then failing to do stuff to servers and send it to
> lvs-users at linuxvirtualserver.org. I think we need to see timing, the
> number of operations done and so on.
>

Here is an example: http://pastebin.com/uwzKKGXh

Please observe that all VS updates up to 11:51 worked fine. Both updates
after 14:14 failed with the above error message.

You will also see that there aren't many updates happening.


> Your kernel is "out there" some way ahead of large numbers of the rest
> of the world who lag behind on the 2.6.x branch. I suspect something
> isn't quite right in the IPVS code in 3.0.x but I couldn't say what it
> is.
>

If you believe the kernel version might be to blame, I can try an older
one.

Do you have a suggestion for a version to test? Versions 2.6.39, 2.6.38 and
2.6.32 are especially easy to test, but I can test any other version you
believe is important.

I forgot to mention in my first message what I believe is causing the
problem: some kind of timeout on the socket keepalived uses to communicate
with the kernel. I have no particular evidence for this beyond the fact
that everything works for some time after keepalived is started and then
stops working. Unfortunately, I don't know how I would test this
hypothesis.
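
One way the socket-timeout hypothesis might be tested is an A/B probe: keep
one IPVS control socket open for hours and, at each round, send the same
harmless read-only query on it and on a freshly opened socket. If only the
long-lived socket starts failing around the time keepalived does, that
supports the hypothesis; if both fail, the socket's age is probably not
the cause. This is a minimal sketch under several assumptions: it uses the
older raw-socket getsockopt interface that ipvsadm uses (a keepalived build
that talks to IPVS over generic netlink would need a netlink probe
instead), the IP_VS_SO_GET_VERSION value is copied from <linux/ip_vs.h> and
should be checked against your kernel headers, and open_ipvs_socket,
probe, classify and watch are hypothetical helper names. It does not
reproduce keepalived's own socket or its update operations, so a clean
result would not fully rule the hypothesis out.

```python
import socket
import time

# IP_VS_SO_GET_VERSION as defined in <linux/ip_vs.h>, where
# IP_VS_BASE_CTL = 64 + 1024 + 64. Assumption: verify against your headers.
IP_VS_SO_GET_VERSION = 64 + 1024 + 64


def open_ipvs_socket() -> socket.socket:
    """Open a raw IPv4 socket like ipvsadm's sockopt interface (needs root)."""
    return socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_RAW)


def probe(sock) -> bool:
    """Send a read-only IPVS version query; True if the kernel answered."""
    try:
        sock.getsockopt(socket.IPPROTO_IP, IP_VS_SO_GET_VERSION, 64)
    except OSError:
        return False
    return True


def classify(old_ok: bool, fresh_ok: bool) -> str:
    """Interpret one round of probes on the long-lived and a fresh socket."""
    if old_ok and fresh_ok:
        return "both ok"
    if fresh_ok:
        return "only the long-lived socket fails: consistent with a stale socket"
    if old_ok:
        return "only the fresh socket fails: unexpected"
    return "both fail: not specific to the long-lived socket"


def watch(interval_s: float = 300.0) -> None:
    """Probe a long-lived and a fresh socket every interval_s seconds, forever."""
    long_lived = open_ipvs_socket()
    start = time.monotonic()
    try:
        while True:
            fresh = open_ipvs_socket()
            try:
                verdict = classify(probe(long_lived), probe(fresh))
            finally:
                fresh.close()
            elapsed_min = (time.monotonic() - start) / 60
            print(f"{elapsed_min:7.1f} min: {verdict}", flush=True)
            time.sleep(interval_s)
    finally:
        long_lived.close()
```

Running watch() as root on the director alongside keepalived would let the
timestamps of any "only the long-lived socket fails" lines be compared with
the IPVS errors in the keepalived log.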



-- 
---------------------------------------------------------------------------------------
Rodrigo Severo

Fábrica de Idéias
SBS Quadra 2 - Bloco S - Ed. Empire Center - Sala 1.301
Brasília - DF - CEP 70070-904
Tel. (61) 3321-1357       Fax (61) 3223-1712
---------------------------------------------------------------------------------------


