[lvs-users] ksoftirqd/0 runs at 100% and looks like it could be a kernel issue.
graeme at graemef.net
Thu May 31 11:43:43 BST 2012
On Thu, 2012-05-31 at 12:21 +0200, Brent Clark wrote:
> I'm writing to you because we have a two-node cluster set up with just apache2, heartbeat and LVS, and every now and then we find that ksoftirqd/0 runs at 100%. We find that the only way to fix it is to reboot the server.
> Let me know what you think, if someone could assist, it would be appreciated.
I would hazard a guess that at the time you have this issue you have
packets being "reflected" between the IPVS frameworks on both hosts.
Depending on configuration, it is possible in a two-node cluster to have
the following happen:
1. Packet arrives on server1 from client.
2. Server1 IPVS scheduler looks in table, sends packet on to server2
3. Packet arrives on server2
4. Server2 IPVS scheduler looks in table, sends packet on to server1
5. Packet arrives on server1
6. goto 2.
Rinse, repeat, ad infinitum.
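
As a rough illustration of the steps above, here is a toy model in
Python. It is not IPVS code: the node names and the assumption that each
director's table always schedules the connection to the *other* node are
simplifications made only for this sketch.

```python
# Toy model of the reflection loop described above (not real IPVS code).
# Assumption: each director's table sends the connection to the other node.

def reflect(max_hops):
    """Return the nodes one packet visits, stopping after max_hops forwards."""
    peer = {"server1": "server2", "server2": "server1"}
    node = "server1"          # step 1: packet arrives on server1 from the client
    path = [node]
    for _ in range(max_hops):
        node = peer[node]     # steps 2-5: scheduler forwards to the other node
        path.append(node)
    return path               # without the hop limit this would never end


if __name__ == "__main__":
    print(" -> ".join(reflect(4)))
```

Nothing in the model ever delivers the packet locally, which is the
point: only the artificial hop limit stops it.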
It would be advisable to permit the system to get into the 100% softirq
situation, then run tcpdump to look at the traffic between the two
servers. I would pin money on you seeing the same packet (same TCP
sequence number, for example) over and over and over again.
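
One way to check for this without reading the capture by eye is to count
how often the same source, destination and sequence number show up. The
sketch below assumes text output roughly in the form tcpdump prints for
TCP over IPv4 (for example when run with -n and -S so that addresses are
numeric and sequence numbers are absolute). The sample lines, addresses
and the threshold are made up for illustration, and the regular
expression only handles this simplified line format.

```python
import re
from collections import Counter

# Matches lines such as:
#   11:43:43.000001 IP 192.0.2.10.40000 > 192.0.2.100.80: Flags [S], seq 1000, ...
# Only the first sequence number is captured if a range (seq A:B) is shown.
PACKET_RE = re.compile(r"IP (\S+) > (\S+): Flags \[[^\]]*\], seq (\d+)")

# Hypothetical capture lines, invented for this example.
SAMPLE = [
    "11:43:43.000001 IP 192.0.2.10.40000 > 192.0.2.100.80: Flags [S], seq 1000, win 14600, length 0",
    "11:43:43.000002 IP 192.0.2.10.40000 > 192.0.2.100.80: Flags [S], seq 1000, win 14600, length 0",
    "11:43:43.000003 IP 192.0.2.10.40000 > 192.0.2.100.80: Flags [S], seq 1000, win 14600, length 0",
    "11:43:43.000004 IP 192.0.2.10.40000 > 192.0.2.100.80: Flags [S], seq 1000, win 14600, length 0",
    "11:43:43.500000 IP 192.0.2.11.40001 > 192.0.2.100.80: Flags [S], seq 2000, win 14600, length 0",
]


def repeated_packets(lines, threshold=3):
    """Return {(src, dst, seq): count} for packets seen at least threshold times."""
    counts = Counter()
    for line in lines:
        match = PACKET_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    for (src, dst, seq), n in repeated_packets(SAMPLE).items():
        print(f"{src} > {dst} seq {seq} seen {n} times")
```

A few repeats can be ordinary retransmissions, so the threshold would
need to be set well above what normal traffic produces.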
The only ways to prevent this are either to not have active/active IPVS
tables (so only the active director holding the VIP has an active ipvsadm
table) *or* to use fwmarks rather than IP-based load balancing. That way
you can exclude packets arriving from the MAC address of the other server
from being processed by IPVS.
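
The fwmark idea can be sketched as a small decision function. This is
only a model of the logic, not a configuration: the MAC address, VIP and
mark value are hypothetical. In a real setup the marking would typically
be done by a netfilter rule in the mangle table (iptables has a mac
match for this), and the virtual service would be defined on the mark
(ipvsadm -f) instead of on the VIP address.

```python
# Model of fwmark-based selection (illustrative only; all values are made up).
PEER_MAC = "02:00:00:00:00:02"   # hypothetical MAC of the other director
VIP = "192.0.2.100"              # hypothetical virtual IP
MARK = 1                         # hypothetical firewall mark


def fwmark_for(src_mac, dst_ip):
    """Return the mark a packet would receive, or 0 if it is left unmarked."""
    if dst_ip == VIP and src_mac != PEER_MAC:
        return MARK              # client traffic for the VIP gets marked
    return 0                     # traffic forwarded by the peer is not marked


def handled_by_ipvs(src_mac, dst_ip):
    """A fwmark-based virtual service only sees packets carrying the mark."""
    return fwmark_for(src_mac, dst_ip) == MARK


if __name__ == "__main__":
    print(handled_by_ipvs("02:00:00:00:00:aa", VIP))   # packet from a client/router
    print(handled_by_ipvs(PEER_MAC, VIP))              # packet sent by the peer
```

Because a packet forwarded by the other director is never marked, it is
delivered locally instead of being scheduled again, which breaks the
loop.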