Lost packets and dead/warntime

Sebastian Vieira sebvieira at gmail.com
Fri Sep 1 09:27:28 BST 2006


On 8/18/06, Graeme Fowler <graeme at graemef.net> wrote:
>
> Beyond ensuring that the machines' network settings are good, that
> they're not accumulating errors at the hardware level (check ifconfig
> output), and that they're not interrupting themselves off the planet
> (/proc/interrupts is a good place to start), I have no idea.


Hi. Sorry for the late reply. Work work work and no play.

I've checked ifconfig output and see this:

eth2      Link encap:Ethernet  HWaddr 00:02:A5:08:E3:73
          inet addr:172.16.0.102  Bcast:172.16.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:76727063 errors:3044 dropped:0 overruns:0 frame:3044
          TX packets:76774485 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3882941796 (3703.0 Mb)  TX bytes:3900571779 (3719.8 Mb)


eth2      Link encap:Ethernet  HWaddr 00:02:A5:09:79:CD
          inet addr:172.16.0.101  Bcast:172.16.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1432209 errors:156 dropped:0 overruns:0 frame:156
          TX packets:1432784 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:232381530 (221.6 Mb)  TX bytes:230872325 (220.1 Mb)
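
To put the RX error counters above in proportion, here is a minimal sketch that parses the "RX packets" line and reports errors as a fraction of received packets. It is not part of the original exchange: the function name rx_error_ratio is hypothetical, and the regular expression assumes the classic net-tools ifconfig layout shown above.

```python
import re

# Assumes the classic net-tools layout:
# "RX packets:N errors:N dropped:N overruns:N frame:N"
RX_LINE = re.compile(
    r"RX packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+) frame:(\d+)"
)


def rx_error_ratio(ifconfig_text):
    """Return (packets, errors, frame_errors, errors / packets) for the first RX line."""
    match = RX_LINE.search(ifconfig_text)
    if match is None:
        raise ValueError("no 'RX packets' line found")
    packets, errors, _dropped, _overruns, frame = map(int, match.groups())
    ratio = errors / packets if packets else 0.0
    return packets, errors, frame, ratio


if __name__ == "__main__":
    # The two RX lines quoted in this message.
    samples = {
        "172.16.0.102": "RX packets:76727063 errors:3044 dropped:0 overruns:0 frame:3044",
        "172.16.0.101": "RX packets:1432209 errors:156 dropped:0 overruns:0 frame:156",
    }
    for host, line in samples.items():
        packets, errors, frame, ratio = rx_error_ratio(line)
        print(f"{host}: {errors} errors in {packets} packets ({ratio:.4%}), frame={frame}")
```

For the figures above this works out to roughly 0.004% and 0.011% of received packets, and on both boxes every RX error is also counted as a frame error.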


Now I don't know for sure where the errors come from, or what 'frame' means,
but I'm sure it's not very good. I've looked into /proc/interrupts and I see
that on one box all NICs are sharing IRQ 15, on the other IRQ 11. But there's
a huge number next to the interrupt that keeps changing (increasing). I
suppose that's not very good either:

11:  369029512          XT-PIC  eth2, eth0, eth1

15:    3131945          XT-PIC  eth2, eth0, eth1
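
A counter in /proc/interrupts that keeps increasing is expected on its own, since it is a running total; the rate of increase is the more telling figure. Here is a minimal sketch, not from the original exchange, that takes two snapshots and computes interrupts per second. The function names (parse_irq_counts, irq_rates, sample_rates) and the 5-second default interval are assumptions for illustration.

```python
import time


def parse_irq_counts(text):
    """Map IRQ label -> total count (summed over CPU columns) from /proc/interrupts text."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # header line ("CPU0 ...") or blank line
        total = 0
        seen = False
        for token in rest.split():
            if not token.isdigit():
                break  # reached the controller name, e.g. "XT-PIC"
            total += int(token)
            seen = True
        if seen:
            counts[label.strip()] = total
    return counts


def irq_rates(before, after, seconds):
    """Interrupts per second for every IRQ present in both snapshots."""
    return {
        irq: (after[irq] - before[irq]) / seconds
        for irq in after
        if irq in before
    }


def sample_rates(path="/proc/interrupts", interval=5.0):
    """Read the file twice, `interval` seconds apart, and return per-IRQ rates."""
    with open(path) as f:
        first = parse_irq_counts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_irq_counts(f.read())
    return irq_rates(first, second, interval)
```

Calling sample_rates() on each box and looking at the entries for IRQ 11 and IRQ 15 would show how many interrupts per second the shared line is actually handling.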


> It still sounds to me like the fault lies below the application layer.
>
> Speaking of interrupts; you say you have eth0/1 bonded. Please make sure
> that you haven't got several hundred megs worth of traffic looping


I would love to, but I don't know how.

> around your ethernet because of that. If you have, you could be dropping
> packets simply because your kernels cannot keep up with the traffic load
> - a layer 2 loop somewhere could cause an effective DoS condition like
> this quite trivially.
>
> What mode is your bond interface in?


active-slave
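
The Linux bonding driver has no mode called "active-slave"; the name it uses is "active-backup", which is probably what is meant here. The mode can be confirmed from the driver's status file under /proc/net/bonding/. Here is a minimal sketch, not from the original exchange, that pulls out the mode and the currently active slave. The bond name "bond0" and the function names are assumptions.

```python
def parse_bonding_status(text):
    """Extract the bonding mode and currently active slave from the status text."""
    wanted = ("Bonding Mode", "Currently Active Slave")
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "key: value" pair
        key = key.strip()
        # Keep only the first occurrence of each field of interest.
        if key in wanted and key not in info:
            info[key] = value.strip()
    return info


def read_bonding_status(bond="bond0"):
    """Read /proc/net/bonding/<bond>; the default name 'bond0' is an assumption."""
    with open(f"/proc/net/bonding/{bond}") as f:
        return parse_bonding_status(f.read())
```

On a box running the bonding driver, read_bonding_status() should return the mode string the driver reports, which shows whether the bond really is in active-backup mode.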

> I've never used heartbeat, so I can't really suggest anything else.
> Anyone else got any clever ideas?
>
> Graeme


Thanks,

Sebastian
