[ale] bond0 went down

Lightner, Jeff jlightner at water.com
Thu Sep 16 11:36:35 EDT 2010


cat /proc/net/bonding/bond0 output:

Ethernet Channel Bonding Driver: v3.0.3 (March 23, 2006)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:04:23:ba:f1:20

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:04:23:ba:f1:21
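
For anyone who wants to check the same fields on their own box, here is a
minimal Python sketch that parses this kind of status text and flags
anything worth a second look.  The function names and the SAMPLE string
are made up for illustration (SAMPLE is just the output above pasted in).
One thing it flags: per the kernel bonding documentation, a polling
interval of 0 means MII link monitoring is disabled, so the "up" status
and zero failure counts may not say much about what the links actually
did.  Treat it as a sketch, not a diagnosis.

```python
SAMPLE = """\
Ethernet Channel Bonding Driver: v3.0.3 (March 23, 2006)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:04:23:ba:f1:20

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:04:23:ba:f1:21
"""


def parse_bonding_status(text):
    """Split bonding status text into (bond-level fields, list of slaves)."""
    bond, slaves, current = {}, [], None
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank or unrecognised line
        # Split on the first colon only, so MAC addresses stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {"name": value}
            slaves.append(current)
        elif current is None:
            bond[key] = value      # fields before the first slave section
        else:
            current[key] = value   # fields belonging to the current slave
    return bond, slaves


def warnings_for(bond, slaves):
    """Return human-readable notes about settings worth a second look."""
    notes = []
    if bond.get("MII Polling Interval (ms)") == "0":
        notes.append("miimon is 0: MII link monitoring is disabled")
    for slave in slaves:
        if slave.get("MII Status") != "up":
            notes.append(f"{slave['name']}: MII Status is {slave.get('MII Status')}")
    return notes


bond, slaves = parse_bonding_status(SAMPLE)
for note in warnings_for(bond, slaves):
    print(note)
```

On a live system the same functions could be fed the contents of
/proc/net/bonding/bond0 instead of SAMPLE.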

 

The switch log was not helpful - it simply shows the links going up and
down and doesn't even tell us WHEN it saw that, because its time field
had something like 2+ years in it.   The network admin reset the time so
if it occurs again we'll have better timestamps.   There is no other
detail than the links going down and up.

 

I don't think it's an issue with the bonding setup or the switch's
recognition of it, because we have another RAC environment like this
one that has used the same bonding setup and the same switch since it
was first put in place over 2 years ago.   In fact the one that went
down is a Test environment modeled on the other one, which is our
Production environment.  This Test environment has been running since
around April of this year.   If flapping were an issue I'd expect to
have seen it long before now.

 

________________________________

From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of Joey
Rutledge
Sent: Thursday, September 16, 2010 11:05 AM
To: Atlanta Linux Enthusiasts - Yes! We run Linux!
Subject: Re: [ale] bond0 went down

 

A few questions I have:

 

What type of bond method are you using (round robin, active passive,
etc.)?  The output of cat /proc/net/bonding/bond0 will show it.

 

What is the uplink switch and do you have logs on it that you can check
for when the interfaces went down?

 

I've seen in our environment that round-robin simply doesn't work with
the switch configuration and causes interfaces to flap.  We use
active-passive bonding for all of our servers.

 

Joey

 

On Sep 15, 2010, at 5:11 PM, Lightner, Jeff wrote:





Can anyone tell me what the below messages mean?   I didn't find many
hits on the web:

 

Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Interface bond0.IPv6 no longer relevant for mDNS.
Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Leaving mDNS multicast group on interface bond0.IPv6 with address fe80::204:23ff:feba:f120.
Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Interface bond0.IPv4 no longer relevant for mDNS.
Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Leaving mDNS multicast group on interface bond0.IPv4 with address 192.168.8.73.
Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Withdrawing address record for fe80::204:23ff:feba:f120 on bond0.
Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Withdrawing address record for 192.168.8.73 on bond0.
Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: New relevant interface bond0.IPv4 for mDNS.
Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Joining mDNS multicast group on interface bond0.IPv4 with address 192.168.8.73.
Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Registering new address record for 192.168.8.73 on bond0.
Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Interface eth2.IPv6 no longer relevant for mDNS.
Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Leaving mDNS multicast group on interface eth2.IPv6 with address fe80::204:23ff:feba:f120.
Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Withdrawing address record for fe80::204:23ff:feba:f120 on eth2.
Sep 14 13:15:45 atlrdtd1 kernel: bonding: bond0: Interface eth2 is already enslaved!
Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Interface eth3.IPv6 no longer relevant for mDNS.
Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Leaving mDNS multicast group on interface eth3.IPv6 with address fe80::204:23ff:feba:f120.
Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Withdrawing address record for fe80::204:23ff:feba:f120 on eth3.
Sep 14 13:15:45 atlrdtd1 kernel: bonding: bond0: Interface eth3 is already enslaved!
Sep 14 13:15:47 atlrdtd1 avahi-daemon[6709]: New relevant interface bond0.IPv6 for mDNS.
Sep 14 13:15:47 atlrdtd1 avahi-daemon[6709]: Joining mDNS multicast group on interface bond0.IPv6 with address fe80::204:23ff:feba:f120.
Sep 14 13:15:47 atlrdtd1 avahi-daemon[6709]: Registering new address record for fe80::204:23ff:feba:f120 on bond0.
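
Since nearly all of that is avahi-daemon chatter, a small script can help
separate it from the kernel lines and show how many entries each program
logged per second.  The following is only a sketch: it assumes each entry
sits on a single line in the classic "Mon DD HH:MM:SS host prog[pid]: msg"
syslog layout, the regex and names are illustrative, and SAMPLE holds just
four of the entries above.

```python
import re
from collections import Counter

# Classic syslog layout: "Sep 14 13:15:45 host prog[pid]: message"
# (the [pid] part is optional, as on the kernel lines).
SYSLOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<prog>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<msg>.*)$"
)

SAMPLE = [
    "Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Interface bond0.IPv4 no longer relevant for mDNS.",
    "Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Withdrawing address record for 192.168.8.73 on bond0.",
    "Sep 14 13:15:45 atlrdtd1 kernel: bonding: bond0: Interface eth2 is already enslaved!",
    "Sep 14 13:15:47 atlrdtd1 avahi-daemon[6709]: New relevant interface bond0.IPv6 for mDNS.",
]


def parse(lines):
    """Yield one dict per line that matches the syslog layout."""
    for line in lines:
        match = SYSLOG_RE.match(line.strip())
        if match:
            yield match.groupdict()


events = list(parse(SAMPLE))

# How many entries each program logged in each one-second bucket.
per_second = Counter((e["stamp"], e["prog"]) for e in events)

# Kernel messages only, with the avahi noise filtered out.
kernel_msgs = [e["msg"] for e in events if e["prog"] == "kernel"]

for (stamp, prog), count in sorted(per_second.items()):
    print(stamp, prog, count)
for msg in kernel_msgs:
    print("kernel:", msg)
```

Pointing parse() at an open /var/log/messages file object instead of
SAMPLE should work the same way, since it only iterates over lines.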

 

Background:  

We have an Oracle RAC cluster of 2 nodes.   Yesterday one of the nodes
rebooted and its log indicates that Oracle forced the reboot to preserve
cluster integrity.   There were no other messages in that node's
/var/log/messages near the time of this message and reboot.   

 

We use a private LAN set up on 2 bonded NICs on each side for the Oracle
Cluster Ready Services to communicate with each other.    That is bond0
and is using 2 Intel GigE NIC ports on both sides (eth2 and eth3 are the
NICs).    We found that the connectivity on the private LAN had gone
away and on checking found that both eth2 and eth3 on the node that got
these messages were showing no link.   Running "ifdown bond0" followed by
"ifup bond0" re-established links on both eth2 and eth3.
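
A quick way to watch for the "no link" condition is to read the carrier
flag that Linux exposes under /sys/class/net/<iface>/carrier (1 means
link detected, 0 means no link).  The sketch below assumes that sysfs
layout is present on the node; the function names and the status strings
it returns are made up for illustration.

```python
from pathlib import Path


def link_state(iface, sysfs_root="/sys/class/net"):
    """Return 'link', 'no-link', 'unreadable' or 'missing' for one NIC."""
    base = Path(sysfs_root) / iface
    if not base.is_dir():
        return "missing"          # no such interface on this host
    try:
        carrier = (base / "carrier").read_text().strip()
    except OSError:
        # The carrier file typically cannot be read while the interface
        # is administratively down, so report that separately.
        return "unreadable"
    return "link" if carrier == "1" else "no-link"


def report(ifaces=("eth2", "eth3"), sysfs_root="/sys/class/net"):
    """Map each interface name to its link state."""
    return {iface: link_state(iface, sysfs_root) for iface in ifaces}


print(report())
```

Run from cron or a loop with a timestamp added, something like this could
record when a slave loses carrier even if the bond itself logs nothing.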

 

The above messages occurred on the node where bond0's links were down,
less than 2 minutes before the node that rebooted issued the message
about shutting down to preserve cluster integrity.   It seems fairly
clear the cause of the reboot was the loss of connectivity, but I can't
really determine from the above log entries WHY bond0 went down.  So I
was hoping someone had seen something like this and could give me a clue.

 

P.S.  We don't actually use IPv6 - the relevant addresses are the
IPv4 ones.   Apparently the guy who set this up didn't disable IPv6 on
these NICs, but I don't believe that is the issue as they have been up
for a few months with this configuration.

 

