[ale] bond0 went down

Joey Rutledge joey at joeyrutledge.com
Thu Sep 16 11:04:30 EDT 2010


A few questions I have:

What bonding mode are you using (round-robin, active-passive, etc.)?  You can check with:    cat /proc/net/bonding/bond0

What is the uplink switch and do you have logs on it that you can check for when the interfaces went down?

I've seen in our environment that round-robin simply doesn't work with the switch configuration and causes interfaces to flap.  We use active-passive bonding for all of our servers.
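
If it helps, here is a rough Python sketch for pulling the bonding mode and the per-slave link state out of that /proc file. The SAMPLE text is made up for illustration (it is not output from Jeff's box), and the field names ("Bonding Mode", "Slave Interface", "MII Status", "Link Failure Count") are what the bonding driver normally prints, though the exact layout can vary by kernel version. parse_bonding and read_bonding are just names I picked.

```python
# Illustrative sample only; real output varies by kernel and driver version.
SAMPLE = """\
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth2
MII Status: up
Link Failure Count: 0

Slave Interface: eth3
MII Status: down
Link Failure Count: 3
"""


def parse_bonding(text):
    """Return (mode, slaves); slaves maps interface name -> dict of fields."""
    mode = None
    slaves = {}
    current = None  # stays None while we are still in the bond-level header
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif current is not None and key in ("MII Status", "Link Failure Count"):
            current[key] = value
    return mode, slaves


def read_bonding(path="/proc/net/bonding/bond0"):
    """Parse the live file; the default path assumes the bond is named bond0."""
    with open(path) as f:
        return parse_bonding(f.read())


if __name__ == "__main__":
    mode, slaves = parse_bonding(SAMPLE)
    print("mode:", mode)
    for name, info in slaves.items():
        print(name, info.get("MII Status"), "failures:", info.get("Link Failure Count"))
```

On a real server you would call read_bonding() instead of parsing SAMPLE. A non-zero Link Failure Count on a slave is a hint that the link has been flapping.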

Joey

On Sep 15, 2010, at 5:11 PM, Lightner, Jeff wrote:

> Can anyone tell me what the below messages mean?   I didn’t find many hits on the web:
>  
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Interface bond0.IPv6 no longer relevant for mDNS.
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Leaving mDNS multicast group on interface bond0.IPv6 with address fe80::204:23ff:feba:f120.
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Interface bond0.IPv4 no longer relevant for mDNS.
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Leaving mDNS multicast group on interface bond0.IPv4 with address 192.168.8.73.
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Withdrawing address record for fe80::204:23ff:feba:f120 on bond0.
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Withdrawing address record for 192.168.8.73 on bond0.
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: New relevant interface bond0.IPv4 for mDNS.
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Joining mDNS multicast group on interface bond0.IPv4 with address 192.168.8.73.
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Registering new address record for 192.168.8.73 on bond0.
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Interface eth2.IPv6 no longer relevant for mDNS.
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Leaving mDNS multicast group on interface eth2.IPv6 with address fe80::204:23ff:feba:f120.
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Withdrawing address record for fe80::204:23ff:feba:f120 on eth2.
> Sep 14 13:15:45 atlrdtd1 kernel: bonding: bond0: Interface eth2 is already enslaved!
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Interface eth3.IPv6 no longer relevant for mDNS.
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Leaving mDNS multicast group on interface eth3.IPv6 with address fe80::204:23ff:feba:f120.
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Withdrawing address record for fe80::204:23ff:feba:f120 on eth3.
> Sep 14 13:15:45 atlrdtd1 kernel: bonding: bond0: Interface eth3 is already enslaved!
> Sep 14 13:15:47 atlrdtd1 avahi-daemon[6709]: New relevant interface bond0.IPv6 for mDNS.
> Sep 14 13:15:47 atlrdtd1 avahi-daemon[6709]: Joining mDNS multicast group on interface bond0.IPv6 with address fe80::204:23ff:feba:f120.
> Sep 14 13:15:47 atlrdtd1 avahi-daemon[6709]: Registering new address record for fe80::204:23ff:feba:f120 on bond0.
>  
> Background:  
> We have an Oracle RAC cluster of 2 nodes.   Yesterday one of the nodes rebooted and its log indicates that Oracle forced the reboot to preserve cluster integrity.   There were no other messages in that node’s /var/log/messages near the time of this message and reboot.   
>  
> We use a private LAN set up on 2 bonded NICs on each side for the Oracle Cluster Ready Services to communicate with each other.    That is bond0, and it uses 2 Intel GigE NIC ports on both sides (eth2 and eth3 are the NICs).    We found that the connectivity on the private LAN had gone away and on checking found that both eth2 and eth3 on the node that got these messages were showing no link.   Running “ifdown bond0” followed by “ifup bond0” re-established links on both eth2 and eth3.
>  
> The above messages occurred on the node where bond0’s links were down, less than 2 minutes before the node that rebooted issued the message about shutting down to preserve cluster integrity.   It seems fairly clear the cause of the reboot was the loss of connectivity, but I can’t really determine from the above log entries WHY bond0 went down.  So I was hoping someone had seen something like this and could give me a clue.  
>  
> P.S.  We don’t actually use the ipv6 – the relevant addresses are the ipv4 ones.   Apparently the guy who set this up didn’t disable ipv6 on these NICs but I don’t believe that is the issue as they have been up for a few months with this configuration.
>  
