[ale] bond0 went down

Jim Kinney jim.kinney at gmail.com
Thu Sep 16 11:57:24 EDT 2010


Doesn't the MII Polling Interval have to have a non-zero value? A value of 0
means no polling, so the driver gets no notice of a link failure (I think)
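
If that's the case, here's a minimal sketch of what I'd set (assuming a
RHEL-style box where the bonding module is configured in /etc/modprobe.conf;
adjust the device name and mode for your setup):

    # /etc/modprobe.conf (or a file under /etc/modprobe.d/)
    alias bond0 bonding
    options bond0 mode=balance-rr miimon=100

    # reload the bond so the new miimon value takes effect
    ifdown bond0
    ifup bond0

With miimon=100 the driver checks each slave's link every 100 ms, so a dead
slave actually gets marked down instead of silently eating packets.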

On Thu, Sep 16, 2010 at 11:36 AM, Lightner, Jeff <jlightner at water.com> wrote:

>  cat /proc/net/bonding/bond0 output:
>
> Ethernet Channel Bonding Driver: v3.0.3 (March 23, 2006)
>
>
>
> Bonding Mode: load balancing (round-robin)
>
> MII Status: up
>
> MII Polling Interval (ms): 0
>
> Up Delay (ms): 0
>
> Down Delay (ms): 0
>
>
>
> Slave Interface: eth2
>
> MII Status: up
>
> Link Failure Count: 0
>
> Permanent HW addr: 00:04:23:ba:f1:20
>
>
>
> Slave Interface: eth3
>
> MII Status: up
>
> Link Failure Count: 0
>
> Permanent HW addr: 00:04:23:ba:f1:21
>
>
>
> The switch log was not helpful – it simply shows the links going up and
> down and doesn’t even tell us WHEN it saw that, because its time field had
> something like 2+ years in it.   The network admin reset the time, so if it
> occurs again we’ll have better timestamps.   There is no other detail beyond
> the links going down and up.
>
>
>
> I don’t think it’s an issue with the bonding setup or the switch’s
> recognition of it, because we have another RAC environment like this one
> that uses the same bonding setup and has used the same switch since it was
> first put in place over 2 years ago.   In fact the one that went down is a
> Test environment modeled on that other one, which is our Production
> environment.  This test environment has been running since around April of
> this year.   If flapping were an issue I’d expect to have seen it long
> before now.
>
>
>  ------------------------------
>
> From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of Joey
> Rutledge
> Sent: Thursday, September 16, 2010 11:05 AM
> To: Atlanta Linux Enthusiasts - Yes! We run Linux!
> Subject: Re: [ale] bond0 went down
>
>
>
> A few questions I have:
>
>
>
> What type of bonding method are you using?  Round-robin, active-passive, etc.?  Check with:
>  cat /proc/net/bonding/bond0
>
>
>
> What is the uplink switch, and do you have logs on it that you can check to
> see when the interfaces went down?
>
>
>
> I've seen in our environment that round-robin simply doesn't work with the
> switch configuration and causes interfaces to flap.  We use active-passive
> bonding for all of our servers.
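>
> Roughly what that looks like for us (a sketch assuming RHEL-style files and
> the standard bonding module options; the exact syntax depends on your distro
> and initscripts version):
>
>     # /etc/modprobe.conf
>     alias bond0 bonding
>     options bond0 mode=active-backup miimon=100
>
>     # /etc/sysconfig/network-scripts/ifcfg-bond0 (values illustrative)
>     DEVICE=bond0
>     IPADDR=192.168.8.73   # use your private-LAN address
>     NETMASK=255.255.255.0
>     ONBOOT=yes
>     BOOTPROTO=none
>
> mode=active-backup (mode 1) keeps one slave carrying traffic and fails over
> to the other on link loss, which tends to be much friendlier to a switch
> that isn't configured for a port-channel than round-robin is.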
>
>
>
> Joey
>
>
>
> On Sep 15, 2010, at 5:11 PM, Lightner, Jeff wrote:
>
>
>
>    Can anyone tell me what the below messages mean?   I didn’t find many
> hits on the web:
>
>
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Interface bond0.IPv6 no longer
> relevant for mDNS.
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Leaving mDNS multicast group
> on interface bond0.IPv6 with address fe80::204:23ff:feba:f120.
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Interface bond0.IPv4 no longer
> relevant for mDNS.
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Leaving mDNS multicast group
> on interface bond0.IPv4 with address 192.168.8.73.
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Withdrawing address record for
> fe80::204:23ff:feba:f120 on bond0.
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Withdrawing address record for
> 192.168.8.73 on bond0.
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: New relevant interface
> bond0.IPv4 for mDNS.
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Joining mDNS multicast group
> on interface bond0.IPv4 with address 192.168.8.73.
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Registering new address record
> for 192.168.8.73 on bond0.
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Interface eth2.IPv6 no longer
> relevant for mDNS.
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Leaving mDNS multicast group
> on interface eth2.IPv6 with address fe80::204:23ff:feba:f120.
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Withdrawing address record for
> fe80::204:23ff:feba:f120 on eth2.
>
> Sep 14 13:15:45 atlrdtd1 kernel: bonding: bond0: Interface eth2 is already
> enslaved!
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Interface eth3.IPv6 no longer
> relevant for mDNS.
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Leaving mDNS multicast group
> on interface eth3.IPv6 with address fe80::204:23ff:feba:f120.
>
> Sep 14 13:15:45 atlrdtd1 avahi-daemon[6709]: Withdrawing address record for
> fe80::204:23ff:feba:f120 on eth3.
>
> Sep 14 13:15:45 atlrdtd1 kernel: bonding: bond0: Interface eth3 is already
> enslaved!
>
> Sep 14 13:15:47 atlrdtd1 avahi-daemon[6709]: New relevant interface
> bond0.IPv6 for mDNS.
>
> Sep 14 13:15:47 atlrdtd1 avahi-daemon[6709]: Joining mDNS multicast group
> on interface bond0.IPv6 with address fe80::204:23ff:feba:f120.
>
> Sep 14 13:15:47 atlrdtd1 avahi-daemon[6709]: Registering new address record
> for fe80::204:23ff:feba:f120 on bond0.
>
>
>
> Background:
>
> We have an Oracle RAC cluster of 2 nodes.   Yesterday one of the nodes
> rebooted and its log indicates that Oracle forced the reboot to preserve
> cluster integrity.   There were no other messages in that node’s
> /var/log/messages near the time of this message and reboot.
>
>
>
> We use a private LAN on 2 bonded NICs on each side for the Oracle
> Cluster Ready Services to communicate with each other.    That is bond0, and
> it uses 2 Intel GigE NIC ports on each side (eth2 and eth3 are the
> NICs).    We found that connectivity on the private LAN had gone away,
> and on checking we found that both eth2 and eth3 on the node that got these
> messages were showing no link.   Running “ifdown bond0” followed by “ifup
> bond0” re-established links on both eth2 and eth3.
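>
> (For reference, the bond is defined with the usual RHEL-style ifcfg files.
> A sketch of the slave definitions, illustrative rather than the exact files
> from this box:
>
>     # /etc/sysconfig/network-scripts/ifcfg-eth2
>     # (ifcfg-eth3 is identical apart from DEVICE)
>     DEVICE=eth2
>     MASTER=bond0
>     SLAVE=yes
>     ONBOOT=yes
>     BOOTPROTO=none
>
> Running "ethtool eth2" and "ethtool eth3" is one quick way to confirm the
> "Link detected" state on each slave.)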
>
>
>
> The above messages occurred on the node where bond0’s links were down, less
> than 2 minutes before the node that rebooted issued the message about
> shutting down to preserve cluster integrity.   It seems fairly clear the
> cause of the reboot was the loss of connectivity, but I can’t really
> determine from the above log entries WHY bond0 went down.  So I was hoping
> someone had seen something like this and could give me a clue.
>
>
>
> P.S.  We don’t actually use IPv6 – the relevant addresses are the IPv4
> ones.   Apparently the guy who set this up didn’t disable IPv6 on these NICs,
> but I don’t believe that is the issue, as they have been up for a few months
> with this configuration.
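>
> (If we do decide to quiet that down later, something like this should do it
> on a RHEL 5-era box; a sketch, not something we’ve applied here:
>
>     # /etc/modprobe.conf: keep the ipv6 module from loading
>     alias net-pf-10 off
>     options ipv6 disable=1
>
>     # and, since avahi is what generates the mDNS noise, turn it off
>     # if nothing on the cluster needs it
>     chkconfig avahi-daemon off
>     service avahi-daemon stop
>
> followed by a reboot for the module change to take effect.)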
>
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo
>
>


-- 
James P. Kinney III
I would rather stumble along in freedom than walk effortlessly in chains.

