[ale] ext3-fs error (RH 3.4.6-2)

Wed Nov 30 15:16:29 EST 2011

My heart skips many beats recalling the number of systems that have failed
during a simple physical relocation, even after letting them cool down for
several hours before moving them 30 feet on a cart with soft tires and
padding, with the drives gently removed and protected separately against
shock and static. Failed reboots, mobos that die, hard drives that never
spin back up, power supplies that drop a rail, RAM that never works again,
etc.

I've tried for years to get "the people who make decisions" to follow this
simple process:

When it's time to upgrade a server OS (a major version change like
RHEL4->RHEL5):
1. Buy a new server and do a fresh OS install.
2. Migrate the old data to the new system and begin testing.
3. Once testing is complete and the new system is taking the load, wipe the
old drives and sell the old system.
4. Stop storing junk.

On Wed, Nov 30, 2011 at 2:46 PM, Rich Faulkner <rfaulkner at 34thprs.org> wrote:

> Disk controller in this case is an Adaptec 3805 running RAID 5EE.
>
> My thoughts were along the same lines:  old OS, time for an upgrade, and
> possible h/w failure impending or in progress...
>
> Thanks for the input all!   RinL
>
>
>
> On Wed, 2011-11-30 at 13:44 -0500, Michael B. Trausch wrote:
>
> On 11/30/2011 01:25 PM, Lightner, Jeff wrote:
> > A couple of things:
> >
> > 1)  You're not using RH 3.4.6-2 - the message tells you your kernel
> > was compiled by that version of gcc.   To see the version of RH you're
> > running do "cat /etc/issue" and/or "cat /etc/redhat-release".
>
> Indeed.  2.6.9 was used for RHEL4 from the looks of it, so it's likely
> that he's using that (which is ending support soon anyway).
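For anyone else tripped up by the subject line: the "3.4.6-2" comes from the
gcc part of the kernel banner, not from the kernel or the RHEL release. Below
is a minimal Python sketch that pulls the two apart. The sample banner is
illustrative (the build host is made up), the name split_banner is
hypothetical, and the regex assumes the older "gcc version X (Red Hat Y)"
banner layout; newer kernels format this line differently.

```python
import re

# Illustrative banner in the style of /proc/version on a 2.6.9-era system.
# The build host is invented for this example.
SAMPLE_BANNER = (
    "Linux version 2.6.9-42.ELsmp (builder@build.example.com) "
    "(gcc version 3.4.6 20060404 (Red Hat 3.4.6-2)) #1 SMP"
)


def split_banner(banner):
    """Separate the kernel release from the compiler details in a banner.

    Returns a dict with kernel_release, gcc_version and gcc_package,
    or None if the banner does not match the assumed layout.
    """
    kernel = re.search(r"Linux version (\S+)", banner)
    # gcc version, an optional build date, then the packager's own version tag
    gcc = re.search(r"gcc version (\S+)(?:\s+\d+)?\s+\((.*?)\)", banner)
    if kernel is None or gcc is None:
        return None
    return {
        "kernel_release": kernel.group(1),  # what the system is running
        "gcc_version": gcc.group(1),        # compiler that built the kernel
        "gcc_package": gcc.group(2),        # distro's gcc package version
    }


if __name__ == "__main__":
    print(split_banner(SAMPLE_BANNER))
```

The kernel_release field is the part that matters for support questions;
the distribution release itself still comes from /etc/redhat-release as
described above.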
>
> > 2)  The way RedHat does things is it releases a base package from
> > upstream then appends its own versioning to that, so 2.6.9-42.ELsmp is
> > NOT the same as 2.6.9 on any other system, as it may have backported
> > bug and security fixes in it.   (That being said, the kernel is handled
> > differently than many other packages, so you can actually get kernel
> > updates from the RedHat yum repositories that might be newer than
> > 2.6.9x.)
>
> This is generally true regardless of the distribution; most
> distributions patch the kernel in some way.  One reason that I prefer
> using upstream, vanilla kernels is that it's easier to get support for
> them than for distro-kernels (at least, IME, YMMV).
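To make the versioning point concrete, here is a small Python sketch that
splits a release string such as 2.6.9-42.ELsmp into its upstream base and
the distributor's suffix. Splitting at the first hyphen is an assumption
about the naming convention, and the function names are hypothetical. The
second release string in the test is only a made-up example of a different
suffix.

```python
def split_release(release):
    """Split a kernel release into (upstream_base, distro_suffix).

    Assumes everything before the first hyphen is the upstream version
    and everything after it was added by the distributor.
    """
    upstream, _, distro = release.partition("-")
    return upstream, distro


def same_upstream_base(a, b):
    """True if two releases share an upstream base version.

    A shared base does NOT mean the same code: the distro suffix can
    carry backported bug and security fixes.
    """
    return split_release(a)[0] == split_release(b)[0]


if __name__ == "__main__":
    print(split_release("2.6.9-42.ELsmp"))
    print(same_upstream_base("2.6.9-42.ELsmp", "2.6.9"))
```

Two kernels that pass same_upstream_base can still behave differently,
which is why reproducing a bug on the vanilla release is useful evidence
either way.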
>
> > You should NOT attempt to download and compile a newer
> > kernel manually as it would no longer be RHEL supported at that
> > point.
>
> Only while the locally-compiled kernel is actually running.  If you have
> a problem with the kernel, the first thing to do is to determine if it
> is present in the vanilla kernel; if so, file the bug there and file a
> bug with the distribution to reference the upstream bug.  Otherwise, if
> you cannot reproduce, you have viable information that you can give to
> the distributor to say "this problem exists in your kernel version x.y.z
> pl eleventyone-foo but not upstream release x.y.z" and that is at least
> something to go on.
>
> > If you're using RHEL and paying a subscription fee you can call them
> > for support.  If you're NOT paying for a subscription fee and using
> > them for support you might want to consider moving to CentOS which is
> > a binary compile of RHEL sources.  It doesn't require subscription
> > fees but also doesn't have a support number.   (Of course you
> > wouldn't want to worry about this until you've solved your base
> > issue.)
>
> This would be the one case where it's likely easier to get support for
> the distro kernel, though I'd still be inclined to troubleshoot as far
> as I can before I start asking for support from the distributor, in the
> interest of reducing the amount of back-and-forth communication I have
> to do.  What can I say... I'm lazy!
>
> > My thought is as Mike said that it is likely an issue with the disk
> > controller or disks themselves.
>
> Possibly, though even so, the kernel shouldn't be attempting to deref a
> NULL pointer unless the kernel image itself is somehow corrupted or
> modified.  The thing is that in that case, it'd be very likely that the
> kernel wouldn't work at all (and in what I'd call a safe/secure system,
> it shouldn't because it should be somehow meaningfully signed, but
> that's neither here nor there).
>
> If the kernel's not corrupt and there is indeed a problem with the disk
> controller or the disk itself, it shouldn't be able to cause the kernel
> to crash by deref'ing a NULL pointer; the kernel should be able to catch
> such an issue and freeze the FS to save it from any further problems.  A
> panic would be warranted, IMHO, but with hopefully a more meaningful
> message.
>
> 	--- Mike
>
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo

-- 
James P. Kinney III

As long as the general population is passive, apathetic, diverted to
consumerism or hatred of the vulnerable, then the powerful can do as they
please, and those who survive will be left to contemplate the outcome.
- 2011 Noam Chomsky

http://heretothereideas.blogspot.com/