[ale] ext3-fs error (RH 3.4.6-2)

Lightner, Jeff JLightner at water.com
Wed Nov 30 13:25:44 EST 2011


A couple of things:
1)  You're not running RH 3.4.6-2; that string in the message tells you your kernel was compiled by that version of gcc.   To see the version of RH you're running, do "cat /etc/issue" and/or "cat /etc/redhat-release".
2)  The way Red Hat does things, it takes a base package from upstream and appends its own versioning to it, so 2.6.9-42.ELsmp is NOT the same as 2.6.9 on any other system, as it may have backported bug and security fixes in it.   (That being said, the kernel is handled differently than many other packages, so you can actually get kernel updates from the Red Hat yum repositories that might be newer than 2.6.9-42.)  You should NOT attempt to download and compile a newer kernel manually, as it would no longer be RHEL supported at that point.
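To make points 1 and 2 concrete, here is a minimal Python 3 sketch (the stock Python on a RHEL 4 box is much older, so paste the string into a modern machine) that pulls apart a /proc/version-style line. The sample line is a hypothetical placeholder: only the 2.6.9-42.ELsmp release and the "Red Hat 3.4.6-2" gcc tag come from this thread, and the helper names are made up for illustration. The release regex assumes the RHEL 4 "<base>-<build>.EL<flavor>" naming; it is not an official Red Hat format spec. The distribution release itself still comes from /etc/redhat-release, not from this string.

```python
import re

# Hypothetical /proc/version-style line. Only the kernel release and the
# "(Red Hat 3.4.6-2)" gcc tag come from the thread; the rest is a placeholder.
SAMPLE = ("Linux version 2.6.9-42.ELsmp (user@buildhost) "
          "(gcc version 3.4.6 (Red Hat 3.4.6-2)) #1 SMP")

def parse_proc_version(line):
    """Return the kernel release, gcc version and Red Hat gcc build, or None."""
    m = re.search(r"Linux version (\S+).*?gcc version (\S+).*?\(Red Hat ([^)]+)\)", line)
    if m is None:
        return None
    return {"release": m.group(1), "gcc": m.group(2), "redhat_gcc_build": m.group(3)}

def split_rhel_release(release):
    """Split a RHEL 4 style release such as '2.6.9-42.ELsmp' into
    (upstream base, Red Hat build, flavor); flavor is '' for the plain kernel."""
    m = re.fullmatch(r"(\d+\.\d+\.\d+)-(\d+(?:\.\d+)*)\.EL(\w*)", release)
    return m.groups() if m else None

info = parse_proc_version(SAMPLE)
# The "3.4.6-2" belongs to the compiler, not to the OS release.
print(info["release"], "was built with gcc", info["gcc"], "(Red Hat", info["redhat_gcc_build"] + ")")
print(split_rhel_release(info["release"]))  # ('2.6.9', '42', 'smp')
```

The upstream base (2.6.9) stays fixed across RHEL 4 updates; it is the Red Hat build number after the dash that moves when backported fixes land.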

If you're using RHEL and paying a subscription fee, you can call them for support.  If you're NOT paying a subscription fee and using them for support, you might want to consider moving to CentOS, which is a binary build of the RHEL sources.  It doesn't require subscription fees, but it also doesn't have a support number.   (Of course, you wouldn't want to worry about this until you've solved your base issue.)

My thought, as Mike said, is that it is likely an issue with the disk controller or the disks themselves.





-----Original Message-----
From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of Michael B. Trausch
Sent: Wednesday, November 30, 2011 12:29 PM
To: Atlanta Linux Enthusiasts
Subject: Re: [ale] ext3-fs error (RH 3.4.6-2)

On Wed, Nov 30, 2011 at 12:00:29PM -0500, Rich Faulkner wrote:
>    Red Hat gurus....I'm working an issue with a RH 3.4.6-2 (kernel
>    2.6.9-42.ELsmp) that has suffered a couple of crashes.  /var/log/messages
>    include:

I'm not a RH guru, but fortunately I don't need to be, since your problem
is with the Linux kernel.

>    Nov 25 15:24:02 localhost kernel: EXT3-fs error (device ):
>    ext3_get_group_desc: block_group >= groups_count - block_group = 3734,
>    groups_count = 0
>    Nov 25 15:24:02 localhost kernel: Unable to handle kernel NULL pointer
>    dereference at virtual address 0000003a

That's a pretty straightforward message there.
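You can check that reading against the Oops itself. In the "Code:" line further down, the byte in angle brackets (`<0f>`) marks where EIP was, and `0f b7 46 3a` decodes, by hand from the x86 ModRM encoding, as `movzwl 0x3a(%esi),%eax`: a 16-bit load from esi + 0x3a. The register dump shows esi = 00000000. The Python 3 sketch below handles only that one addressing form (8-bit displacement, no SIB byte), and its helper name is hypothetical; it just does the arithmetic, which reproduces the 0000003a the kernel reported. That points to a NULL structure pointer being read at field offset 0x3a, though it does not say why the pointer was NULL.

```python
# 32-bit x86 register names, indexed by the 3-bit r/m field of a ModRM byte.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def fault_address(code_bytes, regs):
    """Effective address of a two-byte-opcode instruction (0f xx) that uses
    [base register + 8-bit displacement] addressing. Only that form is handled."""
    modrm = code_bytes[2]            # byte after the 0f b7 (movzwl) opcode
    mod, rm = modrm >> 6, modrm & 7
    if mod != 1 or rm == 4:          # mod=01 means disp8; rm=4 would need a SIB byte
        raise ValueError("only [reg + disp8] without SIB is handled here")
    disp = code_bytes[3]
    if disp >= 0x80:
        disp -= 0x100                # disp8 is sign-extended
    base = REGS32[rm]
    return base, (regs[base] + disp) & 0xFFFFFFFF

# Bytes starting at the <0f> marker, and the registers from the Oops.
code = bytes([0x0F, 0xB7, 0x46, 0x3A])
regs = {"eax": 0xC3614F94, "ebx": 0xC3614E14, "ecx": 0xF7C62E20, "edx": 0x00000002,
        "esi": 0x00000000, "edi": 0xC3614E14, "ebp": 0x00000000, "esp": 0xF7C62E10}
base, addr = fault_address(code, regs)
print(base, hex(addr))  # esi 0x3a
```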

>    Nov 25 15:24:02 localhost kernel:  printing eip:
>    Nov 25 15:24:02 localhost kernel: f88d07f1
>    Nov 25 15:24:02 localhost kernel: *pde = 36c6c001
>    Nov 25 15:24:02 localhost kernel: Oops: 0000 [#1]
>    Nov 25 15:24:02 localhost kernel: SMP
>    Nov 25 15:24:02 localhost kernel: Modules linked in: mercd(U) ctimod(U)
>    parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc dm_mirror dm_mod
>    button battery ac md5 ipv6 joydev uhci_hcd ehci_hcd e1000 sr_mod ext3 jbd
>    ata_piix libata aacraid sd_mod scsi_mod
>    Nov 25 15:24:02 localhost kernel: CPU:    0
>    Nov 25 15:24:02 localhost kernel: EIP:    0060:[<f88d07f1>]    Tainted:
>    PF     VLI
>    Nov 25 15:24:02 localhost kernel: EFLAGS: 00010202   (2.6.9-42.ELsmp)
>    Nov 25 15:24:02 localhost kernel: EIP is at ext3_handle_error+0x18/0x8b
>    [ext3]

This means that the Oops originated within the function
ext3_handle_error.  The "+0x18/0x8b" means the faulting instruction
is 0x18 bytes into a function that is 0x8b bytes long; if this is a
RH-compiled binary kernel, they should have provided enough symbol
information for you to be able to trace that back to a line of code.
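A small Python 3 sketch of reading that notation (the helper name is hypothetical): the number after `+` is the offset of the faulting instruction into the function and the number after `/` is the function's length, so subtracting the offset from the EIP in the Oops gives the address where ext3_handle_error was loaded on that boot.

```python
import re

def parse_symbol_offset(text):
    """Parse 'name+0xOFFSET/0xLENGTH' as printed in an Oops (EIP line or Call Trace)."""
    m = re.search(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", text)
    if m is None:
        raise ValueError("no symbol+offset/length found in: " + text)
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)

name, offset, length = parse_symbol_offset("EIP is at ext3_handle_error+0x18/0x8b [ext3]")
eip = 0xF88D07F1                    # from "EIP: 0060:[<f88d07f1>]"
start = eip - offset                # where the function begins in memory
print(name, hex(start), hex(start + length))  # ext3_handle_error 0xf88d07d9 0xf88d0864
```

With the ext3 module's debug symbols loaded in gdb, something like `list *(ext3_handle_error+0x18)` should then show the source line. The same parser works on each Call Trace entry, e.g. `ext3_get_group_desc+0x2a/0x85`.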

Without having my hands on the system, I'm limited to guessing (and
someone who reads Oopses more frequently than I may be of more
assistance) but it is likely to be one of the following:

  * Disk and/or controller problem.  Run badblocks and see what you
    find.  Look for any other messages that talk about problems with
    the disk, too, in your system log.

  * RAM problem (e.g., pointer was not what it should have been).  Try
    running a memtest on the system overnight and see what you get.

  * A bug in the ext3 filesystem in your kernel (which is really,
    really old and from back when ext3 was relatively new, so this
    would not surprise me at all).
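For the first bullet, a rough Python 3 sketch of sifting the system log for disk-related messages. The patterns are illustrative assumptions, not an exhaustive list of what the kernel prints; this box loads the aacraid controller driver (see the module list), so its messages are worth adding once you know what they look like. The function names are hypothetical.

```python
import re

# Illustrative patterns only: generic block-layer, SCSI and ext3 error strings.
# Extend with your controller driver's own messages (aacraid on this system).
SUSPECT = re.compile(r"I/O error|EXT3-fs error|SCSI error|medium error", re.IGNORECASE)

def suspicious_lines(lines):
    """Yield (line_number, text) for each log line matching a suspect pattern."""
    for number, line in enumerate(lines, 1):
        if SUSPECT.search(line):
            yield number, line.rstrip("\n")

def scan_log(path="/var/log/messages"):
    """Print suspect lines from a log file (reading it usually needs root)."""
    with open(path, errors="replace") as f:
        for number, text in suspicious_lines(f):
            print(f"{number}: {text}")
```

Call `scan_log()` as root, and point it at the rotated logs as well, since the crash may predate the current file.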

Your kernel is *really* old, though.  I'd recommend upgrading it if
you can.  2.6.9 was the very beginning of the current "compatibility
line" of the 2.6 series, and of course we're onto the 3.0 series these
days.  I'm not running any kernels that old (that kernel hails from
October 2004!) on any production system that I have.

On some of my production systems I've been forced to use vanilla
kernels (which I don't mind, actually; it's pretty easy to build and
package them for multiple systems) because I needed bugfixes that the
distribution either never backported or was taking too long to
backport, for issues that actually impacted me.

I do see that your kernel is in its 42nd revision, but you'll need to
actually look at the change history that RH provides you to see how
much of the current upstream ext3 code is actually present in your
kernel.  I'm willing to bet that they haven't backported all of the
fixes, though I wouldn't be willing to bet that this is absolutely the
cause of your problem here.

>    Nov 25 15:24:02 localhost kernel: eax: c3614f94   ebx: c3614e14   ecx:
>    f7c62e20   edx: 00000002
>    Nov 25 15:24:02 localhost kernel: esi: 00000000   edi: c3614e14   ebp:
>    00000000   esp: f7c62e10
>    Nov 25 15:24:02 localhost kernel: ds: 007b   es: 007b   ss: 0068
>    Nov 25 15:24:02 localhost kernel: Process sendmail (pid: 3047,
>    threadinfo=f7c62000 task=f727c270)

This means that sendmail was the process running when the Oops
occurred, and from the rest of this information it looks like it was
asking for information about the filesystem a file lives on (the
sys_statfs call in the trace below).
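Reading the call trace from the bottom up, the system call was sys_statfs, which went through vfs_statfs and ext3_statfs into ext3_count_free_blocks; that walk over the block group descriptors is where ext3_get_group_desc complained that groups_count was 0. So an ordinary "how much free space is there?" query triggered the crash. (Sendmail can check free space before accepting mail, e.g. via its MinFreeBlocks option, which would fit, but that is an inference.) The Python 3 sketch below makes the same kind of query from userspace with os.statvfs, which glibc on Linux implements via the statfs system call; the helper name is hypothetical.

```python
import os

def free_space(path):
    """Return (available_bytes, total_bytes) for the filesystem containing path."""
    st = os.statvfs(path)
    # f_bavail and f_blocks are counted in units of f_frsize (fragment size);
    # f_bavail excludes the blocks reserved for root.
    return st.f_bavail * st.f_frsize, st.f_blocks * st.f_frsize

available, total = free_space("/")
print(f"{available} of {total} bytes available to non-root users on /")
```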

>    Nov 25 15:24:02 localhost kernel: Stack: f7c62e4c c3614e14 f88d089e
>    f88d8b3c f88d7d40 f7c62e4c f88d8b19 c3614f74
>    Nov 25 15:24:02 localhost kernel:        f88d7d83 00000e96 05ca0c7a
>    f88c702a c3614e14 f88d7d83 f88d7d40 00000e96
>    Nov 25 15:24:02 localhost kernel:        00000000 00000e96 00000e96
>    05ca0c7a 00001160 c3614e00 f88c830c 00000000
>    Nov 25 15:24:02 localhost kernel: Call Trace:
>    Nov 25 15:24:02 localhost kernel:  [<f88d089e>] ext3_error+0x3a/0x40
>    [ext3]
>    Nov 25 15:24:02 localhost kernel:  [<f88c702a>]
>    ext3_get_group_desc+0x2a/0x85 [ext3]
>    Nov 25 15:24:02 localhost kernel:  [<f88c830c>]
>    ext3_count_free_blocks+0x28/0x3e [ext3]
>    Nov 25 15:24:02 localhost kernel:  [<f88d2b51>] ext3_statfs+0x9a/0x116
>    [ext3]
>    Nov 25 15:24:02 localhost kernel:  [<c0159149>] vfs_statfs+0x41/0x59
>    Nov 25 15:24:02 localhost kernel:  [<c015916f>] vfs_statfs_native+0xe/0xd0
>    Nov 25 15:24:02 localhost kernel:  [<c01678e5>] __user_walk+0x4a/0x51
>    Nov 25 15:24:02 localhost kernel:  [<c0159298>] sys_statfs+0x3f/0x9f
>    Nov 25 15:24:02 localhost kernel:  [<c018a4f8>]
>    loadavg_read_proc+0x98/0xa0
>    Nov 25 15:24:02 localhost kernel:  [<c010b052>] do_gettimeofday+0x1a/0x9c
>    Nov 25 15:24:03 localhost kernel:  [<c012614f>] sys_time+0xf/0x58
>    Nov 25 15:24:03 localhost kernel:  [<c02d4703>] syscall_call+0x7/0xb
>    Nov 25 15:24:03 localhost kernel: Code: d4 20 85 c7 80 4b 18 04 83 c4 10
>    83 c4 10 5b 5e 5f 5d c3 56 53 89 c3 8b 80 80 01 00 00 0f b7 50 40 8b 70 2c
>    83 ca 02 66 89 50 40 <0f> b7 46 3a 83 c8 02 66 89 46 3a f6 43 34 01 75 5f
>    8b 8b 80 01
>    Nov 25 15:24:03 localhost kernel:  <0>Fatal exception: panic in 5 seconds
>
>    Thoughts?
>
>    Many thanks while I otherwise research this online....RinL

HTH.

        --- Mike



