[ale] ext3-fs error (RH 3.4.6-2)

Michael B. Trausch mike at trausch.us
Wed Nov 30 12:28:44 EST 2011


On Wed, Nov 30, 2011 at 12:00:29PM -0500, Rich Faulkner wrote:
>    Red Hat gurus....I'm working an issue with a RH 3.4.6-2 (kernel
>    2.6.9-42.ELsmp) that has suffered a couple of crashes.  /var/log/messages
>    include:

Not a RH guru, but fortunately I don't need to be, since your problem is
with the Linux kernel.

>    Nov 25 15:24:02 localhost kernel: EXT3-fs error (device ):
>    ext3_get_group_desc: block_group >= groups_count - block_group = 3734,
>    groups_count = 0
>    Nov 25 15:24:02 localhost kernel: Unable to handle kernel NULL pointer
>    dereference at virtual address 0000003a

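That's a pretty straightforward message: ext3 was asked for the
descriptor of block group 3734, but the superblock information it was
checking against says the filesystem has 0 block groups, which can't be
right for a mounted filesystem (note the empty device name in
"(device )", too). Then, while ext3 was trying to record that error,
it dereferenced a NULL pointer.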
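To make the first message concrete, here is a tiny Python model of the
sanity check that produced it. This is a hypothetical simplification,
not the kernel source: the real ext3_get_group_desc() is C and reports
the problem through ext3_error(), while `get_group_desc` below is a
made-up stand-in that just returns the message text.

```python
def get_group_desc(block_group, groups_count):
    """Hypothetical model of the bounds check in ext3_get_group_desc().

    Returns an error string shaped like the kernel message when the
    requested group is out of range, otherwise None.
    """
    if block_group >= groups_count:
        # The kernel calls ext3_error() here instead of returning a string.
        return ("ext3_get_group_desc: block_group >= groups_count - "
                f"block_group = {block_group}, groups_count = {groups_count}")
    return None


# Values taken from the log above.
print(get_group_desc(3734, 0))
```

With groups_count at 0, no group number could ever pass the check, so
the 0 is the interesting value here, not the 3734.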

>    Nov 25 15:24:02 localhost kernel:  printing eip:
>    Nov 25 15:24:02 localhost kernel: f88d07f1
>    Nov 25 15:24:02 localhost kernel: *pde = 36c6c001
>    Nov 25 15:24:02 localhost kernel: Oops: 0000 [#1]
>    Nov 25 15:24:02 localhost kernel: SMP
>    Nov 25 15:24:02 localhost kernel: Modules linked in: mercd(U) ctimod(U)
>    parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc dm_mirror dm_mod
>    button battery ac md5 ipv6 joydev uhci_hcd ehci_hcd e1000 sr_mod ext3 jbd
>    ata_piix libata aacraid sd_mod scsi_mod
>    Nov 25 15:24:02 localhost kernel: CPU:    0
>    Nov 25 15:24:02 localhost kernel: EIP:    0060:[<f88d07f1>]    Tainted:
>    PF     VLI
>    Nov 25 15:24:02 localhost kernel: EFLAGS: 00010202   (2.6.9-42.ELsmp)
>    Nov 25 15:24:02 localhost kernel: EIP is at ext3_handle_error+0x18/0x8b
>    [ext3]

This means that the Oops originated within the function
ext3_handle_error.  The "+0x18/0x8b" means the faulting instruction
is 0x18 bytes into a function that is 0x8b bytes long.  If this is a
RH-compiled binary kernel, RH should provide enough symbol
information (a debuginfo package) for you to trace that offset back
to a line of code.
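As a small illustration of reading that line, this Python sketch
splits the "EIP is at" text into function, offset, length, and module,
and subtracts the offset from the EIP value to get the runtime address
where the function starts.  The helper name `parse_eip` is made up for
this example.  The offset is what you would feed to a debugger with
symbols for the ext3 module (for example gdb's
`list *(ext3_handle_error+0x18)`) to get a source line.

```python
import re

# Matches "symbol+0xOFFSET/0xLENGTH [module]" as printed on the "EIP is at" line.
EIP_RE = re.compile(
    r"EIP is at (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?")


def parse_eip(line, eip_hex):
    """Split an Oops 'EIP is at' line and compute the function's start address."""
    m = EIP_RE.search(line)
    if m is None:
        raise ValueError("no 'EIP is at' symbol found")
    offset = int(m.group(2), 16)
    return {
        "function": m.group(1),
        "module": m.group(4),            # None for built-in (non-module) code
        "offset": offset,                # bytes into the function
        "length": int(m.group(3), 16),   # total function size in bytes
        "func_start": int(eip_hex, 16) - offset,
    }


# The wrapped log line, rejoined, plus the EIP printed earlier in the Oops.
info = parse_eip("EIP is at ext3_handle_error+0x18/0x8b [ext3]", "f88d07f1")
print(info["function"], hex(info["func_start"]), info["offset"], info["length"])
# prints: ext3_handle_error 0xf88d07d9 24 139
```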

Without having my hands on the system, I'm limited to guessing (and
someone who reads Oopses more frequently than I may be of more
assistance) but it is likely to be one of the following:

  * Disk and/or controller problem.  Run badblocks and see what you
    find.  Look for any other messages that talk about problems with
    the disk, too, in your system log.

  * RAM problem (e.g., a pointer was corrupted in memory).  Try
    running a memory tester such as memtest86+ on the system
    overnight and see what you get.

  * A bug in the ext3 filesystem in your kernel (which is really,
    really old and from back when ext3 was relatively new, so this
    would not surprise me at all).
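For the "look in your system log" part of the first item, a quick
filter like the following Python sketch can pull out likely disk,
controller, or filesystem complaints.  The patterns are assumptions
(a guess at common wording, plus the aacraid driver from the module
list); adjust them for your hardware.

```python
import re

# Hypothetical patterns: tune these for your disks and controller driver.
DISK_PATTERNS = [
    r"I/O error",
    r"EXT3-fs error",
    r"aacraid",
    r"SCSI error",
    r"medium error",
]
DISK_RE = re.compile("|".join(DISK_PATTERNS), re.IGNORECASE)


def disk_suspects(lines):
    """Yield log lines that look like disk, controller, or filesystem trouble."""
    for line in lines:
        if DISK_RE.search(line):
            yield line.rstrip("\n")


# Example use (path is the usual RH syslog location):
#   with open("/var/log/messages", errors="replace") as f:
#       for hit in disk_suspects(f):
#           print(hit)
```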

Your kernel is *really* old, though.  I'd recommend upgrading it if
you can.  2.6.9 was the very beginning of the current "compatibility
line" of the 2.6 series, and of course we're onto the 3.0 series these
days.  I'm not running any kernels that old (that kernel hails from
October 2004!) on any production system that I have.

On some of my production systems I've been forced to use vanilla
kernels (which I don't mind, actually; it's pretty easy to build and
package them for multiple systems) because I needed bugfixes that the
distribution either never backported or was taking too long to
backport, and the issue actually affected me.

I do see that your kernel is in its 42nd revision, but you'll need to
actually look at the change history that RH provides you to see how
much of the current upstream ext3 code is actually present in your
kernel.  I'm willing to bet that they haven't backported all of the
fixes, though I wouldn't be willing to bet that this is absolutely the
cause of your problem here.
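One way to skim that change history is `rpm -q --changelog` on the
installed kernel package, filtered for ext3.  This Python sketch wraps
that; the package name is an assumption (an SMP kernel of that era
was probably packaged as kernel-smp), and the helper names are made up
for illustration.

```python
import subprocess


def kernel_changelog(package="kernel"):
    """Return the RPM changelog text; needs the rpm CLI (RPM-based systems only)."""
    result = subprocess.run(["rpm", "-q", "--changelog", package],
                            capture_output=True, text=True, check=True)
    return result.stdout


def changelog_mentions(changelog_text, keyword="ext3"):
    """Return changelog lines that mention keyword, case-insensitively."""
    kw = keyword.lower()
    return [line for line in changelog_text.splitlines() if kw in line.lower()]


# Example use on the affected box:
#   for line in changelog_mentions(kernel_changelog("kernel-smp")):
#       print(line)
```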

>    Nov 25 15:24:02 localhost kernel: eax: c3614f94   ebx: c3614e14   ecx:
>    f7c62e20   edx: 00000002
>    Nov 25 15:24:02 localhost kernel: esi: 00000000   edi: c3614e14   ebp:
>    00000000   esp: f7c62e10
>    Nov 25 15:24:02 localhost kernel: ds: 007b   es: 007b   ss: 0068
>    Nov 25 15:24:02 localhost kernel: Process sendmail (pid: 3047,
>    threadinfo=f7c62000 task=f727c270)

This means that sendmail was the process running when the Oops
occurred, and the call trace shows it was in a statfs() call, i.e.
asking how much free space a filesystem has (sendmail does this,
probably to decide whether there is room to queue mail).
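For reference, here is the same kind of query from Python.  On Linux,
os.statvfs() goes through the C library's statvfs(), which is built on
the statfs family of system calls, so on that kernel it would go down
the same sys_statfs -> ext3_statfs -> ext3_count_free_blocks path shown
in the trace.  The `free_space` helper is just an illustrative wrapper.

```python
import os


def free_space(path):
    """Report filesystem space the way a statfs()-style check sees it."""
    st = os.statvfs(path)
    return {
        "block_size": st.f_frsize,
        "total_blocks": st.f_blocks,
        "free_blocks": st.f_bfree,    # free, including root-reserved blocks
        "avail_blocks": st.f_bavail,  # free blocks usable by non-root
        "avail_bytes": st.f_bavail * st.f_frsize,
    }


print(free_space("/"))
```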

>    Nov 25 15:24:02 localhost kernel: Stack: f7c62e4c c3614e14 f88d089e
>    f88d8b3c f88d7d40 f7c62e4c f88d8b19 c3614f74
>    Nov 25 15:24:02 localhost kernel:        f88d7d83 00000e96 05ca0c7a
>    f88c702a c3614e14 f88d7d83 f88d7d40 00000e96
>    Nov 25 15:24:02 localhost kernel:        00000000 00000e96 00000e96
>    05ca0c7a 00001160 c3614e00 f88c830c 00000000
>    Nov 25 15:24:02 localhost kernel: Call Trace:
>    Nov 25 15:24:02 localhost kernel:  [<f88d089e>] ext3_error+0x3a/0x40
>    [ext3]
>    Nov 25 15:24:02 localhost kernel:  [<f88c702a>]
>    ext3_get_group_desc+0x2a/0x85 [ext3]
>    Nov 25 15:24:02 localhost kernel:  [<f88c830c>]
>    ext3_count_free_blocks+0x28/0x3e [ext3]
>    Nov 25 15:24:02 localhost kernel:  [<f88d2b51>] ext3_statfs+0x9a/0x116
>    [ext3]
>    Nov 25 15:24:02 localhost kernel:  [<c0159149>] vfs_statfs+0x41/0x59
>    Nov 25 15:24:02 localhost kernel:  [<c015916f>] vfs_statfs_native+0xe/0xd0
>    Nov 25 15:24:02 localhost kernel:  [<c01678e5>] __user_walk+0x4a/0x51
>    Nov 25 15:24:02 localhost kernel:  [<c0159298>] sys_statfs+0x3f/0x9f
>    Nov 25 15:24:02 localhost kernel:  [<c018a4f8>]
>    loadavg_read_proc+0x98/0xa0
>    Nov 25 15:24:02 localhost kernel:  [<c010b052>] do_gettimeofday+0x1a/0x9c
>    Nov 25 15:24:03 localhost kernel:  [<c012614f>] sys_time+0xf/0x58
>    Nov 25 15:24:03 localhost kernel:  [<c02d4703>] syscall_call+0x7/0xb
>    Nov 25 15:24:03 localhost kernel: Code: d4 20 85 c7 80 4b 18 04 83 c4 10
>    83 c4 10 5b 5e 5f 5d c3 56 53 89 c3 8b 80 80 01 00 00 0f b7 50 40 8b 70 2c
>    83 ca 02 66 89 50 40 <0f> b7 46 3a 83 c8 02 66 89 46 3a f6 43 34 01 75 5f
>    8b 8b 80 01
>    Nov 25 15:24:03 localhost kernel:  <0>Fatal exception: panic in 5 seconds
>
>    Thoughts?
>
>    Many thanks while I otherwise research this online....RinL

HTH.

	--- Mike