[ale] Server fatal?

James P. Kinney III jkinney at localnetsolutions.com
Thu Jul 26 10:10:36 EDT 2007


It is certainly a memory problem. Whether the failure is in a RAM module or
the board is harder to tell. It looks like (I could be WAY off base here)
the error is isolated to a particular memory address region,
0bc6cdd0-0bc6cff0. If that range doesn't cross a physical module boundary,
then the RAM is bad (most likely) or the socket has a fault (try the
tech fix: pull the module, blow out the socket with canned air, and reseat
the module).
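
A rough way to check the "module boundary" question, as a sketch only. It
assumes equal-sized modules mapped contiguously from address 0 with no
interleaving, which may not match this board, and the 512 MiB module size is
a made-up value, not something from the original report:

```python
MIB = 1024 * 1024

def module_index(addr, module_size):
    # Which module an address lands in, under the contiguous,
    # non-interleaved layout assumed above.
    return addr // module_size

def crosses_module_boundary(start, end, module_size):
    # True if the two ends of the range fall in different modules.
    return module_index(start, module_size) != module_index(end, module_size)

# First and last addresses from the quoted dmesg output.
start, end = 0x0bc6cdd0, 0x0bc6cff0
module_size = 512 * MIB  # hypothetical module size

print(crosses_module_boundary(start, end, module_size),
      module_index(start, module_size))
```

Substitute the real module size (and account for interleaving, if the
chipset does it) before trusting the answer.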

(The range is 197578192 to 197578736, which is 544 bytes. I suspect a RAM
module failure near the DIMM0 socket.)
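
The decimal figures can be reproduced with a short script. This is a minimal
sketch that treats the first hex field of each "bad pmd" line as an address;
the function name and the two sample lines (the lowest and highest entries
from the quoted dmesg output) are chosen for illustration:

```python
import re

# Lowest and highest entries copied from the quoted dmesg output.
DMESG = """\
mm/memory.c:101: bad pmd 0bc6cdd0(0000002000000000).
mm/memory.c:101: bad pmd 0bc6cff0(0000000800000000).
"""

def pmd_address_range(text):
    # Collect the hex field that follows "bad pmd" on each line.
    addrs = [int(m.group(1), 16)
             for m in re.finditer(r"bad pmd ([0-9a-f]+)\(", text)]
    low, high = min(addrs), max(addrs)
    return low, high, high - low

low, high, span = pmd_address_range(DMESG)
print(f"{low} to {high}: {span} bytes")
```

Piping the full dmesg output into the same function gives the same bounds,
since every other reported address falls between these two.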

If the error spans more than one physical module (it doesn't look like
it does), the motherboard most likely has the problem, though it could
still be a semi-conducting fiber touching the contacts somewhere, zinc
whiskers on the board, etc.

On Thu, 2007-07-26 at 09:39 -0400, Christopher Fowler wrote:
> I've got a server that yesterday locked up a few times.
> It was also showing strange behavior.  For example when we started
> tomcat and the servlets java complained about not being able to find 
> classes.  After a reboot it started just fine.
> 
> This morning I saw many of these in dmesg:
> 
> mm/memory.c:101: bad pmd 0bc6cdd0(0000002000000000).
> mm/memory.c:101: bad pmd 0bc6ce00(0000002000000000).
> mm/memory.c:101: bad pmd 0bc6ce10(0000000200000000).
> mm/memory.c:101: bad pmd 0bc6ce30(000000a800000000).
> mm/memory.c:101: bad pmd 0bc6ce60(0000002000000000).
> mm/memory.c:101: bad pmd 0bc6ceb0(0000000800000000).
> mm/memory.c:101: bad pmd 0bc6cec0(000000a000000000).
> mm/memory.c:101: bad pmd 0bc6cef0(0000008000000000).
> mm/memory.c:101: bad pmd 0bc6cf00(0000000200000000).
> mm/memory.c:101: bad pmd 0bc6cf20(0000008000000000).
> mm/memory.c:101: bad pmd 0bc6cf40(0000000800000000).
> mm/memory.c:101: bad pmd 0bc6cfe0(0000000200000000).
> mm/memory.c:101: bad pmd 0bc6cff0(0000000800000000).
> 
> I did a google search and I see many emails where people have seen these
> but no replies as to what could be the failure.  It may be too vague.
> 
> Is it memory or motherboard?  I'm not getting disk errors so I'm not
> going to point my finger at the disks.
> 
> 
-- 
James P. Kinney III          
CEO & Director of Engineering 
Local Net Solutions,LLC        
770-493-8244                    
http://www.localnetsolutions.com

GPG ID: 829C6CA7 James P. Kinney III (M.S. Physics)
<jkinney at localnetsolutions.com>
Fingerprint = 3C9E 6366 54FC A3FE BA4D 0659 6190 ADC3 829C 6CA7