[ale] Box Starting to Die?

Glenn C. Lasher Jr. critter at wizvax.net
Fri Sep 1 07:07:20 EDT 2000



My experience is that if you take a machine with an HDD that has been
running mostly nonstop and you prepare to move it, you should bring it
back up as fast as humanly possible to avoid damaging the drive.  The drive
gets 'used' to running, and stopping it will lead to problems in the
bearings.  These difficulties could modulate the rotation of the platters
enough to mess up your data from time to time.
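
Before settling on the drive, it may help to check whether the kernel errors Jeff quotes below really started after the move. Here is a minimal Python sketch of that check. It assumes the standard syslog line prefix seen in his excerpt ("Aug 30 22:01:00 jupiter kernel: ..."); the function name `tally_by_day` and the signature list are hypothetical, with the strings taken from the messages he mentions.

```python
import re
from collections import Counter

# Error strings taken from the messages quoted in this thread (illustrative list).
SIGNATURES = (
    "Unable to handle kernel paging request",
    "invalid operand",
    "free_one_pmd: bad directory entry",
    "int3:",
)
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Assumed syslog prefix, as in the excerpt: "Aug 30 22:01:00 jupiter kernel: ..."
PREFIX = re.compile(r"^(\w{3})\s+(\d{1,2}) \d\d:\d\d:\d\d \S+ kernel: (.*)$")


def tally_by_day(lines):
    """Count kernel error signatures per (month number, day, signature)."""
    counts = Counter()
    for line in lines:
        m = PREFIX.match(line)
        if not m or m.group(1) not in MONTHS:
            continue  # not a kernel line in the expected format
        month = MONTHS.index(m.group(1)) + 1
        day = int(m.group(2))
        message = m.group(3)
        for sig in SIGNATURES:
            if sig in message:
                counts[(month, day, sig)] += 1
    return counts
```

For example, `tally_by_day(open("/var/log/messages", errors="replace"))` returns a Counter keyed by (month, day, signature); reading that file normally needs root. Syslog lines carry no year, so sorting the keys only makes sense within a single year of logs.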

On Thu, 31 Aug 2000, Jeff Hubbs wrote:

> I have a trusty Linux box at work that's getting to be not-so-trusty anymore
> after having been moved - lockups, inability to telnet in, etc.  
> 
> /var/log/messages hints at some reasons why over the past few days.  Here
> are two recent examples from last night:
> 
> 	Aug 30 22:01:00 jupiter kernel: int3: 0000
> 	Aug 30 22:01:00 jupiter kernel: CPU:    0
> 	Aug 30 22:01:00 jupiter kernel: EIP:    0010:[<c050f756>]
> 	Aug 30 22:01:00 jupiter kernel: EFLAGS: 00000286
> 	Aug 30 22:01:00 jupiter kernel: eax: c0c55da0   ebx: c1462000   ecx: 0804ffcc   edx: c0c55da0
> 	Aug 30 22:01:00 jupiter kernel: esi: 0804ffcc   edi: 00000698   ebp: c1462000   esp: c1463fa0
> 	Aug 30 22:01:00 jupiter kernel: ds: 0018   es: 0018   ss: 0018
> 	Aug 30 22:01:00 jupiter kernel: Process crond (pid: 899, process nr: 47, stackpage=c1463000)
> 	Aug 30 22:01:00 jupiter kernel: Stack: c1462000 0804ffc8 00000698 bffff860 c1462000 c0835658 c01096ad c1463fc4
> 	Aug 30 22:01:00 jupiter kernel:        00000007 401021b4 00000071 00000628 0804ffc8 00000698 bffff860 08050038
> 	Aug 30 22:01:00 jupiter kernel:        0000002b 0000002b ffffffff 4006fd6c 00000023 00010206 bffff814 0000002b
> 	Aug 30 22:01:00 jupiter kernel: Call Trace: [error_code+45/52]
> 	Aug 30 22:01:00 jupiter kernel: Code: cc cc cc cc cc cc cc cc cc cc 55 8b ec 83 ec 0c 83 7d 08 00
> 	Aug 30 23:16:54 jupiter kernel: invalid operand: 0000
> 	Aug 30 23:16:54 jupiter kernel: CPU:    0
> 	Aug 30 23:16:54 jupiter kernel: EIP:    0010:[<c14c8030>]
> 	Aug 30 23:16:54 jupiter kernel: EFLAGS: 00010212
> 	Aug 30 23:16:54 jupiter kernel: eax: c14c8001   ebx: c0835340   ecx: 0804e264   edx: c16785c0
> 	Aug 30 23:16:54 jupiter kernel: esi: 0804e264   edi: c14c8000   ebp: 0804e208   esp: c14c9fbc
> 	Aug 30 23:16:54 jupiter kernel: ds: 0018   es: 0018   ss: 0018
> 	Aug 30 23:16:54 jupiter kernel: Process crond (pid: 915, process nr: 49, stackpage=c14c9000)
> 	Aug 30 23:16:54 jupiter kernel: Stack: c14c9fc4 00000007 401021b4 401005d8 00000050 0804e208 00000058 bffff860
> 	Aug 30 23:16:54 jupiter kernel:        401005d8 0000002b 0000002b ffffffff 4006fc6a 00000023 00010202 bffff814
> 	Aug 30 23:16:54 jupiter kernel:        0000002b
> 	Aug 30 23:16:54 jupiter kernel: Call Trace:
> 	Aug 30 23:16:54 jupiter kernel: Code: ff ff ff ff 00 60 1f c0 00 40 71 c1 00 40 71 c1 00 20 d8 c1
> 
> That last one appears to have locked up the machine.
> 
> Earlier, I see some instances of "Unable to handle kernel paging request at
> virtual address...", "invalid operand: 0000", and "free_one_pmd: bad
> directory entry 01000000".
> 
> My experience to date suggests that these are the signs of a disk drive
> that's in the process of going south.  I should perhaps mention that this
> machine had an uptime of 142 days when it was moved from one building to
> another, and that this is a somewhat old 2GB drive.  Would you agree, or
> do you have any other suggestions?
> 
> - Jeff 
> --
> To unsubscribe: mail majordomo at ale.org with "unsubscribe ale" in message body.
> 
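
Both dumps Jeff posted follow the same layout: a trap line, an "EIP:" line, and a "Process" line. A short Python sketch, assuming that layout and the syslog prefix shown in his excerpt, can summarize each oops so you can see at a glance which process was running and where it faulted. The function name `summarize_oopses` is hypothetical, and the trap list holds only the two trap names that appear in this thread; it would need extending for other dump types.

```python
import re

# Assumed syslog prefix, as in the excerpt: "Aug 30 22:01:00 jupiter kernel: "
PREFIX = re.compile(r"^\w{3}\s+\d{1,2} \d\d:\d\d:\d\d \S+ kernel: ")
# First line of each dump in the excerpt; only trap names seen in this thread.
TRAP = re.compile(r"^(int3|invalid operand): ([0-9a-f]{4})")
# "EIP:    0010:[<c050f756>]" -> segment selector and instruction pointer.
EIP = re.compile(r"^EIP:\s+([0-9a-f]{4}):\[<([0-9a-f]{8})>\]")
# "Process crond (pid: 899, ..." -> process name and pid.
PROC = re.compile(r"^Process (\S+) \(pid: (\d+)")


def summarize_oopses(lines):
    """Return one dict per oops: trap name, EIP, and the process that was running."""
    oopses = []
    current = None
    for line in lines:
        m = PREFIX.match(line)
        if not m:
            continue  # not a kernel line
        body = line[m.end():].strip()
        trap = TRAP.match(body)
        if trap:
            # A trap line starts a new dump.
            current = {"trap": trap.group(1), "eip": None,
                       "process": None, "pid": None}
            oopses.append(current)
            continue
        if current is None:
            continue  # kernel chatter before the first dump
        eip = EIP.match(body)
        if eip:
            current["eip"] = eip.group(2)
            continue
        proc = PROC.match(body)
        if proc:
            current["process"] = proc.group(1)
            current["pid"] = int(proc.group(2))
    return oopses
```

Fed the two dumps above (with the wrapped lines rejoined), it reports an `int3` at EIP c050f756 in crond (pid 899) and an `invalid operand` at EIP c14c8030 in crond (pid 915).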

-- 
Critter at Wizvax.Net
Don't Steal - The government hates competition.

