[ale] Debugging lockups on a linux system?

Mike Harrison meuon at geeklabs.com
Mon Nov 28 08:46:12 EST 2005


On Mon, 28 Nov 2005, William Fragakis wrote:
> You may want to make sure it's not hardware related ie dying/funky 
> power supply, bad RAM, dying/dead case fans, etc.

> On Nov 28, 2005, at 7:33 AM, tom sawyer wrote:
> >  I'm having a problem. I have a couple of linux boxes that I support 
> > at a client site. The problem is that one of them keeps locking up 
> > for no apparent reason. All I have is SSH access. I have to have 
> > them reboot the device so that I can SSH back into it. Nothing shows 
> > up in /var/log/messages when it locks up. The next thing I see is the 

I agree with William: the only times I've seen a Linux box die hard,
the cause was hardware (power supply, RAM, or CPU). The exceptions are
odd things happening to I/O, or a bad swap file/partition when the
machine hits swap.
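
To see whether the box is even touching swap, something like this rough
sketch could be run from cron so the numbers end up on disk before the next
lockup. It assumes the usual SwapTotal/SwapFree lines in /proc/meminfo and
a Python 3 interpreter on the machine (or on another box, fed a copy of the
file); the function name swap_usage is just made up for illustration.

```python
def swap_usage(meminfo_text):
    """Return (total_kb, used_kb) parsed from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name:   12345 kB" style lines; skip anything else.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    total = fields["SwapTotal"]
    return total, total - fields["SwapFree"]


# Example use on a live system (path is the standard procfs location):
#   with open("/proc/meminfo") as f:
#       print(swap_usage(f.read()))
```

If the used figure climbs steadily in the runs before a lockup, the swap
file/partition becomes a more likely suspect.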

Which leads to: next time it locks up, note the exact time. Then
slowly and carefully unplug all the I/O devices, about a minute apart:
Keyboard/A20, Ethernet, etc. Then give the machine a few minutes
to free up cycles and write to its log before you power-cycle it.
Then check the log again. If there are entries stamped after you started
unplugging things, it's a clue that something is I/O bound,
like a DDoS against the Ethernet, a fubared web app, or a bad keyboard.
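
For the "check the log again" step, here is a minimal sketch that pulls out
only the lines stamped at or after the time you noted. It assumes the
traditional syslog timestamp at the start of each line ("Nov 28 08:46:12"),
which carries no year, so you pass the year in yourself. The function name
entries_after is hypothetical, not part of any tool.

```python
from datetime import datetime


def entries_after(lines, noted, year):
    """Return syslog lines whose timestamp is at or after `noted`."""
    hits = []
    for line in lines:
        stamp = line[:15]  # e.g. "Nov 28 08:46:12"
        try:
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line; ignore it
        if when >= noted:
            hits.append(line.rstrip("\n"))
    return hits


# Example use after the reboot (the time is whatever you wrote down):
#   with open("/var/log/messages") as f:
#       for hit in entries_after(f, datetime(2005, 11, 28, 8, 46, 12), 2005):
#           print(hit)
```

An empty result means nothing got logged between the lockup and the power
cycle, which points back toward hardware.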

Sometimes a *nix machine will CRAWL under load, but will free up
when you remove the load.

If not... hardware. And if it's a RH 7.1 machine, it may be time
for a complete upgrade anyway... 









