[ale] Something I thought I'd never see

Jeff Lightner jlightner at water.com
Wed Oct 11 09:31:56 EDT 2006


uptime
 09:24:26  up 38 days, 17:22,  8 users,  load average: 37.79, 37.66, 37.56

Is there a way to see exactly what is in the run queue?
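
The closest I've come up with, assuming the procps ps on Linux, is to
sample the processes in the R (runnable) and D (uninterruptible sleep)
states, since those are what the kernel counts toward the load average:

    ps -eo stat,pid,user,wchan:30,args | awk '$1 ~ /^[RD]/'

That's only a point-in-time snapshot though, not the queue itself, so
I'm curious whether there's something better.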

I actually figured out what was causing this by tracing back through
Nagios alerts to the day it started alerting on load average and then
looking at the processes from that day.  I found there is a directory
that can't be accessed, and any process that attempts to access it
hangs.  Backups, cron cleanups, and updatedb all tried, and because
these kept kicking off over the weekend, the load average kept
climbing.  Today I'm going to reboot, since the processes can't be
killed due to whatever is locking this.  I'll also run an fsck to make
sure the filesystem in question doesn't have issues.

However, I'm wondering how I might have figured this out if I hadn't
been able to narrow down the day, other than by running ps -ef and
looking for oddities such as the ones I found.  That is what prompted
the question above.  I often see what appear to me to be abnormally
high load averages (compared to what I'd consider reasonable on the
UNIX boxes I've worked on), but they don't seem to actually impact
overall performance.
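
A narrower version of the same ps sampling might have pointed at it
even without knowing the date, again assuming the Linux procps ps:
list only the D-state processes along with their elapsed time, so old
stuck jobs stand out:

    ps -eo stat,pid,etime,user,args | awk '$1 ~ /^D/'

The backup, cron cleanup, and updatedb jobs presumably would all have
shown up there with elapsed times going back days.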

No high I/O waits.  No high cumulative CPU times.   No high memory
utilization.   Just these multiple processes all hung on the same
directory (as verified by lsof).
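
The lsof check amounts to something like dumping the full listing and
grepping for the path (the path below is just a placeholder for the
actual directory):

    lsof 2>/dev/null | grep /path/to/the/stuck/directory

Naming the directory directly on the lsof command line would likely
have hung on the same stat() everything else is stuck on.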



Jeffrey C. Lightner
Unix Systems Administrator
DS Waters of America, LP
678-486-3516

