[ale] checking for process in uninterruptable sleep state

Todor Fassl fassl.tod at gmail.com
Wed May 11 13:47:59 EDT 2016


Doh!  I was fiddling with it after trying it out and forgot to put it 
back.  Yeah, it doesn't work the way I posted it. Well, anyone googling 
this problem in the future will have to take note of my dumbassness. 
Sorry all you future googlers.
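
For anyone landing here later, the following is a minimal sketch of the corrected check, not the script exactly as it ran on the workstations. It restores the per-PID test Scott suggested and folds in his `${1:-7}` default. Several things are assumptions that are not in the thread: the function name `check_hung`, the lower-case variable names, passing the probe command as arguments (so it can be tried with something other than lsusb), and using the shell builtin `kill -0` in place of `ps -p` for the "does this PID still exist" question. The `ps -o stat=` line is optional detail; a state beginning with D is uninterruptible sleep.

```shell
#!/bin/bash
# Sketch only: check_hung and the variable names are illustrative.

# check_hung SECONDS COMMAND [ARGS...]
# Runs COMMAND in the background, waits SECONDS, then reports whether
# it is still around. Returns 0 if it finished, 1 if it is still there.
check_hung() {
    local timeout=$1
    shift
    local host=${HOSTNAME:-$(uname -n)}

    "$@" >/dev/null 2>&1 &
    local pid=$!
    sleep "$timeout"

    # kill -0 sends no signal; it succeeds only while the PID exists.
    # (Stand-in for "ps -p $pid", which asks the same question.)
    if kill -0 "$pid" 2>/dev/null; then
        # Optional: a STAT starting with "D" means uninterruptible sleep.
        local state
        state=$(ps -o stat= -p "$pid" 2>/dev/null)
        echo "$host is hung, process $pid (state: ${state:-unknown})" >&2
        return 1
    fi

    if [ -n "${VERBOSE:-}" ]; then
        echo "$host is okay."
    fi
    return 0
}

# First argument is the timeout; default to 7 seconds if unset or empty.
check_hung "${1:-7}" /usr/bin/lsusb
```

Saved under some name such as check-usb.sh (hypothetical), `./check-usb.sh 10` waits ten seconds instead of seven. The hung case writes to stderr and the okay case is silent unless VERBOSE is set, so under cron you only get mail when a machine is actually wedged.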




On 05/10/2016 07:26 PM, Scott Plante wrote:
> I'm glad that worked out for you. I don't know why you couldn't get the script to reboot; it might have had more to do with cron than with being in a script. Cron doesn't read your .profile, etc., so a missing PATH entry or other environment setting can trip you up. But I assume you meant it wouldn't reboot even when running the script directly.
>
>
> I notice the ps command got the "-p" parameter changed into "-e" somehow. The -p pid looks for that particular process id. But -e says show every process, and the exit code is always true (0). I don't see how it'll work with the -e. I guess that's why you reversed the sense of the condition. The idea is the lsusb will run within a second or so and the process will be gone if everything is ok. If, after the timeout, the process is still there, then it's hung and you need the reboot or whatever. I understand it's a bit hard to test if you can't cause the usb to hang when you want it to.
>
>
>
>
> The normal idiom in bash to use a variable or a default if the variable isn't set is ${parameter:-word}. So you could replace your two "test" statements at the top with this:
>
>
> TIMEOUT=${1:-7}
>
>
> You can get the details for this in the bash man page under "Parameter Expansion". If you're going to do much bash scripting, there are a bunch of super useful things in that section.
>
>
> Most sources say the convention is to use all caps just for environment variables in bash, and lower case for local variables. Usually the first variables people see when learning are environment variables like HOME and PATH and they get the idea all variables are meant to be all caps. It's just convention though (and not universally applied) but it's something to consider. It'll help keep you from accidentally hiding some environment variable that you end up wanting to use down the road.
>
>
> Scott
>
> ----- Original Message -----
>
> From: "Todor Fassl" <fassl.tod at gmail.com>
> To: "Atlanta Linux Enthusiasts" <ale at ale.org>
> Sent: Tuesday, May 10, 2016 3:14:39 PM
> Subject: Re: [ale] checking for process in uninterruptable sleep state
>
> That worked fine. Thanks. I actually wrote the script on a workstation
> with a wedged usb sub-system at the time. So I could be sure the script
> worked as intended. I couldn't get it to reboot though. I don't know if
> you can't put a reboot command in a bash script or if it didn't work
> because of the original problem. Probably safer this way anyhow. I put
> it in cron on the 15 workstations because it generates a line of output
> if a machine is hung and cron itself will email me. For completeness,
> here is the full script:
>
> #!/bin/bash
> test ! -z "$1" && TIMEOUT=$1
> test -z "$TIMEOUT" && TIMEOUT=7
> ( /usr/bin/lsusb > /dev/null ) & PID=$!
> sleep $TIMEOUT
> if ps -e $PID > /dev/null 2>&1; then
> test ! -z "$VERBOSE" && echo "$HOSTNAME is okay."
> else
> echo "$HOSTNAME is hung, process $PID" >&2
> fi
>
>
> On 05/10/2016 10:21 AM, Scott Plante wrote:
>>
>> How about if you do lsusb in the background then check the PID to see if it's still running/stuck, like:
>>
>>
>> lsusb >/dev/null 2>&1 &
>> usbpid=$!
>> sleep 4 #or however long > max lsusb exec time
>> if ps -p $usbpid >/dev/null 2>&1
>> then
>> # lsusb is still there after the timeout, so it is hung; do your stuff here, reboot etc.
>> echo "lsusb (pid $usbpid) is hung" >&2
>> fi
>> ----- Original Message -----
>>
>> From: "Todor Fassl" <fassl.tod at gmail.com>
>> To: "Atlanta Linux Enthusiasts" <ale at ale.org>
>> Sent: Tuesday, May 10, 2016 10:39:26 AM
>> Subject: [ale] checking for process in uninterruptable sleep state
>>
>> Okay, so my latest problem with these lab workstations is that accessing
>> the usb sub-system puts the calling process into an uninterruptible
>> sleep. I'd like to write a script to check for that so at least I'd know
>> that I have to go over and reboot the machine.
>>
>> Details: I have 15 Dell workstations running Ubuntu 15.10 (2 are running
>> 16.04 -- that did not help). Occasionally, the keyboard and mouse
>> freeze. Logging in remotely and running lsusb hangs such that you can't
>> even control-c out, and it cannot be killed even with a -9. The process
>> goes into an uninterruptible sleep during a system call to open the file
>> /sys/bus/usb/devices/usb1/descriptors. That file is part of the kernel's
>> control files for the usb controller itself. So you can see why the
>> keyboard and mouse are dead, the driver for the usb controller itself is
>> hung.
>>
>> We've upgraded the kernel and installed Dell's latest bios upgrades. No
>> joy. I am thinking the only remaining thing to do is to file a bug
>> report. However, I could alleviate the problem a little if I could
>> easily detect it and reboot.
>>
>> The problem is that I can't figure out how to write a script to detect a
>> process in an uninterruptible sleep state. No matter what I do, it seems
>> to hang. I've tried something like 'bash -c "lsusb"' and 'timeout 5
>> lsusb'. They both hang. The only thing I've been able to do is to have
>> 2 different scripts. One running lsusb and another checking for blocked
>> lsusb procs. But that is way ugly.
>>
>> PS: I wouldn't mind ideas wrt the original problem either. Not that I
>> hold out any hope for that.
>>
>>
>>
>> _______________________________________________
>> Ale mailing list
>> Ale at ale.org
>> http://mail.ale.org/mailman/listinfo/ale
>> See JOBS, ANNOUNCE and SCHOOLS lists at
>> http://mail.ale.org/mailman/listinfo
>>
>
>
>
>

-- 
Todd

