[ale] disk drive diagnostics nirvana - NOT - I have questions

Ron Frazier (ALE) atllinuxenthinfo at techstarship.com
Tue Oct 23 11:35:54 EDT 2012


Hi All,

Wow, talk about unpleasant morning surprises.

I spent a good part of last night until about 3 am working on this HDD.  
First I backed up my data on the 1 TB pair (which is the same on each) 
to a 3rd 1 TB drive that used to be attached to my DVR.  I then ran the 
Seatools tests on the drive which had the 120 bad sectors.  It passed 
just fine.  (Of course it isn't going to give me any proof of failure!)  
I then spent some more time researching and installing a Windows tool 
called CrystalDiskInfo (highly recommended) to monitor the SMART system 
and email me if the reallocated sectors climb.  (Maybe you can do 
something similar in Linux, not sure how.  Maybe CrystalDiskInfo will 
run in Wine.)

Boy did I get a surprise when I got up.  I found 8 messages in my email 
box from CrystalDiskInfo indicating changes in the drive status.  It 
turns out that, within a half-hour period around 5 AM, the reallocated 
sectors climbed from 120 to 3957!  And, when I logged into Windows, I 
got a message from Windows saying the drive is failing and to back up my data.

I guess I'll spend much of the afternoon zeroing this drive out for 
privacy and removing it.  I've already put in RMA requests with Seagate 
and they're sending me replacement drives and approved packaging.  I'm 
going to disconnect the other two 1 TB drives so I don't zero the wrong one.

I'm amazed how fast this deteriorated and am glad I was able to monitor 
it.  Gotta update my backups on all my machines and my Dad's.  The 
monitoring software is going on all my PCs from now on.  Somebody tell 
me how to do automatic monitoring and alerts on Linux.

CrystalDiskInfo is monitoring the following SMART parameters with 
alarm thresholds that I can set.  They're normally set to alert at 1, 
but I set the reallocated sector count to alert at 121 and 13 to show if 
these drives degrade further.

05 Reallocated Sectors
C5 Current Pending Sector Count
C6 Uncorrectable Sector Count
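
For the Linux side, here is a minimal sketch of the same kind of check, 
assuming the smartmontools package is installed and that `smartctl -A` 
prints its usual ten-column attribute table.  CrystalDiskInfo shows the 
IDs in hex; smartctl shows them in decimal, so 05, C5 and C6 become 5, 
197 and 198.  The function names, the WATCHED table and the threshold 
values are made up for illustration, and emailing the result is left 
out.  The smartd daemon that ships with smartmontools can do the 
monitoring and mailing on its own, so treat this only as a way to see 
what is being checked.

```python
import subprocess

# SMART attribute IDs from the list above (hex 05, C5, C6 -> decimal 5, 197, 198).
# The labels are illustrative, copied from the names used in this message.
WATCHED = {
    5: "Reallocated Sectors",
    197: "Current Pending Sector Count",
    198: "Uncorrectable Sector Count",
}


def parse_smart_attributes(text):
    """Return {attribute_id: raw_value} from `smartctl -A` style table text.

    Assumes the ten-column layout where the raw value is the tenth field.
    Rows whose raw value is not a plain integer are skipped.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank line, or something that is not an attribute row
        try:
            attrs[int(fields[0])] = int(fields[9])
        except ValueError:
            continue
    return attrs


def find_alerts(attrs, thresholds):
    """Return messages for attributes whose raw value is at or above its threshold."""
    alerts = []
    for attr_id, limit in sorted(thresholds.items()):
        raw = attrs.get(attr_id)
        if raw is not None and raw >= limit:
            name = WATCHED.get(attr_id, str(attr_id))
            alerts.append(f"{name} = {raw} (alert at {limit})")
    return alerts


def read_smartctl(device):
    """Run `smartctl -A` on a device and return its text output (usually needs root).

    The exit status is not checked, because smartctl uses a nonzero status
    to flag a failing disk, which is exactly the case of interest here.
    """
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout
```

To use it, something like find_alerts(parse_smart_attributes(read_smartctl("/dev/sda")), {5: 121, 197: 1, 198: 1}) 
could be run from a cron job, with any returned messages piped to mail.  
The device name and thresholds there are only examples.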

I'll share more of what I learn later.

Sincerely,

Ron

On 10/23/2012 10:49 AM, Derek Atkins wrote:
> Phil Turmel <philip at turmel.org> writes:
>
>    
>> My critical servers all use linux software raid in various combinations,
>> and all of the raid arrays are scrubbed weekly.  By scrubbed, I mean a
>> cron job instructs the kernel to read every sector on every member
>> device in the background, compute parity as appropriate, and report any
>> inconsistencies.  Any read errors trigger the corresponding recovery and
>> rewrite functions that would normally occur if an application
>> encountered the sectors.  Any unsuccessful write kicks that device out
>> of the array as usual.
>>      
> I thought that the raid scrub did a read/write of every sector, not just
> a read?
>
>    
>> I have been doing this for about ten years now, with about seven or
>> eight drive failures in that time.  Never lost any data, though I've
>> been nervous a few times when waiting for a replacement disk for a raid5
>> array.  Everything is now raid6 or triple-mirrored, so I sleep well.
>>      
> I use RAID-10 personally.
>
>    
>> All of the drives that failed on me had fewer than 100 relocated
>> sectors.  None of them had fewer than 20 relocated sectors.  Mostly
>> 30,000+ hours of operation.  This seems to correspond well to the
>> reports I read on the linux-raid mailing list.  I tolerate drives with
>> single-digit relocation counts, but I recheck them every week.  After
>> that, they're outa there.
>>      
> Agreed.
>
>    
>> Some of the research on the topic suggests that climbing relocation
>> counts is most often caused by approaching spindle bearing failure,
>> where the wobble causes head tracking errors.  Whatever the underlying
>> reason, that's my red line.
>>      
> I look for a new drive as soon as I get the first errors (assuming I
> don't happen to have a cold spare on hand).
>
>    
>> HTH,
>>
>> Phil
>>      
> -derek
>
>    

-- 

(To whom it may concern.  My email address has changed.  Replying to former
messages prior to 03/31/12 with my personal address will go to the wrong
address.  Please send all personal correspondence to the new address.)

(PS - If you email me and don't get a quick response, you might want to
call on the phone.  I get about 300 emails per day from alternate energy
mailing lists and such.  I don't always see new email messages very quickly.)

Ron Frazier
770-205-9422 (O)   Leave a message.
linuxdude AT techstarship.com


