[ale] The perpetual question: best current HDD?

Brian Mathis brian.mathis+ale at betteradmin.com
Wed Jan 9 23:30:59 EST 2013


On Wed, Jan 9, 2013 at 11:33 AM, Derek Atkins <warlord at mit.edu> wrote:
> "Ron Frazier (ALE)" <atllinuxenthinfo at techstarship.com> writes:
>
>> Kinda like asking what's the best car.
>>
>> Anyway, I've always liked Seagate.  Still look for 5 yr warranties.
>> They're getting harder to find, but some are still out there.  Blast
>> the new drives before putting them into service with a SpinRite Level
>> 4 test several times or a badblocks DESTRUCTIVE write test several
>> times.  Each cycle through badblocks writes then reads 0000, 1010,
>> 0101, 1111.  Each cycle takes about 3 days on a 1 TB drive.  I'd run
>> at least 2 cycles.  More or less comparable SpinRite activity would be
>> 6 - 8 repeats through the entire drive.  SpinRite is non destructive.
>> Note that the badblocks NONDESTRUCTIVE read-write test can be run with
>> data on the drive.  In this case, badblocks reads the data, writes a
>> random value, reads it, then rewrites the original data, which is
>> similar to what SpinRite does.  If using the badblocks NONDESTRUCTIVE
>> read-write test, I would run 6 - 8 passes.
>
> Yep, I always run "badblocks -w" on my drives prior to putting them in
> service.  I've also modified unraid's preclear_disk to also work hard on
> the disk and I use that, too.
>
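
(For anyone who wants to try what Ron describes, something along these
lines does it -- with the usual caveat that -w wipes the disk, and that
/dev/sdX here is just a placeholder for the drive under test.)

    # destructive write test; default patterns are 0xaa, 0x55, 0xff, 0x00
    badblocks -wsv /dev/sdX

    # non-destructive read-write test, can be run with data in place
    badblocks -nsv /dev/sdX
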
>> Based on other discussions here, I would recommend doing background
>> data scrubbing on the RAID array to force each drive to read every
>> sector once or twice a year.  Read / write testing is even better.
>> You can manually do this a couple of times per year with Spinrite or
>> Badblocks.  Routine file system checks are a good idea too.
>
> Fedora has a background cron job to do this:
> /etc/cron.weekly/99-raid-check
> You just need to enable /etc/sysconfig/raid-check
>
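
(You can also kick a scrub off by hand and watch the result; md0 below
is just an example device name.)

    echo check > /sys/block/md0/md/sync_action
    cat /proc/mdstat                      # progress
    cat /sys/block/md0/md/mismatch_cnt    # result once it finishes
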
> Of course I only just now checked my mismatch_cnt's on my md devices and
> see:
>
> [root at vmhost ~]# cat /sys/block/md*/md/mismatch_cnt
> 128
> 195200
> 4224
> 139392
>
> ....  So not sure what to do now :-/
> They are each RAID-1 devices, combined into RAID-10 using LVM.
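
(The usual way to clear those counts is a 'repair' pass followed by
another 'check' -- md0 is again just an example -- but see the caveat
below about what a repair on a mirror can actually promise you.)

    echo repair > /sys/block/md0/md/sync_action
    # ...wait for it to finish, then re-check:
    echo check > /sys/block/md0/md/sync_action
    cat /sys/block/md0/md/mismatch_cnt    # should now be 0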


There is a potential issue with raid-check, at least with the older
implementations.  For any kind of mirrored RAID it's basically useless
except for telling you that something is wrong.  The problem is that
when you have 2 disks that are mirrored and the data on them doesn't
match, which one wins?  There's no way to know.  RAID5 at least has
parity info to rebuild from, but a plain mirror gives you nothing to
arbitrate with.

By the time you find out you have this kind of bad block, it's too
late.  So something like badblocks doing a full surface scan, plus
smartd warning you about reallocated sectors, can help you avoid ever
getting to that point.
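
Concretely, a read-only badblocks pass is safe to run on a live disk,
and smartctl will show the counters worth alarming on (again, sdX is
just a placeholder):

    badblocks -sv /dev/sdX        # read-only surface scan
    smartctl -A /dev/sdX | grep -i -e reallocated -e pending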


>> I would also recommend using gsmartcontrol to turn on all the smart
>> monitoring that is available on the drive.  Check that all the smart
>> stats are good before putting it on line, after stress testing it,
>> particularly reallocated sectors.  Set up a way to monitor the smart
>> parameters on an ongoing basis and receive alarms if they get out of
>> line.  It wouldn't be a bad idea to monitor temperature too.  Drives
>> cannot take as much heat as CPUs in general.  I think they start
>> getting unhappy around 50 deg C.
>
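
(The command-line equivalent of that is roughly the following; sdX is
whatever the drive actually is, and the temperature attribute is 194 on
most drives.)

    smartctl -s on /dev/sdX                      # make sure SMART is enabled
    smartctl -H /dev/sdX                         # overall health verdict
    smartctl -A /dev/sdX | grep -i temperature   # current drive temperature
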
> Yeah, I have smartd turned on for all the drives and set to email me on
> major issues.  They run a short test every night, and a long test every
> week.  And of course logwatch sends out interesting stuff every night,
> too.
>
> -derek
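
For the archives: the smartd.conf line for that kind of schedule looks
roughly like this (one line per drive; the times and mail address are
just examples).

    # short self-test nightly at 2am, long test Saturdays at 3am,
    # mail root on trouble
    /dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root@localhost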


❧ Brian Mathis


