[ale] Seagate 1.5TB drives, bad blocks, md raid, lvm, and hard lock-ups

Matty matty91 at gmail.com
Thu Jan 7 12:19:10 EST 2010


On Wed, Jan 6, 2010 at 4:38 PM, Greg Freemyer <greg.freemyer at gmail.com> wrote:
> Brian,
>
> If you're running RAID5 with those drives, you have basically zero fault
> tolerance.
>
> The issue is that if one drive totally fails, you are almost guaranteed to
> have some bad sectors on the remaining drives.  Those bad sectors will
> prevent mdraid from rebuilding the array fully.  (It may rebuild the
> other stripes that don't have any bad sectors, but the stripes that do
> have bad sectors definitely are not rebuildable.)
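
The arithmetic behind that warning is worth spelling out: assuming the box's
8 drives form a single RAID5 set and the drives carry the common consumer
spec of one unrecoverable read error (URE) per 10^14 bits read, a rebuild
after one failure has to read the seven surviving 1.5 TB drives end to end:

    # Back-of-envelope expected URE count during a rebuild (hypothetical
    # assumptions: 8-drive RAID5, URE rate of 1 per 10^14 bits read).
    # bits that must be read  /  bits per expected error
    echo "scale=2; (7 * 1.5 * 10^12 * 8) / 10^14" | bc
    # => .84   roughly even odds of hitting at least one unreadable sector
    #          during the rebuild, before counting drives that are already
    #          growing bad blocks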
>
> So at a minimum you need to be running mdraid RAID6.  And even then
> you will only achieve RAID5-level reliability.  (i.e., with drives that
> size, mdraid RAID6 will likely NOT survive a double disk failure.)
>
> And then you need to be running background scans on a routine basis.
> I've forgotten the exact command, but you can tell mdraid to scan the
> entire RAID volume and verify that the parity info is correct.  In theory
> doing that will handle the bad sectors as they pop up.
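
For reference, the scan Greg can't recall is most likely the md "check"
action, which on reasonably recent kernels can be triggered through sysfs
(md0 below is just an example device name):

    # Ask md to read every member disk and verify parity across the whole
    # array; progress shows up in /proc/mdstat like a normal resync.
    echo check > /sys/block/md0/md/sync_action

Debian-derived systems also ship /usr/share/mdadm/checkarray, a
cron-friendly wrapper around the same sysfs knob.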
>
> Unfortunately it sounds like your drives are creating bad sectors
> faster than even routine background scans can get them remapped.
>
> Greg
>
> On Wed, Jan 6, 2010 at 3:09 PM, Brian W. Neu <ale at advancedopen.com> wrote:
>> I have a graphic design client with a 2U server running Fedora 11 and now 12
>> which is at a colo handling their backups.  The server has 8 drives with
>> Linux md raids & LVM on top of them.  The primary filesystems are ext4 and
>> there is/was an LVM swap space.
>>
>> I've had an absolutely awful experience with these Seagate 1.5 TB drives,
>> returning 10 of the original 14 because of ever-increasing SMART
>> "Reallocated_Sector_Ct" counts from bad blocks.  The server that the client
>> has at their office has a 3ware 9650 (I think) that has done a great job of
>> handling the bad blocks from this same batch of drives, sending email
>> notifications about one drive that grew more and more bad blocks.  This
>> 2U, though, is obviously pure software RAID, and it has started locking up.
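
With no hardware controller in the picture, smartmontools can stand in for
what the 3ware card does at the office; a rough sketch (device names and the
email address are placeholders):

    # Spot-check the attributes that matter on one drive.
    smartctl -A /dev/sda | egrep -i 'reallocated|pending|uncorrectable'

    # Or let smartd poll every drive and send mail when SMART complains;
    # one line per disk in /etc/smartd.conf, for example:
    #   /dev/sda -a -m admin@example.com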
>>
>> As a stabilizing measure, I've disabled the swap space, hoping the lock-ups
>> were caused by failures to read/write swap.  I have yet to let the
>> server run for a while and assess whether this was successful.
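
For what it's worth, the usual way to make that change stick (assuming the
swap LV is the only swap configured) is:

    swapoff -a    # stop using every active swap device immediately
    # ...then comment out the swap LV's line in /etc/fstab so it stays
    # disabled across reboots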
>>
>> However, I'm doing a lot of reading today on how md & LVM handle bad blocks,
>> and I'm really shocked.  I found this article (which may be outdated), which
>> claimed that md relies heavily on the disk's firmware to handle these
>> problems, and that when rebuilding an array there are no "common sense"
>> integrity checks to ensure the right data is reincorporated into the healthy
>> array.  Then I read more and more articles about drives that were silently
>> corrupting data.  It's turned my stomach.  Btrfs isn't ready for this, even
>> though RAID5 support was incorporated very recently, and I don't see btrfs
>> becoming a production-stable filesystem until 2011 at the earliest.
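
In fairness, md does give a little visibility here: after a "check" pass it
reports how many stripes failed parity verification, which is about the only
warning it offers of silent inconsistency (md0 again is just an example):

    # Non-zero after a completed check means data and parity disagreed on
    # some stripes -- md noticed, but cannot say which copy was correct.
    cat /sys/block/md0/md/mismatch_cnt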
>>
>> Am I totally wrong to suspect bad blocks as the cause of the lock-ups?
>> (syslog records nothing)
>> Can md RAID be trusted with flaky drives?
>> If it's the drives, then other than installing OpenSolaris and ZFS, how do I
>> make this server reliable?
>> Any experiences with defeating mysterious lock-ups?

This post on md is somewhat disconcerting and made me question
whether I should be using md at all:

http://utcc.utoronto.ca/~cks/space/blog/linux/SoftwareRaidFail

- Ryan
--
http://prefetch.net


