[ale] new (to me) raid 4/5 failure mode

Pat Regan thehead at patshead.com
Mon Aug 24 15:55:15 EDT 2009


Greg Freemyer wrote:
>> If your application isn't calling fsync then the data must not be that
>> important :).
> 
> fsync does not address this issue, which is a small number of
> milliseconds of vulnerability for most disk writes.

If fsync doesn't address the issue, there is either a hardware or
driver problem.  fsync is not supposed to return until the disk
hardware has flushed the write to the platter.
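
In other words, something like this should be all an application
needs for durability (a minimal sketch, assuming a POSIX system and a
drive that honors cache flushes):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char buf[] = "important data\n";
    int fd = open("data.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* write() only hands the data to the kernel's page cache */
    if (write(fd, buf, sizeof(buf) - 1) != (ssize_t)(sizeof(buf) - 1)) {
        perror("write");
        return 1;
    }

    /* fsync() is not supposed to return until the data has been
       flushed all the way down to the platter */
    if (fsync(fd) < 0) {
        perror("fsync");
        return 1;
    }

    return close(fd);
}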

I've heard rumors of inexpensive disks cheating on this.  I have no
proof of this and I'm not convinced it is true :).

>> d1 was garbage the moment it dropped out of the array :).
> 
> But it is recreatable from d2 and p, or from d2' and p'.
> 
> 
> It is not recreatable from d2 and p', or from d2' and p.  And if you
> have a system shutdown during that window of vulnerability, that is
> all you will have to work with.

If all your hardware and drivers work correctly, this state can never
be reached for data that has been fsynced.

Now, for data that has not been fsynced it is an entirely different story.

>> Once a disk drops out of an array I would expect all data on the drive
>> to be bad.
> 
> But you expect it to be recreatable from the other drives in the
> raidset.  Or at least I do.

I do.  I fully expect my disks and controllers to honor any fsync
calls.  You can't have any true atomicity without a fully working
fsync.
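
That's why the classic trick for atomically replacing a file's
contents leans on fsync.  A sketch of the pattern (the file names are
just for illustration):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write a temp file, flush it, then rename() it over the old file.
   rename() is atomic, so a reader sees either the old contents or
   the new, never a half-written mix -- but only if fsync really put
   the new contents on the platter first.  (Strictly, you should
   fsync the directory afterward, too.) */
static int atomic_replace(const char *path, const char *tmp,
                          const char *data, size_t len)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0)
        return -1;
    return rename(tmp, path);
}

int main(void)
{
    const char msg[] = "new contents\n";
    return atomic_replace("config", "config.tmp", msg,
                          sizeof(msg) - 1) ? 1 : 0;
}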

> Raid 5 fully operational will not cause data on one drive to be lost
> because a write is going on to another drive and you have an
> unexpected shutdown.
> 

The array might be clean, but your data won't be.  You need at least
n-1 disks in a stripe to be in sync for the stripe to be valid.  If
you have 4 disks and only 2 of them were flushed before the power
went out, you will have an unrecoverable stripe.

It's very similar to losing power on a single disk mid-write.

I think all I'm saying is that there isn't a terribly huge difference
in data loss between a single disk and any level of RAID during a
power loss.  No matter what, you may have data in a half-written
state.
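
The XOR arithmetic behind RAID 5 parity makes this easy to see, using
your d1/d2/p notation.  Here's a toy demonstration of the mismatched
case, with single bytes standing in for whole blocks:

#include <stdio.h>

int main(void)
{
    unsigned char d1 = 0xAA, d2 = 0xBB;
    unsigned char p  = d1 ^ d2;       /* parity, everything in sync */

    printf("in-sync parity p = 0x%02X\n", p);

    /* A write updates d2 to d2', so parity should become
       p' = d1 ^ d2'.  Power fails after p' reaches the platter but
       before the new d2 does. */
    unsigned char d2_new = 0xCC;
    unsigned char p_new  = d1 ^ d2_new;   /* p' is on disk */
                                          /* d2 is still the old value */

    /* Later the drive holding d1 dies, and we rebuild it from the
       mismatched pair (d2, p'): */
    unsigned char rebuilt = d2 ^ p_new;

    printf("real d1 = 0x%02X, rebuilt d1 = 0x%02X\n", d1, rebuilt);
    /* prints 0xAA vs 0xDD: silent corruption, no error anywhere */
    return 0;
}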

> Yeah, like I said, I'm not a raid 5 fan at all.  raid 6 I think has
> its place, but probably only in arrays with lots of drives so that
> you can do more work in parallel.  I've pretty much given up on raid
> 5, and this failure mode is just one more nail in the coffin for me.

It doesn't matter how parallel your workload is.  RAID 5 and 6 will
always be slow for writes.
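
The killer is the small write: to update one data block you have to
read the old data and the old parity, then write the new data and the
new parity -- four I/Os for one logical write, no matter how many
spindles you have.  The parity math for that read-modify-write cycle
is just (sketch):

/* RAID 5 small-write cycle: read old_d and old_p, write new_d and
   the value returned here.  Four I/Os per logical write. */
unsigned char new_parity(unsigned char old_p,
                         unsigned char old_d,
                         unsigned char new_d)
{
    return old_p ^ old_d ^ new_d;
}

RAID 6 does the same dance with a second parity block on top, so it's
six I/Os.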

Also, the failure mode you are worried about can happen with RAID 6
as well.  You don't even need a failed drive.  In a single-disk power
failure you lose what is in the cache.  In a RAID 5 you can
potentially lose any stripe that was in the cache on 2 or more
drives.  RAID 6 is just like RAID 5 with one more drive.

I notice this a lot on one of my home media server machines.  I've
been adding disks to it for a while.  Root/boot started on a small
RAID 1 on the first 8 gigs of each disk; the rest of each disk is
part of a big RAID 5.  As I've added disks, the easiest thing to do
has been to add more mirrors to the RAID 1.

There are 6 disks in the machine, but the root/boot is "only"
mirrored 4 times.  Sometimes if there is a power blip or the machine
locks up (I had a CPU problem a while back), one of the mirrors in
the 4-way RAID 1 will be inconsistent.  That seems to be enough to
confuse MD, and I have to manually drop one of the mirrors or the
array won't start back up.
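
When it happens I end up doing something like this by hand (from
memory, and the device names are made up):

# reassemble, letting mdadm ignore the stale mirror's event count
mdadm --assemble --force /dev/md0 /dev/sd[abcd]1

# or assemble without the inconsistent member and re-add it so it
# resyncs from the others
mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm /dev/md0 --add /dev/sda1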

If I had known I was going to have this conversation, I might have
paid attention to whether 2 disks ever went missing at the same time.
I want to say it has happened at least once, but I'm not entirely
certain.

>> Pat
> 
> Greg
> 

Pat
