[ale] read after write verify?, data scrubbing procedures

mike at trausch.us
Fri Oct 26 01:00:18 EDT 2012


On 10/26/2012 12:01 AM, Phil Turmel wrote:
> On 10/25/2012 11:25 PM, mike at trausch.us wrote:
>> On 10/25/2012 04:43 PM, Ron Frazier (ALE) wrote:
>>> Then, there is the question of the data scrubbing on the source
>>> drive.  In this case, once I've completed a backup, I will have read
>>> each sector on the source drive.  Assuming there are no read errors,
>>> (If there were, I have to get out the big guns.)  then, this has
>>> accomplished 1/2 of what my scrubbing does, the read half.
>>
>> This is only true if you always back up every single block on your
>> device.  The Linux RAID self-scrubbing process, however, will read every
>> single block on every single drive in the RAID unit, whether the block
>> is in use or not.
>
> Not quite, as the metadata space on the member devices is not scrubbed.

While I hadn't realized that, the metadata space contains relatively 
static data and, if memory serves, it is written to every member of the 
array and kept essentially identical.  This means that an error in one 
member's RAID superblock would be correctable by reading the superblock 
from any other member of the array.  For that matter, it'd immediately 
cause the array to transition to the degraded or deceased states, 
depending on the array's configuration.
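
The per-member superblocks are easy enough to eyeball-compare, too; a 
minimal sketch, with the device names being purely illustrative:

```shell
# Hedged sketch: dump the md superblock from each member so the copies
# can be compared by hand.  Device names are assumptions for
# illustration; needs root and, of course, mdadm installed.
examine_members() {
    for dev in "$@"; do
        echo "== $dev =="
        mdadm --examine "$dev"   # prints the superblock stored on that member
    done
}
# e.g., as root: examine_members /dev/sda1 /dev/sdb1
```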

It seems that the metadata area is read from and written to at least:

   * When the device is assembled (twice, it seems).
   * When the device has a component added to or removed from it.
   * When the device transitions to recovery or reshape states.

So, I suppose if you want to ensure that the RAID superblocks are read 
once per week, you have to reboot.  :-)  That said, the flip side of 
the coin is that since those areas are mostly left alone, and aren't 
critical to the operation of an array that can tolerate a failed drive, 
weekly scrubbing of the data area is likely to catch any errors that 
matter.  After all, while I thought 100.0000000% of the drive was 
scrubbed before, we're now saying that only approximately 99.9999186% 
of the drive is scrubbed.
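
For the data area, at least, kicking off that weekly scrub is just a 
one-line write to sysfs; a minimal sketch, with the sysfs path factored 
out as a parameter (the usual location is /sys/block/mdX/md):

```shell
# Minimal sketch: start an md "check" scrub by writing to the sysfs
# interface.  Takes the array's md sysfs directory as $1 so nothing is
# hard-coded; on a real system this needs root.
start_scrub() {
    echo check > "$1/sync_action"  # "check" reads and counts mismatches;
                                   # "repair" would rewrite them
}
# e.g., from a weekly cron job, as root: start_scrub /sys/block/md0/md
```

Many distributions already ship a cron job that does essentially this; 
Debian's mdadm package runs checkarray monthly, if I recall correctly.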

Or, put another way and based on my own observations, the odds that:

   * 0.00008138% of the disk is unrecoverable, AND
   * That part of the disk contains critical, need-it-in-ten-minutes
     type data...

... are inversely proportional to just how much you actually attempt to 
protect your data from failure.  :-)  And since I have backups in 
place, I'm not terribly worried about that.  Drive fails?  Replace it. 
Two drives fail?  Replace them.  Three or more drives fail?  Recover 
from backup, and I get a much larger paycheck that week.
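
The two figures above are just complements of each other, for what it's 
worth; a quick back-of-envelope check (the 2 TB member size is purely 
my assumption, just to give the fraction a scale):

```shell
# Sanity-check the scrub-coverage arithmetic; the 2 TB drive size is an
# assumption for illustration only.
scrubbed_pct=$(awk 'BEGIN { printf "%.7f", 100 - 0.00008138 }')
echo "scrubbed: ${scrubbed_pct}%"              # 99.9999186%
awk 'BEGIN { printf "unscrubbed on a 2 TB member: ~%.1f MB\n",
             2e12 * 0.00008138 / 100 / 1e6 }'  # ~1.6 MB
```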

> par2 is much better than md5sums, as it can reconstruct the bad spots
> from the Reed-Solomon recovery files.

Interesting.  Though it looks like it wouldn't work for my applications 
at the moment.  Something that can scale to, oh, the order of two to 
four terabytes would be useful, though.  :-)
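
For the record, the basic workflow looks something like this; a hedged 
sketch using par2cmdline, where the file name and the 10% redundancy 
level are arbitrary choices of mine:

```shell
# Sketch of typical par2 (par2cmdline) usage.  The file name and the
# 10% redundancy level are illustrative, not recommendations.
par2_protect() {
    par2 create -r10 "$1.par2" "$1"   # ~10% Reed-Solomon recovery data
}
par2_check() {
    par2 verify "$1.par2"             # detects corruption...
    # par2 repair "$1.par2"           # ...and this would reconstruct it
}
# e.g.: par2_protect backup.tar   then, later: par2_check backup.tar
```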

I'll keep an eye on the third version of that spec, too.  Learn (about) 
something new every day!

> Indeed, raid5 cannot be trusted with large devices.  But raid6 *can* be
> trusted.  And is very resilient to UREs if the arrays are scrubbed
> regularly.

Well, that depends.  The level of trust in each comes down to the 
number of drives.  For example, would you trust a bank of 24 drives to 
a single RAID 6?  Only if you're nuts, I suspect.

I'd use RAID 5 for a three-drive array (two drives of capacity).  I'd 
use RAID 6 up to probably seven drives (five of capacity), tops.  If I 
needed anything more than that, I'd start stacking RAID levels 
depending on the application's requirements.
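
By "stacking" I mean something like RAID 60: say, 14 drives as two 
7-drive RAID 6 legs striped together.  A rough sketch, where the device 
names and drive counts are made up, and which is destructive if 
actually run:

```shell
# Hypothetical RAID 60 layout: two 7-drive RAID 6 legs, striped with
# RAID 0 on top.  All device names here are made up; running this for
# real destroys whatever is on those partitions.
make_raid60() {
    mdadm --create /dev/md1 --level=6 --raid-devices=7 /dev/sd[b-h]1
    mdadm --create /dev/md2 --level=6 --raid-devices=7 /dev/sd[i-o]1
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/md1 /dev/md2
}
# as root, with 14 spare partitions: make_raid60
```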

I dream of the day when I don't have to worry about the problem at all, 
and can think in terms of Storage Pods[0] (or Storage Pod-like things).

	--- Mike

[0] 
http://www.extremetech.com/computing/90634-how-to-build-your-own-135tb-raid6-storage-pod-for-7384

-- 
A man who reasons deliberately, manages it better after studying Logic
than he could before, if he is sincere about it and has common sense.
                                    --- Carveth Read, “Logic”

