[ale] read after write verify?, data scrubbing procedures

Phil Turmel philip at turmel.org
Fri Oct 26 09:16:33 EDT 2012


On 10/26/2012 01:00 AM, mike at trausch.us wrote:
> On 10/26/2012 12:01 AM, Phil Turmel wrote:

[trim /]

>> Not quite, as the metadata space on the member devices is not scrubbed.
> 
> While I hadn't realized that, the metadata space contains relatively 
> mutable data and, if memory serves, is added to all members of the array 
> and kept identical.  This means that any error in one of the members' 
> RAID superblocks would be correctable by reading the superblock from any 
> other member of the array.  For that matter, it'd immediately cause the 
> array to transition to the degraded or deceased states, depending on the 
> array's configuration.

The metadata is kept in kernel memory while the array runs, but is
written out quite often, since it tracks whether grouped writes are
consistent among the member devices and, when a write-intent bitmap is
in use, which regions may be out of sync.

The metadata is identical among the members, except for the small bit
that uniquely identifies each member's role in the array.

The catch that some people encounter is that part of the metadata space
goes unused, never read or written.  If a URE develops in that area, no
amount of raid scrubbing will fix it, leaving the sysadmin scratching
their head.
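
For reference, the scrub I'm talking about is md's "check" action.
Roughly, assuming a reasonably recent kernel and an array named md0
(just a placeholder):

  # kick off a scrub of the whole array
  echo check > /sys/block/md0/md/sync_action

  # when it finishes, see how many inconsistencies were found
  cat /sys/block/md0/md/mismatch_cnt

  # "repair" rewrites mismatches instead of only counting them
  echo repair > /sys/block/md0/md/sync_action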

[trim /]

> ... are inversely proportional to just how much you actually attempt to 
> protect your data from failure.  :-)  And being that I have backups in 
> place, I'm not terribly worried about that.  Drive fails?  Replace it. 
> Two drives fail?  Replace them.  Three or more drives fail?  Recover it. 
>   I get a much larger paycheck that week, then.

:-)  I'm self-employed.  I get a much *smaller* paycheck when I spend
too much time on this.

>> par2 is much better than md5sums, as it can reconstruct the bad spots
>> from the Reed-Solomon recovery files.
> 
> Interesting.  Though it looks like it wouldn't work for my applications 
> at the moment.  Something that can scale to, oh, something on the order 
> of two to four terabytes would be useful, though.  :-)

I find it works very well for keeping archives of ISOs intact.  The
larger the files involved, the more convenient par2 becomes.
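
For anyone who hasn't tried it, usage looks roughly like this (the file
names and the 10% redundancy figure are just examples):

  # create recovery data covering ~10% of the archive
  par2 create -r10 backup.par2 *.iso

  # later: verify the files against the recovery set
  par2 verify backup.par2

  # rebuild any damaged blocks if verification fails
  par2 repair backup.par2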

> I'll keep an eye on the third version of that spec, too.  Learn (about) 
> something new every day!
> 
>> Indeed, raid5 cannot be trusted with large devices.  But raid6 *can* be
>> trusted.  And is very resilient to UREs if the arrays are scrubbed
>> regularly.
> 
> Well, that depends.  The level of trust in each comes from the number of 
> drives.  For example, would you trust a bank of 24 drives to RAID 6? 
> Only if you're nuts, I suspect.

For near-line light-duty high-capacity storage, I would certainly set up
such a raid6.  Configuring 24 drives as 22 in raid6 w/ two hot spares
would be more robust than a pair of twelve-drive raid6 arrays
concatenated.

Same capacity (20 data drives' worth either way), higher unattended
fault tolerance, but significantly lower performance.  Everything is a
tradeoff.

> I'd use RAID 5 for a 3(2)-drive array.  I'd use RAID 6 up to probably 
> 7(5), tops.  If I needed to do anything more than that, I'd start 
> stacking RAID levels depending on the application's requirements.

I don't use raid5 at all nowadays.  Triple mirror on three devices is my
minimum setup.  Raid10,f3 or raid6 for anything larger.
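
In mdadm terms, roughly (device names are placeholders again):

  # three-way mirror on three drives
  mdadm --create /dev/md1 --level=1 --raid-devices=3 /dev/sd[b-d]

  # raid10 with three "far" copies spread over four drives
  mdadm --create /dev/md2 --level=10 --layout=f3 --raid-devices=4 /dev/sd[b-e]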

> I dream of the day where I can not worry about the problem at all, and 
> think in terms of Storage Pods[0] (or Storage Pod-like-things).

$  :-)

Phil


More information about the Ale mailing list