[ale] Which large capacity drives are you having the best luck with?

Thu Jan 6 10:57:51 EST 2011

On Thu, Jan 6, 2011 at 1:51 AM, Ron Frazier
<atllinuxenthinfo at c3energy.com> wrote:
<snip>
>>
>> But I don't think spinrite tries to detect sectors that have been ECC
>> recovered.  So it doesn't really know the details.
>>
>> A smart long self test has the ability to know that an ECC recovery is
>> needed for a sector.  What it does with the knowledge, I don't know.
>> But it certainly has more knowledge to work with than spinrite.
>>
>> fyi: hdparm has a long read capability that allows a full physical
>> sector to be read with no error correction!  So spinrite could in
>> theory read all of the sectors with CRC verification disabled and
>> check the CRC itself.  The trouble is that the drive manufacturers
>> implement proprietary CRC / ECC solutions, so spinrite has no way to
>> actually delve into the details of the sector's data accuracy.
>>
>
> That last sentence is not correct.  It may be true that Spinrite cannot
> calculate its own ECC.  However, if the sector doesn't read correctly,
> the ECC correction is turned off.  Then, Spinrite reads the sector
> repeatedly, starting from different head positions and flying to the
> target sector, and accumulates up to 2000 samples of erroneous sector
> data.  It uses excruciating statistical techniques to analyze the
> samples and determine the most likely value, 1 or 0, for each bit.  It
> then reconstructs the sector and saves what it recovered back to the
> drive after a surface analysis has verified that it's safe to do so.  In
> many cases, just the repeated reading from different positions will
> accomplish a perfect read.  If so that perfect data is saved until a
> surface analysis verifies the magnetic reliability of the sector.
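
As a rough illustration of the "most likely value for each bit" idea
described above, here is a minimal sketch. Spinrite's actual analysis is
not public, so the plain per-bit majority vote used here is an
assumption, and the function name majority_vote is hypothetical.

```python
def majority_vote(samples: list[bytes]) -> bytes:
    """Combine repeated raw reads of one sector by per-bit majority vote.

    Assumption: every sample has the same length and each bit is judged
    independently. A tie is resolved to 0.
    """
    if not samples:
        raise ValueError("need at least one sample")
    length = len(samples[0])
    if any(len(s) != length for s in samples):
        raise ValueError("all samples must have the same length")

    result = bytearray(length)
    for i in range(length):
        for bit in range(8):
            # Count how many samples have this bit set.
            ones = sum((s[i] >> bit) & 1 for s in samples)
            if ones * 2 > len(samples):
                result[i] |= 1 << bit
    return bytes(result)
```

With three samples that disagree in a single bit, the two that agree
win. Note that nothing in this sketch tells you whether the
reconstructed sector is actually correct; that still needs some
independent check.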

I'd have to go look at the ATA spec.  If you do a "long read" (i.e.
read the entire physical sector, not just the payload), does the drive
return an ECC good / bad flag?  I did not think it did, but I'm not
positive.

Without the flag, spinrite would have no way to know if those long
reads are good or bad.

As to normal sector reads, as far as I know, they always return valid
data, or they fail.  No in between.  That may vary by manufacturer.
Again, the spec would be useful, but lots of manufacturers don't
follow the specs word for word.

<snip>
>>
>> When that sector is written (by linux or spinrite) then the sector is
>> reallocated to a spare sector.  And the old sector is not used again.
>>
>> fyi: hdparm has a way to force a write to Pending Sector and put new
>> good data on it.  Thus spinrite could do this if it wanted to as well.
>>  I certainly hope it is not doing so.
>>
>
> I don't see why it would need to write to a sector that the drive wants
> to swap, but don't know for sure.  It does have to prevent swapping
> until it's finished data recovery.  But, I would assume the swap, if
> needed, is always allowed before writing the new final data back.

In general, ATA drives (SATA and PATA) don't do a sector re-allocation
until they have known good data to put in the new sector.  Thus the
DRIVE will only re-allocate on write.

<snip>

>> >
>> > 3) The drive has obvious errors and warnings. - In this case it is
>> > likely that some data is unreadable by conventional means.  It is highly
>> > likely that Spinrite will recover the data and save it elsewhere on the
>> > drive, storing it in fresh strong magnetic domains.
>>
>> I believe a smart long self test will read all of the sectors and
>> identify those that are not ECC Recoverable.  I don't think it will
>> actually reallocate them.
>>
>
> I never could find out what that test does.
>

Much of SMART is left to the manufacturer to define.  So you will
likely not find a definitive answer.  I go mostly by how long the test
takes to run, which seems consistent with a full read-only surface
scan.
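
The timing argument is just capacity divided by sustained read speed.
A minimal sketch, where the function name and the example figures
(a 1 TB drive at 100 MB/s) are illustrative assumptions, not
measurements:

```python
def full_scan_hours(capacity_bytes: int, throughput_bytes_per_s: float) -> float:
    """Estimate how long one sequential read of the whole surface takes.

    Assumption: the drive sustains a constant throughput, which real
    drives do not (inner tracks are slower than outer ones).
    """
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return capacity_bytes / throughput_bytes_per_s / 3600


# Hypothetical example: 1 TB drive, 100 MB/s sustained read.
hours = full_scan_hours(10**12, 100 * 10**6)
print(f"{hours:.1f} hours")
```

If the long self test on a given drive takes about as long as this
estimate, that is at least consistent with it reading every sector
once.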

<snip>

>> > Again, this may or
>> > may not trigger sector reallocation.
>>
>> I surely hope writing to a sector that previously had read failures
>> not handle-able via ECC recovery triggers a reallocate.
>>
>
> I would assume so too, but don't know if the drive has a certain
> threshold.  I don't know if the program can force a reallocate, but I do
> know it delays them while data recovery is going on.

"delay" is accomplished by not doing any normal disk writes.

Once you have recovered the data in a sector, you can do as many "long
reads" and "long writes" to a sector as you like.  They don't trigger
any automatic drive action.  (You can test this with hdparm.  It
supports both long reads and long writes.)

As to forcing a relocate, spinrite can do so via:

1) "Long write" known bad data, i.e. force a bad ECC.
2) Normal read of the sector: the drive returns a media error and flags
   the sector as pending relocation.
3) Normal write to the sector: the drive relocates it.
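
The sequence above can be sketched as a toy state machine. This is a
model of the behavior as described in this post, not of any real
firmware: the ToyDrive class, its method names, and the spare count are
all hypothetical, and real drives may behave differently (for example,
by retesting the sector on write instead of always relocating).

```python
class MediaError(Exception):
    """Raised when a normal read hits a sector whose ECC check fails."""


class ToyDrive:
    def __init__(self, spares: int = 4):
        self.payload: dict[int, bytes] = {}
        self.ecc_ok: dict[int, bool] = {}
        self.pending: set[int] = set()
        self.reallocated: set[int] = set()
        self.spares = spares

    def long_write(self, lba: int, payload: bytes, ecc_ok: bool) -> None:
        # Raw write of payload plus ECC; triggers no automatic action.
        self.payload[lba] = payload
        self.ecc_ok[lba] = ecc_ok

    def long_read(self, lba: int) -> bytes:
        # Raw read; no ECC check and no state change.
        return self.payload.get(lba, b"")

    def read(self, lba: int) -> bytes:
        # Normal read: returns valid data or fails, nothing in between.
        if not self.ecc_ok.get(lba, True):
            self.pending.add(lba)
            raise MediaError(f"unrecoverable read at LBA {lba}")
        return self.payload.get(lba, b"")

    def write(self, lba: int, payload: bytes) -> None:
        # Normal write: the drive now has known good data, so a pending
        # sector is relocated to a spare at this point.
        if lba in self.pending:
            if self.spares == 0:
                raise MediaError("no spare sectors left")
            self.spares -= 1
            self.pending.discard(lba)
            self.reallocated.add(lba)
        self.payload[lba] = payload
        self.ecc_ok[lba] = True
```

In this model, long_write(lba, data, ecc_ok=False) followed by read(lba)
leaves the sector pending, and the next write(lba, data) moves it to the
reallocated set. The tool only issues commands; the drive does the rest.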

Note that the key thing is the DRIVE is doing most of the work.  Tools
like spinrite just leverage the smarts the drive already has.

That is a big part of my issue with the way spinrite writes up its
marketing material.  It basically makes it sound like spinrite is
inside the drive twiddling bits.  By using long reads / writes, it can
indeed do a lot, but it can't do everything!

Statements like "spinrite delays relocates while data recovery is
going on" make it sound like spinrite has more control than it does.

As an engineer, I'd rather read "spinrite does not do any normal
sector writes while data recovery or surface analysis is ongoing
because a normal sector write could trigger a relocate."

Greg