[ale] Which large capacity drives are you having the best luck with?

Ron Frazier atllinuxenthinfo at c3energy.com
Thu Jan 6 00:31:34 EST 2011


SUMMARY OF SPINRITE TECHNOLOGY - Based on available documentation on the
company website.

There have been several interesting comments by others relating to my
discussions of SpinRite and hard drive maintenance.  I will admit that
I'm in way over my head, but I will attempt to address some of those
comments as best I can in another reply to those posts.

As a prelude to that, I've referenced some documentation on the GRC
website and will attempt to summarize the data recovery and surface
analysis technology that SpinRite uses.  The documentation is a bit
dated, i.e., from 1995.  However, I believe that, even then, Steve
Gibson probably had the most sophisticated hard drive analysis and
recovery technology in the world.  I suspect that, even if it's not the
best in the world any more, it's still current with the state of the
art.  I would invite those interested to review the documentation on the
website.

This video:

http://www.grc.com/sr/themovie.htm

has some good technical data in it, but not a lot.  Here are the main
things I got out of it.

The system first tries to read a sector during analysis.  If the sector
cannot be read completely, the program flies into the sector from
various locations, which leads to different head alignment, velocity,
and inertia, and makes numerous reads while collecting statistical
data.  It prevents sector swapout until all possible data has been read
from the sector.  The collection of different reads is statistically
analyzed and the most likely collection of bits for that sector (usually
4096 bits, i.e. 512 bytes) is compiled.  The drive is then allowed to
swap out the sector for a spare if desired.  The recovered data is then
rewritten to the original sector if a swap was not performed, or to the
new spare sector if a swap was performed.  Therefore, the system can
recover most of the 4096 bits even if some were corrupt.  This occurs at
the bit level, and has nothing to do with partitions or file systems.
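
To make the statistical step a little more concrete, here is a minimal
sketch in Python.  It is NOT SpinRite's actual algorithm, which the
material I've seen does not spell out.  It assumes the simplest possible
analysis: a bit-by-bit majority vote across several imperfect reads of
the same sector.  The names flip_bit and majority_vote and the sample
data are made up for illustration.

```python
def flip_bit(data, byte_index, bit_index):
    """Return a copy of data with one bit inverted (simulates a read error)."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit_index
    return bytes(damaged)


def majority_vote(reads):
    """Return the most common value of every bit across equal-length reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must have the same length")
    result = bytearray(length)
    for i in range(length):
        for bit in range(8):
            ones = sum((r[i] >> bit) & 1 for r in reads)
            if 2 * ones > len(reads):  # strictly more than half were 1
                result[i] |= 1 << bit
    return bytes(result)


# Hypothetical example: three reads, each wrong in a different place.
original = b"SECTOR DATA"
reads = [
    flip_bit(original, 0, 0),
    flip_bit(original, 3, 7),
    flip_bit(original, 10, 2),
]
recovered = majority_vote(reads)
print(recovered == original)
```

No single read in this example is correct, yet the vote recovers the
original bytes because each error lands on a different bit.  Errors that
hit the same bit in most reads would defeat this simple scheme.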

This is the main Spinrite page on their site:

http://www.grc.com/sr/spinrite.htm

It links to the following Linux Journal article among other reviews and
testimonials.

http://www.linuxjournal.com/article/7684

This is another page that you might call a main page:

http://www.grc.com/spinrite.htm

It links to numerous others including: SpinRite Overview, Screen Shots,
Documentation, Defect Detection, Data Recovery, Reviews, Exclusive
Features, Feature Summary, SpinRite Q&A, and Version History.

From the documentation page:

http://www.grc.com/srdocs.htm

I found this technical report on the technology:

http://www.grc.com/files/technote.pdf

Here are some quotes from that document to explain Spinrite's operation.
These quotes are not necessarily contiguous.  This is long, but, believe
it or not, I did not copy the whole document.

(Quoting.)

Even when a sector won't ever read correctly, there's still hope. The
data being read from a marginally readable sector changes from one
reading to the next, and useful, if not correct, information is
contained within each of these differing readings. We have found that a
careful statistical analysis of the results of multiple incorrect
readings can be used to pinpoint a sector's trouble, and to reconstruct
the original information the drive has been “trying” to read. This is
the key behind SpinRite's DynaStat data recovery system.

DynaStat's recovery methodology incorporates several complementary
strategies: The first is simply extensive retries. As we've seen, just
trying harder often results in just one good read . . . which is all we
need. The recovered data won't then be returned to the same sector,
after we've retrieved it, unless we verify that it's truly a safe place
to restore the data.

During this exhaustive rereading, DynaStat employs its second recovery
strategy of deliberately wiggling the drive's heads. By successively
approaching the troubled sector from different distances and directions,
the heads arrive at the sector's track at different velocities, which in
turn produce small but significant displacements in the head's resting
position. This allows DynaStat to compensate for the long-term alignment
drift that occurs in non-servo based drives, and the positioner
hysteresis that occurs in servo-based designs.

DynaStat's exhaustive, head-wiggling re-reading is almost always able to
coerce one good or correctable read from a recalcitrant sector. But when
the sector just will not read, DynaStat's third, core, recovery strategy
is brought into play: The mass of data collected during its many
re-reading attempts is statistically analyzed in an attempt to calculate
the sector's original contents. At the very least, the amount of data
lost is significantly minimized by this process, and more often than not
the sector's data is correctly calculated and completely restored.

(RON talking again.)

Here is a verbal description of the data recovery process based on a
flowchart in the document, assuming a sector cannot be read correctly.

0) Prevent the drive from swapping in a spare sector and thus losing the
data.
1) Read a random sector to move the heads away from the recovery target.
2) Read the target sector and accept the data even if it cannot be ECC
corrected.
3) If the read was perfect, save the data, then go to surface analysis.
4) If the read was not perfect, add the data to a growing pool of
erroneous sector data.
5) Statistically analyze existing pool of erroneous sector reads to
determine if more reads would be beneficial.
6) Continue this read and analyze process until
   a) a perfect read is achieved
   b) the maximum retry limit is reached
   c) it is determined that further reads are not statistically
beneficial
7) Save the perfect data, if it was ever read, or the best data set
achieved through statistical analysis of the bits, in memory.
8) Allow the drive to swap out the sector for a spare if desired.
9) Thoroughly analyze this sector's magnetic surface, whether new or
original, to determine its safety for storing data.
10) Write the perfect data, if available, or the statistically recovered
data back to the sector.
11) Move to the next sector.
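
Steps 1 through 7 above can be sketched as a loop.  This is only my
illustration in Python under stated assumptions, not code from the
document.  The "drive" is a simulated reader that randomly flips bits;
the stopping rule for "further reads are not beneficial" is my own guess
(stop once the majority-vote estimate has not changed for a number of
consecutive reads); and the names make_flaky_reader, recover_sector and
stable_reads are hypothetical.  Steps 0 and 8 through 10 need real
drive-level access and are left out.

```python
import random


def bit_majority(reads):
    """Bit-by-bit majority vote over equal-length reads."""
    result = bytearray(len(reads[0]))
    for i in range(len(result)):
        for bit in range(8):
            if 2 * sum((r[i] >> bit) & 1 for r in reads) > len(reads):
                result[i] |= 1 << bit
    return bytes(result)


def make_flaky_reader(true_data, flip_probability, seed):
    """Simulated drive: each call returns (data, ok) with random bit flips.

    On real hardware 'ok' would come from the drive's ECC check; here it
    is faked by comparing against the known true data.
    """
    rng = random.Random(seed)

    def read_target():
        noisy = bytearray(true_data)
        for i in range(len(noisy)):
            for bit in range(8):
                if rng.random() < flip_probability:
                    noisy[i] ^= 1 << bit
        return bytes(noisy), bytes(noisy) == true_data

    return read_target


def recover_sector(read_target, max_retries=200, stable_reads=20):
    """Re-read until a perfect read, a stable estimate, or the retry limit."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    pool, best, unchanged = [], None, 0
    for _ in range(max_retries):
        # Step 1 (seek away first to vary the head approach) is not simulated.
        data, ok = read_target()                # step 2
        if ok:
            return data, "perfect read"         # step 3 / 6a
        pool.append(data)                       # step 4
        estimate = bit_majority(pool)           # step 5
        unchanged = unchanged + 1 if estimate == best else 0
        best = estimate
        if unchanged >= stable_reads:           # step 6c (assumed rule)
            return best, "statistical estimate (converged)"
    return best, "statistical estimate (retry limit)"  # step 6b


sector = bytes(range(256)) * 2                  # a 512-byte test sector
reader = make_flaky_reader(sector, flip_probability=0.02, seed=1)
data, how = recover_sector(reader)
print(how, data == sector)
```

With a 2% chance of flipping each bit, a perfect read of all 4096 bits
is essentially never seen in this simulation, so the result comes from
the statistical path.  Real read errors are not independent random
flips, so this says nothing about how well the real product does.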

(Now quoting again.)

Contrary to casual belief, recovering only most of the data from a
sector can be a tremendous benefit for data recovery. SpinRite is able
to at least recover most of a sector's data even in the worst
situations. For example, if that sector were a chunk of a partition's
file allocation table, a few lost bytes would probably damage the
structure of just one file, but losing the entire sector would confuse
256 clusters and all of the files containing them. If a sector of the
root directory or any sub-directory were completely lost, all of the
directory's files and sub-directories would be lost, but if the loss
were contained within just a few bytes, one directory entry would be
hurt, but everything else in the directory and its sub-directories would
be saved.

recovering only a portion of an error in a large database often allows
the balance of the database's data to be recovered rather than rendering
the entire file useless

it's even possible for executable files to be used with care.
SpinRite's users have reported that most functions of non-compressed
executables can still be used after partial SpinRite recovery

there are times when an executable file is completely irreplaceable and
accepting some alteration is preferable to losing the file's entire
functionality

After coercing all possible data from a drive, SpinRite then determines
whether the drive's storage surfaces underneath the recovered data are
capable of safely storing and retrieving whatever data the system might
choose to place there.

Since magnetic mass storage devices are not completely defect free, the
best aid for the long-term maintenance of reliable data storage is the
early detection and elimination of inevitable surface defects. These
defects, which are caused by surface scratches, abrasions, pits, or thin
magnetic material plating, reduce the strength of the recorded signal
when it is being read back. Defects have also been shown to develop or
“grow” due to a gradual evolution of the drive's storage surfaces. To
achieve the highest possible storage reliability, any locations that can
be shown to affect the integrity of recorded data should be immediately
removed from the operating system's use.

The strategy used by SpinRite 3.1 to detect these regions is currently
unique in the industry: A special data sequence is custom-designed and
recorded onto the drive, then carefully read back with the drive's
internal “error correction” protocols momentarily held in check. The
specially crafted data sequence plays a fundamental role in the
detection of weak spots by sliding a signal that alternates between
maximum and minimum amplitude along the drive's entire surface.

The maximum-amplitude portion of the signal tricks the drive into
lowering the “gain” of its read amplifier. Since any signal “clipping”
that would result from the amplifier's gain being turned up too high
must be avoided at all cost, the drive's AGC (automatic gain control)
circuitry quickly responds to any large signal amplitude by lowering the
amplifier's gain. This large amplitude signal is immediately followed by
a small pulse of the lowest possible strength. Since the amplifier's
gain has been cranked down by its encounter with the largest possible
pulse, the small signal pulse is made even smaller. If there's anything
at all weak or uncertain about the location underneath the tiny pulse, a
deliberately detectable read error will result and SpinRite will have
found a new defect in the surface!

the key to accurate defect detection lies in somehow managing to
generate a series of flux reversals of alternating strength. The maximum
strength flux reversals trick the drive's internal automatic gain
control into expecting a large signal, and the small flux reversals
provide a means for detecting any diminished capacity on the storage
surface underneath. So the thousand dollar question is: How can a
software-only product like SpinRite possibly control the recorded
strength of a drive's flux reversals?

Rules for Combining Flux-Reversals
1. A single, isolated, flux reversal generates the greatest possible
strength signal, ... and ...
2. A “triplet” of three flux-reversals occurring as close to each other
as possible, generate a minimum-strength flux reversal in the center.

In order for SpinRite to specify the flux reversal sequences it desires,
rather than merely the data it wishes to record, it must understand the
relationship between the data and the flux reversals for the drive being
tested. This understanding is used to “reverse engineer” the data from
the flux reversal sequences.

As it turns out, FM and MFM encoding are just two members of a
mathematically infinite family of possible data-to-flux reversal
encoding schemes. The next step takes us into the domain of RLL encoding
where the traditional fixed FM and MFM encoding/decoding rules no longer
apply. IBM has their own (patented) RLL encoding scheme which they
license to those willing to pay. Conner Peripherals uses their own
design, as does Maxtor, Quantum and Seagate.

Since the drive's encoder/decoder determines the relationship between
the user's data and the drive's magnetic flux reversals, we must know
WHICH ENCODER/DECODER is being used by the drive in order to determine
what data to send to the drive for surface analysis.

SpinRite 3.1 contains mathematical description simulation models for
every flux encoder being employed in hard disk drives today (and even
for some which are still in the labs getting ready for tomorrow!). After
identifying the drive's manufacturer and model number, SpinRite utilizes
the corresponding encoder's mathematical model to derive a family of
test data which is specifically tailored to produce these optimal flux
reversal sequences for testing the surface of the drive. Each successive
sequence of test data results in single-bit shifted flux reversal
phasing which thoroughly scrubs the entire surface of the drive.

Some moderately sophisticated artificial-intelligence technology was
used to recursively goal-seek the optimum sequence of flux reversals,
within the constraints and limitations of the drive's data-to-flux
encoder, then work back up through the drive's decoder model to deduce
the original input data which will produce these optimal surface
analysis flux reversal patterns. The resulting customized sequences of
data patterns are specifically designed for each drive and technology.
SpinRite 3.1 thereby performs a far better job of surface analysis in a
shorter time than has ever been possible before. All this technology
works, and has been incorporated into – and buried inside of – SpinRite
3.1.

(RON talking again.)

As you can see, Steve Gibson, SpinRite's inventor, was using extremely
sophisticated methods of data recovery and surface analysis even in
1995.  His methods may have changed since then, and I don't have access
to documentation on the new stuff.  The help screen of the program
itself is where it talks about the read, invert, write, read, invert,
write cycle that I've mentioned before.  I must assume that this
exhaustive surface analysis technology is still there.  I know the
DynaStat data recovery is still there, and probably much improved.  It
seems to me that maintaining a database of mathematical models for all
of the data-to-flux RLL encoders and decoders out there would be quite a
chore.  Also, the product hasn't been updated in a few years.  He may
have figured out a new, simpler way to do surface analysis and still
maintain a high degree of reliability.
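
To give a feel for what "working back from flux reversals to data"
means, here is a small sketch in Python.  It uses plain MFM, which is
the only encoding simple enough for me to write down with confidence; it
is not one of the proprietary RLL schemes and it is certainly not
SpinRite's model.  In MFM each data bit is preceded by a clock bit, and
the clock bit is 1 only when the data bits on both sides of it are 0.
Every 1 in the resulting cell stream is a flux reversal.  The sketch
brute-forces all 256 byte values to find which repeated bytes give the
widest reversal spacing (closest to "isolated" reversals) and which give
the tightest (closest to the "triplet" case).  The function names are
made up for illustration.

```python
def mfm_encode(data_bits, previous_data_bit=0):
    """Return the MFM cell stream: clock, data, clock, data, ..."""
    cells = []
    prev = previous_data_bit
    for bit in data_bits:
        # A clock reversal is written only between two 0 data bits.
        clock = 1 if (prev == 0 and bit == 0) else 0
        cells.extend((clock, bit))
        prev = bit
    return cells


def reversal_gaps(cells):
    """Distances, in cells, between successive flux reversals (1 cells)."""
    positions = [i for i, cell in enumerate(cells) if cell == 1]
    return [b - a for a, b in zip(positions, positions[1:])]


def byte_to_bits(value):
    """Most significant bit first."""
    return [(value >> shift) & 1 for shift in range(7, -1, -1)]


def gaps_for_repeated_byte(value, repeats=4):
    """Gaps seen when the same byte is written over and over."""
    bits = byte_to_bits(value) * repeats
    # Start as if the same byte came just before, to avoid an edge effect.
    return reversal_gaps(mfm_encode(bits, previous_data_bit=bits[-1]))


widest = [v for v in range(256)
          if all(g == 4 for g in gaps_for_repeated_byte(v))]
tightest = [v for v in range(256)
            if all(g == 2 for g in gaps_for_repeated_byte(v))]

print("widest spacing:  ", [hex(v) for v in widest])
print("tightest spacing:", [hex(v) for v in tightest])
```

Under MFM the widest spacing comes from alternating data bits (0x55 and
0xAA) and the tightest from all zeros or all ones (0x00 and 0xFF).  A
real surface test would have to mix strong and weak reversals in one
pattern, and with RLL the mapping differs from maker to maker, which is
why I think keeping all those models current would be such a chore.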

Hope this clears up the fog a bit.  I will endeavor to respond to some
of the posts as best I can.

Sincerely,

Ron


-- 

(PS - If you email me and don't get a quick response, you might want to
call on the phone.  I get about 300 emails per day from alternate energy
mailing lists and such.  I don't always see new messages very quickly.)

Ron Frazier

770-205-9422 (O)   Leave a message.
linuxdude AT c3energy.com



