[ale] what exactly does a long smart hdd test do?

Michael H. Warfield mhw at WittsEnd.com
Sun May 13 14:33:45 EDT 2012


Hey,

On Sat, 2012-05-12 at 21:16 -0400, Ron Frazier (ALE) wrote:

> "Michael H. Warfield" <mhw at wittsend.com> wrote:

> >Oh, LORD, I told myself I was not going to get sucked into another
> >Steve Gibson cesspool religious discussion, yet here I am...
> >

> >
> >My personal opinion from both the disk storage angle and the security
> >angle is that Steve Gibson is now little more than snake-oil surviving
> >on past glories.
> 
> Hi Mike W.,
> 
> I appreciate the technical information you've shared.  I truly don't
> mean any offence  by the following statement and hope you don't take
> it that way, but, based on the quotes above, I have to ask myself if
> you're being totally objective, as those quotes sound like a definite
> negative bias.

Trust me, no offense taken and you are quite right.  I used to be a fan
of Steve Gibson years ago when he first came out with SpinRite and I
have bought full copies of SpinRite at times in the past.  But I have
become rather disillusioned with him of late, largely through some of his
ill-informed pronouncements on security and the way he has handled and
reacted to criticism from the larger professional security community.
He's been, errr, rather less than graceful...  That has left a bad taste
in the mouths of many of us and, yes, I don't pretend to be objective.
IAC, I don't expect people to take my word on things, which is generally
why I provide long-winded technical background so that people can
research my reasoning and come to their own conclusions and agree with
me or disagree with me.  I take no offense in that at all.

> In particular, snake-oil implies to me that Mr. Gibson is engaged in
> deception, fraud, or deceit; and, I really don't think that is the
> case.  I trust that he's looking out for the best interests of his
> listeners.  I have no qualms about the $ 89 I paid him once upon a
> time years ago for SpinRite.  I think it's been worth every penny.  I
> will admit that there are now cheaper alternatives that do most of
> what SpinRite does.

Well, I won't quite go so far as to say that.  Many "snake oil" salesmen
truly believe in what they are selling.  Radithor and Dr. William Bailey
are a classic example of that.  It was a radium elixir (snake oil "patent
medicine") that Bailey truly believed in and took himself.  Some people
died from consuming his elixir but he did not.  His patent medicine was
snake oil but he truly believed in it himself.  When he later died of
other causes it was discovered that he was so radioactive that, I
believe, he ended up buried in a lead lined casket.

I will give Gibson credit.  I really do believe that he believes in what
he's saying or pushing.  I don't believe he's engaging in any
"deception, fraud, or deceit".  Judging from the laughter and comments
in the security forums I haunt, I don't think too many of my fellow
professionals hold his pronouncements in the same regard as he does,
though, and he has few defenders.  I do believe he engages in a little
excessive self promotion (but the same could probably be said about me)
and I think he oversteps the bounds of his knowledge at times and holds
himself out to be more of an expert than he really is.  It's just that,
over the years, it has become progressively harder for me to take him
seriously.

> You mention 3 things which SpinRite mainly does which I would agree
> to.

> 1) Deal with data fading.  This could also be called grown defects or
> bit rot.

The term "bit rot" is a nice euphemism which covers a number of things
but is not a precise technical term.  But I know what you're saying.

> I think we can certainly say that no magnetic surface is perfect, and
> that some sectors or parts of sectors will be magnetically weak.

No magnetic surface is perfect, agreed.  Modern controllers and
manufacturing processes are designed to work with that as an assumption.

I'm not quite sure I can buy into the "magnetically weak" aspect though.
Such areas should be detectable by the analog sections of the controller
electronics early in manufacturing and flawed out.  Once in operation,
the field strengths used by the heads, most especially the heads
used in modern "vertical recording" where the magnetic poles of a bit
domain are vertical and extend deeper into the recording media, rather
than longitudinally along the surface of the track like older drives,
should ensure a good strong penetration and the high mu media should
hold that data for a very long time with no measurable fade.  There are
things that a controller can detect, because it's processing the fields
at an analog level, that can give indications of impending soft failures
to which SpinRite would have no access at all.  These SHOULD show up in
the SMART data, which is maintained by the firmware on the drive itself.
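For anyone who wants to look at what the drive firmware is reporting, here is a minimal sketch in Python that parses the attribute table printed by `smartctl -A` and flags entries worth a second look.  The function names, the watch list, and the flagging rule are my assumptions for illustration, and the sample table is made up rather than taken from a real drive.

```python
import re

# Attributes often watched for media trouble. This selection is an assumption.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

# Made-up sample in the layout printed by `smartctl -A`; not from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       8760
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

# id, name, flag, value, worst, thresh, type, updated, when_failed, raw
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def parse_attributes(text):
    """Return {attribute_name: {"value", "worst", "thresh", "raw"}} from the table."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            _id, name, value, worst, thresh, raw = m.groups()
            attrs[name] = {
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            }
    return attrs

def suspicious(attrs):
    """Flag attributes at or below threshold, or watched ones with a nonzero raw count."""
    flagged = []
    for name, a in attrs.items():
        below = a["thresh"] > 0 and a["value"] <= a["thresh"]
        if below or (name in WATCHED and a["raw"] > 0):
            flagged.append(name)
    return flagged
```

On a live system the text would come from running `smartctl -A` against the device (usually as root) instead of from `SAMPLE`.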

It is always possible to have a bad batch of media where problems arise
like that.  The classic case there was the metallic thin film platters
that were in some Maxtor drives back in the day.  They pioneered high
capacity drives by doing away with the old ferric oxide coated platters
and innovated platters coated with thin film deposited metallic
coatings.  Unfortunately, at one time, they had a bad batch of media
which wasn't properly formulated and ended up being prone to oxidation.
The metal film very very slowly oxidized and lost its ability to hold a
magnetic field.  Problems started showing up a year or so later.
SpinRite probably could have extended the lives of those drives for a
while, but the problem with "bit rot" in this case was a progressive
deterioration.  The drive was doomed sooner or later and constantly
rerecording over the weak spot was only going to postpone the inevitable
irrecoverable failure.  Most modern media is aged and treated so this
shouldn't be a common problem...  I haven't seen it in a couple of
decades.

We do certainly see drive failures in modern drives but the majority of
what I see are generally classed as "electronic" failures or
"mechanical" failures, and not media deterioration.  Electronic failures
are failures of either the external electronic controller board itself
or some of the internal HDA (Head Disk Assembly) electronics (the head
amplifiers are generally in the HDA on the head stack and the head
actuator and spindle motors are in the HDA).  Failures in the electronics
can be hard catastrophic failures or intermittent failures such as
thermal failures.  I have had some success swapping electronics boards
between matching drives and recovered data, but not always.  Mechanical
failures are just something broken in the HDA and you're generally
toast.  But I have seen some media failures occur.  Rare, but they do
occur, I will admit.

I recently had a case with a consulting customer who was having a
problem with a Windows XP system.  It was very puzzling.  The machine
would work most of the time and then give seemingly random errors.
When I booted it off my Fedora run-live USB stick, Fedora immediately
popped up a warning about excessive errors being reported from SMART.  That
drive had been failing and the controller had been managing it, most of
the time.  Would SpinRite have helped here?  Maybe, maybe not.  Maybe it
would have extended the life of that drive for another few months or a
year or so.  My reaction was "I can't trust this drive."  So, he went
out and bought another whole machine (which was cheaper than what it
would have cost him to have me replace the drive and rebuild the
machine).  Once SMART starts indicating a drive is going south, it's time to
start looking for a replacement.  SpinRite MIGHT extend the life of the
drive a little bit and buy you more time to get that replacement but I
would be leery of depending on it.  Seemed like that could have been
media failure but it could also have been a progressive thermal failure
that had finally become critical.
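As a rough illustration of turning "SMART says the drive is going south" into a yes/no answer, the sketch below decodes the exit status that `smartctl` returns.  The bit meanings follow the EXIT STATUS section of the smartctl(8) man page; the function names and the replacement policy (treating bits 3 and 4 as "start shopping") are my assumptions, not anything the tool prescribes.

```python
import subprocess

# Bit meanings paraphrased from the smartctl(8) man page, "EXIT STATUS".
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART command failed or returned bad data",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "some attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}

def decode_exit_status(status):
    """Return the list of conditions encoded in a smartctl exit status."""
    return [msg for bit, msg in EXIT_BITS.items() if status & (1 << bit)]

def time_to_replace(status):
    """Assumed policy: a failing status or prefail-below-threshold means replace."""
    return bool(status & ((1 << 3) | (1 << 4)))

def run_health_check(device):
    """Run `smartctl -H` on a device (usually needs root) and return its exit status."""
    proc = subprocess.run(["smartctl", "-H", device], capture_output=True, text=True)
    return proc.returncode
```

Something like `time_to_replace(run_health_check("/dev/sda"))` would tie the two together; the device path is only an example.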

>   Hopefully, the controller will catch those and avoid using them.
> The real question is, will the weak ones that the controller didn't
> catch get weaker over time, and will data that was once stored become
> inaccessible after being ignored and not accessed for a long time.

This is highly unlikely.  The controller has access to information that
SpinRite cannot.  It can be just as easily argued that the repeated
exercising of the drive could lead to premature failure, though I doubt
you're running it often enough for it to be a significant issue.

>   You say it's not relevant to modern drives.  I'm not totally
> convinced, but I need to do some homework to discuss it much further.

> 2) Data recovery of bad sectors / blocks.  This can obviously be
> relevant, as drives do fail for any number of non mechanical reasons.
> By definition, we're talking about sectors that cannot be read by the
> normal procedures that the OS uses.  So, right off the bat, unless you
> do something radical, you're going to lose all 1024, 2048, 4096 or
> whatever bytes are in the sector.  You say other things do this
> better.  I'm definitely not convinced of that.  However, I'm going to
> have to study dd-rescue a bit to discuss that.

These tools can do no worse than SpinRite.  Seriously, all it's doing,
fundamentally, is reading, testing, and rewriting sectors.  It can pick
up on soft errors, retries, and ECC recovery indications from the drive.
The firmware on the drive controller board (the board on the drive
including both the digital and analog electronics) can do far better.
That's what SMART is for.

I had SpinRite absolutely fall on its face when I tried recovering a
failed 200G drive for a photographer friend a few years back.  Its
struggles to recover and rewrite a bad spot on track 0 of that drive
resulted in the loss of roughly an entire cylinder.  The efforts it made
to "repair" that spot on track 0 seemed to have caused the physical
damage to be extended further along the track.  Given that logical
cylinders don't really map 1:1 to physical cylinders any more, I'm sure
it was limited to the one physical head and SpinRite just managed to
finish the job taking out that entire physical track for me.  The drive
had failed, no question about it.  There were other bad spots elsewhere.
SpinRite would not have prevented this crash (I believe it had suffered
a "soft touch" head crash in a critical spot) and by using it before
trying to use dd-rescue, I made a bad situation worse.  Using dd-rescue
afterwards and a number of rather inventive techniques like varying
temperatures (I actually stuck the drive in a freezer several times) and
voltages, I managed to recover all but about 500KB of that drive and
every one of his critical photos.  I don't think SpinRite did any
damage other than cyl 0 but I had been able to read most of cyl 0 before
trying to see if it could recover track 0 and after it tried for several
minutes and then aborted, I couldn't touch anything on that cylinder
again.  Fortunately, none of the critical data was on that cylinder.
Lesson learned.
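The general idea behind rescue-style copying, as opposed to hammering on one bad spot, can be sketched in a few lines of Python: grab everything that reads cleanly in large blocks first, then come back to the failed blocks and retry them one sector at a time.  This is only an illustration of that idea under my own assumptions, not the actual algorithm of any dd-rescue tool; `rescue_copy` and the `read_at(offset, length)` callable are hypothetical names, and the block and sector sizes are arbitrary defaults.

```python
def rescue_copy(read_at, total_size, block=64 * 1024, sector=512):
    """Two-pass copy of a failing device into memory.

    read_at(offset, length) is a hypothetical callable that returns exactly
    `length` bytes or raises OSError. Returns (image, bad), where `bad` is a
    list of (offset, length) ranges that could not be read in either pass.
    """
    image = bytearray(total_size)  # unreadable areas stay zero-filled

    def try_read(off, n):
        try:
            data = read_at(off, n)
        except OSError:
            return False
        if len(data) != n:  # treat a short read as a failure
            return False
        image[off:off + n] = data
        return True

    # Pass 1: large blocks, skipping anything that errors out.
    failed = []
    for off in range(0, total_size, block):
        n = min(block, total_size - off)
        if not try_read(off, n):
            failed.append((off, n))

    # Pass 2: revisit only the failed blocks, one sector at a time.
    bad = []
    for start, length in failed:
        for off in range(start, start + length, sector):
            n = min(sector, start + length - off)
            if not try_read(off, n):
                bad.append((off, n))
    return image, bad
```

On a Unix system, `read_at` could be something like `lambda off, n: os.pread(fd, n, off)` for an open device descriptor.  The point of the ordering is that the healthy areas are safely copied before the drive is stressed by retries on the damaged ones.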

> 3) Drive head / servo calibration.  I really wasn't aware of this one.
> But, based on what you said, I'll concede that this is probably not
> relevant to modern drives.

> I want to study the technical data you shared and hopefully generate
> an intelligent reply.  I will admit to being a fan of Mr. Gibson, but
> I don't think I'm a "fanboy".  I have benefitted greatly both from his
> security podcast and from the SpinRite product.  My pc is both more
> secure and more reliable based on my listening to the information he
> shares.  Having said that, I certainly don't want to be doing 36 hours
> of exhaustive diagnostics on my hard drives if it's not helpful.  I'm
> not totally convinced that that is the case however.

Anything he says regarding security should be taken as a guideline and
confirmed through other sources, not provided by him as references.  He
has given out some bad advice in the past.  He may make you think about
things which you should think about but verify what he says.  Don't just
take him (and don't take me for that matter) at his word as an
authoritative source to be trusted.  I expect people to verify what I
say and not take my word for it.

> I'll also ask myself if I'm being objective when I reply, and keep
> religion to a minimum.  My only goal is to facilitate maximum
> reliability of my machines, and those of other people whom I'm in
> contact with.

It's a good objective for everyone.  Absolutely.  That's also why I
prefixed my previous message the way I did.  I knew what I was stepping
into.

> I'll write some more later after I've studied your information a bit
> more.

> Sincerely,

> Ron
> 
> 
> --
> 
> Sent from my Android Acer A500 tablet with bluetooth keyboard and K-9 Mail.
> Please excuse my potential brevity.
> 
> (To whom it may concern.  My email address has changed.  Replying to former
> messages prior to 03/31/12 with my personal address will go to the wrong
> address.  Please send all personal correspondence to the new address.)
> 
> (PS - If you email me and don't get a quick response, you might want to
> call on the phone.  I get about 300 emails per day from alternate energy
> mailing lists and such.  I don't always see new email messages very quickly.)
> 
> Ron Frazier
> 770-205-9422 (O)   Leave a message.
> linuxdude AT techstarship.com

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!