[ale] what exactly does a long smart hdd test do?

Ron Frazier (ALE) atllinuxenthinfo at techstarship.com
Sun May 13 22:55:00 EDT 2012


Hi Mike W.,

I really do want to research your other posts, and Mike T's, and reply in more detail, but that takes a good bit of time. For the moment, though...

I think the point IS that the drives become unstable, or that people think they do, or that they used to.

Let's forget SpinRite altogether for a moment and talk about the non-destructive read-write test of badblocks, which is invoked with the -n option. My understanding of what it does is as follows:

1) Read the original data from the sector and store it.
2) Blast the sector by writing a whole bunch of random test patterns to it.
3) Read each test pattern back to check that it reads correctly.
4) Blast it some more.
5) Once it's satisfied with the integrity of the sector, write the original data back and presumably verify that it reads properly.
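For illustration, the five steps above can be sketched in Python against a simulated disk. This is a hypothetical model of the procedure as I understand it, not badblocks' actual source: SimulatedDisk, SECTOR_SIZE, the 512-byte sector size, the number of passes, and the way a "weak" sector garbles data are all assumptions made for the example.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte sectors


class SimulatedDisk:
    """Hypothetical in-memory stand-in for a drive; not a real device driver."""

    def __init__(self, num_sectors, weak_sectors=()):
        self.data = bytearray(os.urandom(num_sectors * SECTOR_SIZE))
        self.weak = set(weak_sectors)  # sectors that garble anything written to them

    def read(self, lba):
        start = lba * SECTOR_SIZE
        return bytes(self.data[start:start + SECTOR_SIZE])

    def write(self, lba, buf):
        if lba in self.weak:
            buf = bytes(b ^ 0x01 for b in buf)  # flip the low bit of every byte
        start = lba * SECTOR_SIZE
        self.data[start:start + SECTOR_SIZE] = buf


def nondestructive_test(disk, lba, passes=4):
    """Return True if the sector passed every write/read-back check."""
    original = disk.read(lba)              # 1) save the original contents
    ok = True
    for _ in range(passes):                # 2)-4) write random patterns and verify
        pattern = os.urandom(SECTOR_SIZE)
        disk.write(lba, pattern)
        if disk.read(lba) != pattern:
            ok = False
    disk.write(lba, original)              # 5) restore the original data...
    if disk.read(lba) != original:         #    ...and verify that it reads back
        ok = False
    return ok


disk = SimulatedDisk(8, weak_sectors={3})
print([lba for lba in range(8) if not nondestructive_test(disk, lba)])  # [3]
```

Note that on a sector that really is bad, step 5 can fail too, so the original data in that sector may not survive the test.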

As far as I know, this is pretty much exactly what SpinRite does, except that SpinRite also has an option to invert the original data, write and verify the inverted copy, then invert it back and write it again, so that each bit is tested, both reading and writing, as both a 0 and a 1. SpinRite also has an option to return previously faulty sectors to service if they pass testing, but I don't use that option.
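The inversion pass can be sketched like this. It is a hypothetical illustration of the idea, not SpinRite's code; read(lba) and write(lba, data) are assumed callables standing in for raw sector access.

```python
def invert(buf):
    """Flip every bit in a byte string."""
    return bytes(b ^ 0xFF for b in buf)


def inversion_test(read, write, lba):
    """Exercise every bit of a sector as both a 0 and a 1.

    `read(lba) -> bytes` and `write(lba, data)` are hypothetical callables
    standing in for raw sector access; this is not SpinRite's code.
    """
    original = read(lba)
    inverted = invert(original)
    write(lba, inverted)                 # each bit now holds the opposite value
    ok = read(lba) == inverted
    write(lba, invert(inverted))         # inverting again gives back the original
    ok = ok and read(lba) == original
    return ok


# Usage: a dict stands in for the disk; a healthy "sector" passes.
store = {0: bytes([0x00, 0xFF, 0x5A])}
print(inversion_test(store.__getitem__, store.__setitem__, 0))  # True
```

The point of inverting is that a bit stuck at 0, in a position where the original data also holds a 0, would pass a test that only rewrites the original data; writing the inverted copy forces a 1 into that position and exposes it.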

You mentioned below that SpinRite can't truly do what it claims to do. In terms of preventive maintenance, all it claims to do is the procedure I described above. I've never heard anything about it trying to rewrite servo tracks. You also mention that nothing outside the factory can rewrite servo tracks. My research indicates that Self Servo Track Writing (SSTW) procedures are sometimes used, in which the drive writes its own servo tracks after being given a "seed" track at the factory. It may be that this procedure cannot be initiated, or reinitiated, in the field, but such drives can certainly write their own servo tracks.

I think it's safe to assume that whatever SpinRite does, and probably whatever badblocks does, has to be done on sectors the drive can find, so the servo tracks must be at least good enough for that to happen. However, servo tracks are a small percentage of the drive surface. It is entirely feasible for there to be damaged sectors, or sectors with latent defects (see below), that need help even though the servo tracks are still good enough to locate them. Also, I know that if SpinRite cannot read a sector, it will fly the head into position from many different starting positions and at different velocities in order to try to get the sector to read. So it may even be able to deal with some "sector not found" problems.

So what you seem to be telling me is that, on modern drives, there is no need to do this read-write verification with either SpinRite or badblocks; that any data originally written to a sector should stay there for the life of the other drive components; that read-write testing is a waste of time; and that a simple read-only analysis of the sectors, with the controller presumably relocating any questionable sectors, should be plenty. Is that really what you're saying? I ask because SpinRite and badblocks were invented specifically to do this type of read-write testing.

Later we can talk about the scenario where a sector cannot be read, the controller gives up and relocates it, and you lose all the data in that sector. That belongs under the heading of data recovery, though; this message is about data fading and latent defects.

In my research, I found this article, which seems to corroborate your assertion that bit rot, per se, is not an issue. However, it also talks about numerous other latent defects, and recommends data scrubbing, which would seem to indicate that the type of read, verify, write maintenance I've been discussing is a valuable process.

http://www.google.com/url?q=http://entertainmentstorage.org/articles/Hard%2520Disk%2520Drives_%2520The%2520Good,%2520The%2520Bad%2520and%2520The%2520Ugly.pdf&sa=U&ei=2lqwT5HvGYqg9QStg4nyCA&ved=0CG8QFjAh&usg=AFQjCNHQSVG3yjYGvKMbVzZG6Ywbm4-qcw

"Latent defects (data corruptions) can occur during almost any HDD activity: reading, writing, or simply spinning. If not corrected, these latent defects will result in lost data when an operational failure occurs. They can be eliminated, however, by background scrubbing, which is essentially preventive maintenance on data errors. During scrubbing, which occurs during times of idleness or low I/O activity, data is read and compared with the parity. If they are consistent, no action is taken. If they are inconsistent, the corrupted data is recovered and rewritten to the HDD. If the media is defective, the recovered data is written to new physical sectors on the HDD and the bad blocks are mapped out."

Data scrubbing is essentially what SpinRite and badblocks are doing as far as I can tell.
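The scrubbing pass described in the quote can be sketched for a single RAID-5-style stripe. This is a hypothetical model, not the article's or any drive's implementation. One assumption to flag: plain XOR parity can tell you that a stripe is inconsistent but not which block is wrong, so the sketch also assumes a per-block CRC32 checksum (zlib.crc32) to locate the corrupted block before rebuilding it from the others.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub_stripe(blocks, checksums, parity):
    """Scrub one stripe in place; return the index of a repaired block or None."""
    if xor_blocks(blocks) == parity:
        return None                                  # consistent: no action taken
    # Inconsistent: use the (assumed) per-block checksums to find the bad block.
    bad = [i for i, blk in enumerate(blocks) if zlib.crc32(blk) != checksums[i]]
    if len(bad) != 1:
        raise RuntimeError(f"expected one bad data block, found {len(bad)}")
    i = bad[0]
    others = blocks[:i] + blocks[i + 1:]
    blocks[i] = xor_blocks(others + [parity])        # recover and rewrite the block
    return i


data = [b"AAAA", b"BBBB", b"CCCC"]
checksums = [zlib.crc32(b) for b in data]
parity = xor_blocks(data)
stripe = list(data)
stripe[1] = b"BxBB"                                  # simulate silent corruption
print(scrub_stripe(stripe, checksums, parity), stripe[1])  # 1 b'BBBB'
```

The article's last step, writing recovered data to new physical sectors when the media itself is defective, is left out of this sketch.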

I also found these other articles, which are WAY over my head and from which I haven't drawn any useful conclusions. I thought some of you might like to see them, though.

http://www.google.com/url?q=http://www.fujitsu.com/downloads/MAG/vol42-1/paper11.pdf&sa=U&ei=2lqwT5HvGYqg9QStg4nyCA&ved=0CBEQFjAA&usg=AFQjCNFowI-tLlS781ypw9-3ewC2aClbNw

http://www.google.com/url?q=http://www.me.berkeley.edu/~horowitz/Publications_files/All_papers_numbered/186c_Nie_ACC10_TutorialonSSTW.pdf&sa=U&ei=2lqwT5HvGYqg9QStg4nyCA&ved=0CBQQFjAB&usg=AFQjCNHK66bFS5X-AmYqIWl_Ggx1QehTvw

http://www.google.com/url?q=http://maeresearch.ucsd.edu/callafon/publications/2009/IEEETonM2.pdf&sa=U&ei=2lqwT5HvGYqg9QStg4nyCA&ved=0CBsQFjAE&usg=AFQjCNHgjBd8eCH285WSb7D9MM-hVjt8-w

Sincerely,

Ron


--

Sent from my Android Acer A500 tablet with bluetooth keyboard and K-9 Mail.
Please excuse my potential brevity.

(To whom it may concern. My email address has changed. Replying to former
messages prior to 03/31/12 with my personal address will go to the wrong
address. Please send all personal correspondence to the new address.)

(PS - If you email me and don't get a quick response, you might want to
call on the phone. I get about 300 emails per day from alternate energy
mailing lists and such. I don't always see new email messages very quickly.)

Ron Frazier
770-205-9422 (O) Leave a message.
linuxdude AT techstarship.com


"Michael H. Warfield" <mhw at WittsEnd.com> wrote:

I suddenly remembered one very important, critical point...

I'll edit heavily...

On Sun, 2012-05-13 at 14:33 -0400, Michael H. Warfield wrote:

> > You mention 3 things which SpinRite mainly does, which I would agree
> > to.

> > 1) Deal with data fading. This could also be called grown defects or
> > bit rot.

> The term "bit rot" is a nice euphemism which covers a number of things
> but is not a precise technical term. But I know what you're saying.

> > I think we can certainly say that no magnetic surface is perfect, and
> > that some sectors or parts of sectors will be magnetically weak.

> No magnetic surface is perfect, agreed. Modern controllers and
> manufacturing processes are designed to work with that as an assumption.

> I'm not quite sure I can buy into the "magnetically weak" aspect though.
> Such areas should be detectable by the analog sections of the controller
> electronics early in manufacturing and flawed out. Once in operation,
> the field strengths used by the heads, most especially the heads
> used in modern "vertical recording", where the magnetic poles of a bit
> domain are vertical and extend deeper into the recording media, rather
> than longitudinally along the surface of the track like older drives,
> should ensure a good strong penetration and the high mu media should
> hold that data for a very long time with no measurable fade. There are
> things that a controller can detect, because it's processing the fields
> at an analog level, that can give indications of impending soft failures
> to which SpinRite would have no access at all. These SHOULD show up in
> the SMART data, which is maintained by the firmware on the drive itself.

You are also missing one very VERY critical implication which I failed
to make apparent and which I totally forgot about... SpinRite cannot
truly do what it claims to do if you understand what I described about
the nature of modern drives and the quadrature servo patterns. If this
bit-fade or bit-rot truly were a problem, the servo signals in between
the sectors would be just as susceptible and prone to failure. SpinRite
cannot rewrite those patterns. In fact, nothing outside of the
factory can rewrite those half-track quadrature patterns. They are
recorded once, at the time the drive is manufactured, and can never be
overwritten or rerecorded. So, if the media were subject to
bit-rot there, SpinRite could do nothing about it. If this were a
significant problem, drives would eventually become unstable. The
probability is related to the ratio of the relatively short servo
sync patterns to the length of a sector. I forget the exact run length
of each, but it looks (ASCII graphically) something like this...

qqqq ssss ddddddddddddddddddddddddddddeeee ssss qqqq ssss

Where:

q = quadrature servo control patterns.
s = data sync patterns
d = sector data
e = ECC (error-correcting code) recovery data

The spaces would be "read / write" splice gaps.
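To make the ratio argument concrete, the sketch below counts how much of a layout string each field occupies, using the legend above. The character counts in the ASCII diagram are only illustrative (the exact run lengths aren't remembered), so the fractions it prints say nothing about real drives; substitute real lengths if you have them. LEGEND and field_fractions are hypothetical names.

```python
from collections import Counter

# Legend from the diagram above; one character = one (arbitrary) unit of track.
LEGEND = {
    "q": "quadrature servo",
    "s": "data sync",
    "d": "sector data",
    "e": "ECC",
    " ": "splice gap",
}


def field_fractions(layout):
    """Return the fraction of the layout string taken up by each field type."""
    counts = Counter(layout)
    total = sum(counts.values())
    return {LEGEND[ch]: n / total for ch, n in counts.items()}


diagram = "qqqq ssss ddddddddddddddddddddddddddddeeee ssss qqqq ssss"
# Assuming small defects land uniformly at random, this is the chance that one
# hits a servo pattern rather than data; meaningful only with real run lengths.
print(round(field_fractions(diagram)["quadrature servo"], 3))
```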

Generally anything taking out those quadrature patterns or enough of the
leading data sync patterns is fatal. I'm not totally sure all of the
data sync patterns are rewritten each time a sector is rewritten. They
may be. The splice gaps are there to allow for the heads to switch from
read to record with some allowance for timing and synchronization.

In the case of media flaws, the drive is capable of dead reckoning over
several sectors if it misses the servo signals. If it's more than
that, the entire track can be unusable. So it's really critical that the
media be stable and not subject to magnetic fade.
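The dead-reckoning behavior can be modeled as a simple rule: a track stays usable as long as the drive never has to coast across more than some number of consecutive unreadable servo bursts (the q fields in the diagram above). The limit used here (max_coast=3) is an arbitrary placeholder, since the real tolerance depends on the drive's firmware and isn't given; the function names are hypothetical.

```python
def longest_missing_run(servo_ok):
    """Longest run of consecutive unreadable servo bursts on a circular track."""
    n = len(servo_ok)
    if not any(servo_ok):
        return n                          # nothing readable at all
    best = run = 0
    for ok in list(servo_ok) * 2:         # walk twice so runs can wrap past burst 0
        run = 0 if ok else run + 1
        best = max(best, run)
    return best


def track_usable(servo_ok, max_coast=3):
    """Hypothetical rule: usable if the drive never coasts past max_coast bursts.

    max_coast=3 is an arbitrary placeholder, not a real firmware value.
    """
    return longest_missing_run(servo_ok) <= max_coast


track = [True] * 10
track[4:6] = [False, False]     # two unreadable bursts: the drive can coast over them
print(track_usable(track))      # True
track[4:9] = [False] * 5        # five in a row: beyond the placeholder limit
print(track_usable(track))      # False
```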

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 | mhw at WittsEnd.com
/\/\|=mhw=|\/\/ | (678) 463-0932 | http://www.wittsend.com/mhw/
NIC whois: MHW9 | An optimist believes we live in the best of all
PGP Key: 0x674627FF | possible worlds. A pessimist is sure of it!
