[ale] what exactly does a long smart hdd test do?

Justin Goldberg justgold79 at gmail.com
Mon May 14 22:47:19 EDT 2012


That's a good argument for distributed, intelligent, encrypted backup.
Think of a private Freenet.

We all have tons of drives and computers, but why do we have to back up
everything? For example, for a document we downloaded and then edited, we
keep a few revisions/backups. For a document we created, we score that
file higher so more backups of it are kept. MP3 files from BitTorrent
don't get backed up at all, or score really low. MP3 files we ripped from
a CD get backed up, scored higher than the downloaded ones but lower than
a document or our bookmarks file. This would be best implemented by a
filesystem like BFS, where custom database attributes can be kept. I
guess a fully journaled filesystem would accomplish this as well, but I'm
dreaming of something network-wide.
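The scoring idea above can be sketched in plain shell. The origin tags and revision counts here are made-up placeholders for illustration, not part of any real tool:

```shell
# Map a file's provenance to how many backup revisions to keep.
# Tags and counts are illustrative assumptions, not a real scheme.
score_for() {
  case "$1" in
    created)    echo 5 ;;  # documents we authored: keep the most revisions
    ripped)     echo 3 ;;  # media ripped from our own CDs: costly to redo
    downloaded) echo 0 ;;  # re-downloadable files: don't back up at all
    *)          echo 1 ;;  # unknown origin: keep one copy to be safe
  esac
}
score_for created      # prints 5
score_for downloaded   # prints 0
```

A filesystem with extended attributes (or BFS-style indexed attributes) could store such a provenance tag per file, which is what makes the idea filesystem-friendly.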

On 5/14/12, Ron Frazier (ALE) <atllinuxenthinfo at techstarship.com> wrote:
> Hi Mike T,
>
> Thanks for all the info you shared in this post. It's taught me several
> things I didn't know about the low-level operation of a PC. I may have
> misspoken about the reliance of SpinRite on BIOS. I think I read somewhere
> on his website (later) that he bypasses BIOS, but I don't know how all the
> details are worked out. I try to do regular backups, but always seem to be
> behind on doing so. I did eventually manage to boot an Ubuntu live CD on my
> old PC and run a badblocks non-destructive read-write test. It finished the
> Linux partitions fine and is chugging away on one of two Windows partitions.
> This test takes a LONG time, just as it does with SpinRite. I found
> an interesting quirk you have to account for when specifying the block
> numbers for badblocks to use, which I commented on in another thread.
> Essentially, if you get the number of blocks from fdisk, you have to
> subtract 1 from the number to feed it to badblocks. That drove me crazy for
> half a day or so.
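The off-by-one Ron describes comes from badblocks taking a 0-indexed *last block* argument while fdisk reports a block *count*. A minimal sketch; the device name and the count below are placeholders, not values from Ron's machine:

```shell
# fdisk reports how many blocks a partition has; badblocks wants the index
# of the last block to test, and indices start at 0, so subtract one.
BLOCK_COUNT=1953525168               # stand-in for the figure fdisk printed
LAST_BLOCK=$((BLOCK_COUNT - 1))      # what badblocks actually expects
echo "$LAST_BLOCK"
# Non-destructive read-write pass over the whole partition (needs root and
# an unmounted partition; /dev/sdXN is a placeholder):
# badblocks -nsv /dev/sdXN "$LAST_BLOCK"
```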
>
> You mention data scrubbing in your post. I've run across that in my
> research, as well as some very interesting info on latent defects. My
> research is ongoing, but my current conclusion is that data scrubbing is
> beneficial for personal drives too, hence my willingness to run 36 - 48
> hours of diagnostics on my drives, 2 - 3 times / year. Data scrubbing may be
> more valuable on personal drives than on RAID, since there is no drive to
> automatically fail over to. If sectors on my drive fail to read, unless SpinRite
> can recover them, which I've seen it do on occasion, then my only solution
> is to resort to backups, which are probably out of date.
>
> Another, unrelated, problem is that, about half the time, that old computer
> will not boot the GUI Unity interface on the live CD of Ubuntu 11.10. At
> other times, it works. Totally insane. I hate Unity anyway, but it would be
> nice to have SOME kind of GUI running. To run the recent badblocks test, I
> had to resort to hitting ALT-F1 to get to a terminal and use that. The GUI
> never did start up. I tried the startx command, but that didn't do anything
> either.
>
> The saga to preserve ever so fragile data continues ...
>
> Sincerely,
>
> Ron
>
>
> --
>
> Sent from my Android Acer A500 tablet with bluetooth keyboard and K-9 Mail.
> Please excuse my potential brevity.
>
> (To whom it may concern. My email address has changed. Replying to former
> messages prior to 03/31/12 with my personal address will go to the wrong
> address. Please send all personal correspondence to the new address.)
>
> (PS - If you email me and don't get a quick response, you might want to
> call on the phone. I get about 300 emails per day from alternate energy
> mailing lists and such. I don't always see new email messages very
> quickly.)
>
> Ron Frazier
> 770-205-9422 (O) Leave a message.
> linuxdude AT techstarship.com
>
>
> "mike at trausch.us" <mike at trausch.us> wrote:
>
> On 05/11/2012 09:09 PM, Ron Frazier (ALE) wrote:
>> Hi Mike,
>>
>> I don't think that discredits the project, I think it's a wise design.
>> Here's why. The most recent version of SpinRite came out in 2004 and has
>> a history going back to 1988. The program designer is planning an
>> update, but it's not out yet. Nevertheless, that version will work on
>> modern drives. I don't know how deeply SpinRite interacts with the bios.
>
> If it has the limitations of BIOS, then it is using the BIOS functions
> as published in Ralf Brown's Interrupt List (RBIL), which was (and for
> the real-mode programmer, still is) the definitive source of BIOS
> interrupt interfaces. The BIOS supports only a limited set of
> functionality, which has been extended over the years to cater to larger
> drives and so forth. It supports only limited error handling (for
> example, INT 0x13/00 is "RESET DISK SYSTEM", but that only seeks the
> drives to track 0, it doesn't actually perform a bus reset).
>
> The INT 13 interface is horrible, and has been unsuitable for implementing
> any serious software for many years, which is why modern operating
> systems no longer use that interface. Even
> Windows 3.11 had a dedicated driver for bypassing the BIOS routines for
> disk access (it was called "32-bit Disk Access", implemented by the
> FastDisk/WDCTRL driver).
>
>> I do know it boots from its own copy of freedos and runs from there. The
>> product was designed to work with any PC compatible computer, be it PC,
>> Linux box, or Mac. It can even work with Tivo or iPod drives, etc. if
>> you take the drive out and attach it to a PC. It needs to be able to
>
> Well, yes. They all speak the same language, regardless of the type of
> computer they are plugged into. The only thing that is really different
> between disks in a Mac and a PC is the convention used to store data
> (e.g., partition tables or maps and filesystems).
>
>> have total control over the drive, including disabling some of the
>> drive's normal error correction, so it can do analysis and detect
>> problems. It can't have the OS in the way, interfering with its
>> operation. The primary target machine it was designed to run on was no
>
> ... but it's perfectly happy having the limited, inconsistently designed
> various BIOS programs get in the way and do its work for it?
>
> I'll grant that the Linux generic SCSI interface did not exist when
> SpinRite was first created. I'll grant even further that it was
> impractical in that time period to actually write dedicated drivers for
> the various disk controllers which existed at the time. The reason that
> BIOS existed was to simplify the creation of relatively simple systems
> so that they did not need to know anything more than the generic BIOS
> interface.
>
> However, with that generic BIOS interface comes a
> lowest-common-denominator approach to handling the disk controller and
> therefore the disks themselves. This would also be the primary reason
> why it's a horrible idea to use the BIOS interface.
>
>> doubt Windows machines. Back in 2004, those machines were running
>> various combinations of Win 95, Win98, Win ME, Win 2000, Win NT, and Win
>> XP. I don't think any of the Windows systems allow the kind of
>> unfettered access to the drive that SpinRite needs. Also, as far as I
>
> Windows systems not built on the Windows NT kernel (e.g., Win9x and
> earlier) do allow direct hardware access because at their core they were
> still 16-bit operating systems. Calls to the Win32 API were largely
> thunked to 16-bit modules that were preexisting. There were large
> components of the system that ran in 32-bit mode, but a lot of it did
> not. There was therefore a heavy cost associated with the constant
> changing of the CPU mode as part of context switches and so forth, which
> made Win9x both clunky and relatively unstable.
>
> Starting with Windows NT, direct hardware access is prohibited to most
> applications. The exceptions are those that create kernel drivers that
> enable an application to bypass such restrictions. I don't know if NT
> has something like the Linux generic SCSI interface, but if it doesn't
> and it were necessary for an application to be implemented, it would
> certainly be possible to do.
>
> Building a utility directly on top of a SCSI interface would be far
> superior to building it on BIOS. Building it to talk directly to
> hardware would be the only way to get closer to the metal than what SCSI
> interfaces would allow, but that's unnecessary for most applications,
> and if it were necessary for an application the preferred way to do it
> would ideally be a 32-bit extended DOS program that performs direct
> hardware access. Of course a real-mode program would also work, but
> there's little to no point to writing real mode code, since it's quite
> easy to get a 32-bit DOS compiler anyway (GCC, for example, supports
> MS-DOS via DJGPP).
>
>> know, there isn't a way to dismount an internal drive in Windows and
>> work on it, as you potentially can in Linux. Even if you could dismount
>> a drive, the system needs to run on the system drive, and the average
>> user doesn't have any way to boot a Windows machine without doing so
>> from the system drive. So, the best design choice was to make a product
>
> You can absolutely unmount drives in Windows. Use the Disk Manager in
> the Microsoft Management Console for a GUI way to do it. There are of
> course API calls that can be used by custom software to do it as well.
>
> You are right in that you cannot unmount the system drive, but that
> problem is common to all operating systems, not just Windows. DOS had a
> sort of exception, but DOS was also small enough to remain completely
> resident in RAM when there was more than 1 MB of memory present and usable.
>
>> that booted itself. That way, the OS isn't running, all the drives are
>> dismounted, and he didn't have to wonder whether the user would be able
>> to boot their pc so they could use his software. It was probably the
>> best solution to the problems he had to deal with. On my machines where
>> the bios is new enough to match the hard drive capacity, I can run
>
> Most, if not all, modern BIOS firmware provides what are known as the
> INT 13 extensions, which provide a portable interface for addressing much
> larger disks. There was a series of progressions in the maximum disk size
> that BIOS could support: roughly 504 MB and then about 8 GB under the
> various CHS addressing schemes, 137 GB with 28-bit LBA, and with the
> 48-bit LBA extensions the limit is large enough not to matter in practice.
>
> Anyway, given a disk that fits within the support of the BIOS, you can
> read and write sectors by identifying the C/H/S (very old API) or LBA
> address of the first sector and the count of sectors. The BIOS will of
> course return an error condition in the CPU's registers if it
> encountered an error while reading one of the sectors.
>
> Just as a side note, DOS called upon BIOS to do the work, but
> well-behaved programs that were written for DOS and did not require the
> ability to go beyond DOS' support of storage would simply call DOS
> interrupts to get the job done, which provided a slightly higher-level
> API since you did not need, for example, to worry about sectors but
> instead clusters in named files.
>
> SpinRite doesn't need to use any of the DOS filesystem facilities,
> though, and DOS is inherently a single-tasking operating system, so it
> is safe for SpinRite to assume that it can have exclusive control of the
> disk while it is running.
>
> Also note that there are DOS implementations that have Windows NT-like
> restrictions on direct hardware access for certain things. Such
> versions are usually ones that provide some sort of task switching or
> multitasking ability, for example taking advantage of the functionality
> and features of the 386 or newer CPUs and providing DOS applications
> with a V86 environment instead of a real one.
>
>> SpinRite on both the Windows partitions and the Linux partitions. It
>> doesn't care. It works strictly at the sector level and is
>> non-destructive. Even to use the badblocks command as you and Jim have
>> suggested on my old PC, I have to shut it down and boot a foreign OS, i.e.,
>> a live Linux CD, in order to run the test. That's exactly the same thing
>> SpinRite is doing. It just happens to be booting freedos rather than
>> Linux.
>
> No, you can do it while the system is running, you just cannot use the
> read-write test mode.
>
> The read-write test mode in badblocks is superior because it does not
> depend on the data currently stored in the sector. Certain data
> patterns can hide error conditions which may exist on the platter; for
> example, a particular bit may be stuck "on" or "1", but you'd never know
> that if the value that is there is legitimately 0xFF. But you'd detect
> it if you wrote 0x00 there, and when you read it back it was, for
> example, 0x01 or 0x80, because the stuck bit didn't get cleared.
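The stuck-bit example can be checked with a little arithmetic; no disk is involved, and the byte values are the ones from the paragraph above:

```shell
# Simulate a cell whose bit 7 is stuck "on". A stored 0xFF reads back fine,
# so a read-only scan sees nothing wrong; writing 0x00 exposes the fault.
stuck_mask=$((0x80))                    # bit 7 permanently reads as 1
read_of_ff=$(( 0xFF | stuck_mask ))     # still 0xFF: fault is invisible
read_of_00=$(( 0x00 | stuck_mask ))     # reads back 0x80 instead of 0x00
printf 'wrote 0x00, read back 0x%02X\n' "$read_of_00"
[ "$read_of_00" -ne 0 ] && echo "stuck bit detected"
```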
>
> SpinRite does this at the disk sector level (presumably, since it is an
> ancient program, with a fixed value of 512 for the sector size). The
> badblocks command works on blocks, too, but you can specify the size of
> what it considers a block. A common value is 4,096 bytes for a block
> when running badblocks, though virtually any block size that is a
> multiple of 512 will work for older drives; make that a multiple of
> 4,096 for modern so-called "advanced format" drives.
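A sketch of the corresponding badblocks invocation; /dev/sdX is a placeholder and the real run needs root, so only the alignment arithmetic executes here:

```shell
BLOCK_SIZE=4096                          # a common badblocks block size
[ $((BLOCK_SIZE % 512)) -eq 0 ]  && echo "ok for legacy 512-byte sectors"
[ $((BLOCK_SIZE % 4096)) -eq 0 ] && echo "ok for Advanced Format drives"
# Non-destructive read-write scan using that block size:
# badblocks -b "$BLOCK_SIZE" -nsv /dev/sdX
```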
>
>> I may run the non-destructive read-write test on the old PC using badblocks as
>> you and Jim suggested in other messages. It already passed the long
>> smart test and it says the drive is healthy with no bad sectors. I just
>> have to figure out how much additional time I want to spend on it.
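To connect back to the subject line: a long (extended) SMART self-test is a firmware-driven read scan of the entire surface, which is why its duration scales with capacity. A sketch with the smartmontools commands shown but not run; /dev/sdX and the numbers below are assumptions:

```shell
# smartctl -t long /dev/sdX       # start the extended self-test in the drive
# smartctl -l selftest /dev/sdX   # read the self-test log when it finishes
# smartctl -H /dev/sdX            # overall SMART health verdict
CAPACITY_MB=1000000               # assume a 1 TB drive
READ_RATE=100                     # assume ~100 MB/s average surface read rate
SECONDS_NEEDED=$((CAPACITY_MB / READ_RATE))
echo "rough duration: $((SECONDS_NEEDED / 3600)) hours"
```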
>
> For a non-critical (e.g., personal) system, SMART should be sufficient.
> You of course take regular (monthly or weekly) backups of your ${HOME},
> right? If that's the case then recovery is possible within hours of new
> drive installation, and for an individual, particularly one who has
> multiple computers, that is an acceptable thing. If not, you can employ
> other steps to try to delay or defer the restoration process, such as
> using RAID, but backups are backups and still quite necessary.
>
> I have one array I manage that would take approximately 30 hours to restore
> from backup. In an attempt to avoid that in all but the most
> devastating situations, I have it on a RAID array. As long as the RAID
> array's health is maintained, it is possible for me to keep taking and
> testing backups and knowing that I can, if need be, recreate the
> configuration as it exists in the office today along with all the data,
> but I don't have to if only one or two drives fail, because I can
> replace them almost immediately upon failure. I'm a bit on the paranoid
> side, too: I will start replacing drives as soon as they stop behaving
> 100% perfectly rather than waiting for a hard failure.
>
> So far that seems to have been a good way to ensure that things stay
> running... the drives I've removed were all used again in my home
> (after, of course, being wiped) and failed within a month of removal
> from the array. One of them failed only two days after removal. Even
> more than SMART, the Linux kernel ring buffer is a great source to
> monitor for disk trouble, especially on a system that has many disks
> with lots of activity on them. The kernel will notice errors as soon as
> they are encountered.
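A sketch of that ring-buffer check; the grep pattern is my guess at common ATA/IO error strings, not an exhaustive list, and dmesg may require root on locked-down kernels:

```shell
# Count suspicious disk-related lines in the kernel ring buffer.
errors=$(dmesg 2>/dev/null | grep -icE 'i/o error|ata[0-9].*(error|failed)' || true)
echo "disk error lines in the ring buffer: ${errors:-0}"
```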
>
> Additionally, RAID devices get "scrubbed" once per month anyway (at
> least by default on Debian and Ubuntu). The "scrub" process is
> *exactly* what SpinRite does, reading everything on all the disks. It
> doesn't re-write every sector, but it doesn't need to: if a sector
> cannot be read it is reassembled from parity information and an attempt
> is made to re-write it. If the drive was able to re-map the sector, the
> write will work and things continue. If the drive was unable to re-map
> the sector (say, because it ran out of sectors in the spare sector area)
> then the write will most likely fail and the disk will be marked
> "failed" by the virtual RAID controller.
>
> That's robust enough for me, at least with the requirements I have in
> the environments that I am managing for the moment. I would like,
> however, to have a beefier system for the RAID. Not because of the CPU,
> but because it would really do well to have a veritable buttload of RAM.
> A lot of the operations would be faster if the system had 4 GB of RAM
> to use for caching and buffers...
>
> 	--- Mike
>
> --
> A man who reasons deliberately, manages it better after studying Logic
> than he could before, if he is sincere about it and has common sense.
> --- Carveth Read, “Logic”
>
> _____________________________________________
>
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo
>
>


-- 

Justin Goldberg

*justgold79 at gmail.com*
(504) 208-1158
http://gplus.to/goldberg
http://twitter.com/justingoldberg


