[ale] what exactly does a long smart hdd test do?

Wolf Halton wolf.halton at gmail.com
Tue May 15 08:40:15 EDT 2012


Justin,
Taking this idea to production on network servers would be interesting. I
am working on a networked backup script and I like the thought of
automating a scoring system for backup priority.
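
A rough sketch of what such a scoring system might look like, using the
categories from Justin's message quoted below. The category names, the
weights, and the helper functions are all made up for illustration; nothing
here comes from an existing backup tool.

```python
from dataclasses import dataclass

# Hypothetical weights (0-100). Only the ordering follows the quoted message:
# own documents and bookmarks highest, CD rips in the middle, torrents lowest.
ORIGIN_SCORES = {
    "created": 90,          # documents we wrote ourselves
    "bookmarks": 90,        # browser bookmarks file
    "edited_download": 60,  # downloaded document that we edited
    "ripped_cd": 40,        # MP3s ripped from our own CDs
    "torrent": 5,           # MP3s from bittorrent
}
DEFAULT_SCORE = 50          # assumption for files with an unknown origin


@dataclass
class FileInfo:
    path: str
    origin: str


def backup_score(f: FileInfo) -> int:
    """Return the backup priority of a file based on where it came from."""
    return ORIGIN_SCORES.get(f.origin, DEFAULT_SCORE)


def copies_to_keep(score: int, max_copies: int = 5) -> int:
    """Map a 0-100 score to a number of backup copies (0 means skip)."""
    return score * max_copies // 100


def backup_order(files):
    """Sort files so the highest-scoring ones are backed up first."""
    return sorted(files, key=backup_score, reverse=True)
```

On a network-wide setup the origin tag would have to be stored somewhere,
for example in filesystem attributes as Justin suggests; this sketch just
assumes it is already known.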

Wolf

http://evergreen-community-01.lyrasistechnology.org
http://sourcefreedom.com
Apache developer:
wolfhalton at apache.org
On May 14, 2012 10:51 PM, "Justin Goldberg" <justgold79 at gmail.com> wrote:

> That's a good argument for distributed, intelligent, encrypted backup.
> Think of a private Freenet.
>
> We all have tons of drives and computers but why do we have to back up
> everything? For example, we edit a document that we downloaded, we
> keep a few revisions/backups. We create a document, score that file
> higher to keep more backups of that file. Mp3 files from bittorrent,
> don't back them up or score really low. Mp3 files we ripped from a cd,
> back them up, scored higher than the downloaded ones, but lower than a
> document or our bookmarks file. This would be best created by a
> filesystem like BFS, where custom database attributes can be kept. I
> guess a fully journaled filesystem would accomplish this as well, but
> I'm dreaming of something network-wide.
>
> On 5/14/12, Ron Frazier (ALE) <atllinuxenthinfo at techstarship.com> wrote:
> > Hi Mike T,
> >
> > Thanks for all the info you shared in this post. It's taught me several
> > things I didn't know about the low-level operation of a PC. I may have
> > misspoken about the reliance of SpinRite on BIOS. I think I read
> > somewhere on his website (later) that he bypasses BIOS, but I don't know
> > how all the details are worked out. I try to do regular backups, but
> > always seem to be behind on doing so. I did eventually manage to boot an
> > Ubuntu live CD on my old PC and run a badblocks non-destructive
> > read-write test. It finished the Linux partitions fine and is chugging
> > away on one of two Windows partitions. This test takes a LONG time, just
> > as it does with SpinRite. I found an interesting quirk you have to
> > account for when specifying the block numbers for badblocks to use, which
> > I commented on in another thread. Essentially, if you get the number of
> > blocks from fdisk, you have to subtract 1 from the number to feed it to
> > badblocks. That drove me crazy for half a day or so.
> >
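
The subtract-1 quirk is consistent with badblocks taking the number of the
last block, counted from zero, rather than a count of blocks. A small sketch
of that arithmetic, assuming fdisk and badblocks are using the same block
size. The helper names and the /dev/sdX1 placeholder are made up; the
function only builds an argument list and does not run anything.

```python
def last_block_for_badblocks(block_count: int) -> int:
    """Convert a count of blocks into the zero-based number of the last block."""
    if block_count < 1:
        raise ValueError("block_count must be at least 1")
    return block_count - 1


def badblocks_args(device: str, block_count: int, first_block: int = 0):
    """Build an argument list of the form: badblocks -n -s -v device last first.

    -n selects the non-destructive read-write test, -s shows progress and
    -v is verbose. The device name passed in is the caller's responsibility.
    """
    last = last_block_for_badblocks(block_count)
    return ["badblocks", "-n", "-s", "-v", device, str(last), str(first_block)]
```

For example, a partition that fdisk reports as 1000 blocks would be tested
with a last block of 999 and a first block of 0.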
> > You mention data scrubbing in your post. I've run across that in my
> > research, as well as some very interesting info on latent defects. My
> > research is ongoing, but my current conclusion is that data scrubbing is
> > beneficial for personal drives too, hence my willingness to run 36 - 48
> > hours of diagnostics on my drives, 2 - 3 times / year. Data scrubbing may
> > be more valuable on personal drives than on RAID, since there is no drive
> > to auto failover to. If the sectors on my drive fail to read, unless
> > SpinRite can recover them, which I've seen it do on occasion, then my
> > only solution is to resort to backups, which are probably out of date.
> >
> > Another, unrelated, problem is that, about half the time, that old
> > computer will not boot the GUI Unity interface on the live CD of Ubuntu
> > 11.10. At other times, it works. Totally insane. I hate Unity anyway, but
> > it would be nice to have SOME kind of GUI running. To run the recent
> > badblocks test, I had to resort to hitting ALT-F1 to get to a terminal
> > and use that. The GUI never did start up. I tried the startx command, but
> > that didn't do anything either.
> >
> > The saga to preserve ever so fragile data continues ...
> >
> > Sincerely,
> >
> > Ron
> >
> >
> > --
> >
> > Sent from my Android Acer A500 tablet with bluetooth keyboard and K-9
> > Mail. Please excuse my potential brevity.
> >
> > (To whom it may concern. My email address has changed. Replying to former
> > messages prior to 03/31/12 with my personal address will go to the wrong
> > address. Please send all personal correspondence to the new address.)
> >
> > (PS - If you email me and don't get a quick response, you might want to
> > call on the phone. I get about 300 emails per day from alternate energy
> > mailing lists and such. I don't always see new email messages very
> > quickly.)
> >
> > Ron Frazier
> > 770-205-9422 (O) Leave a message.
> > linuxdude AT techstarship.com
> >
> >
> > "mike at trausch.us" <mike at trausch.us> wrote:
> >
> > On 05/11/2012 09:09 PM, Ron Frazier (ALE) wrote:
> >> Hi Mike,
> >>
> >> I don't think that discredits the project, I think it's a wise design.
> >> Here's why. The most recent version of SpinRite came out in 2004 and has
> >> a history going back to 1988. The program designer is planning an
> >> update, but it's not out yet. Nevertheless, that version will work on
> >> modern drives. I don't know how deeply SpinRite interacts with the bios.
> >
> > If it has the limitations of BIOS, then it is using the BIOS functions
> > as published in Ralf Brown's Interrupt List (RBIL), which was (and for
> > the real-mode programmer, still is) the definitive source of BIOS
> > interrupt interfaces. The BIOS supports only a limited set of
> > functionality, which has been extended over the years to cater to larger
> > drives and so forth. It supports only limited error handling (for
> > example, INT 0x13/00 is "RESET DISK SYSTEM", but that only seeks the
> > drives to track 0, it doesn't actually perform a bus reset).
> >
> > The INT 13 interface is horrible, and unsuitable for the implementation
> > of any serious software these days. And has been for many years, which
> > is why modern operating systems no longer use that interface. Even
> > Windows 3.11 had a dedicated driver for bypassing the BIOS routines for
> > disk access (they called it "32-bit disk access" or something along
> > those lines, IIRC).
> >
> >> I do know it boots from its own copy of freedos and runs from there. The
> >> product was designed to work with any PC compatible computer, be it PC,
> >> Linux box, or Mac. It can even work with Tivo or iPod drives, etc. if
> >> you take the drive out and attach it to a PC. It needs to be able to
> >
> > Well, yes. They all speak the same language, regardless of the type of
> > computer they are plugged into. The only thing that is really different
> > between disks in a Mac and a PC is the convention used to store data
> > (e.g., partition tables or maps and filesystems).
> >
> >> have total control over the drive, including disabling some of the
> >> drive's normal error correction, so it can do analysis and detect
> >> problems. It can't have the OS in the way and interfering with its
> >> operation. The primary target machine it was designed to run on was no
> >
> > ... but it's perfectly happy having the limited, inconsistently designed
> > various BIOS programs get in the way and do its work for it?
> >
> > I'll grant that the Linux generic SCSI interface did not exist when
> > SpinRite was first created. I'll grant even further that it was
> > impractical in that time period to actually write dedicated drivers for
> > the various disk controllers which existed at the time. The reason that
> > BIOS existed was to simplify the creation of relatively simple systems
> > so that they did not need to know anything more than the generic BIOS
> > interface.
> >
> > However, with that generic BIOS interface comes a
> > lowest-common-denominator approach to handling the disk controller and
> > therefore the disks themselves. This would also be the primary reason
> > why it's a horrible idea to use the BIOS interface.
> >
> >> doubt Windows machines. Back in 2004, those machines were running
> >> various combinations of Win 95, Win98, Win ME, Win 2000, Win NT, and Win
> >> XP. I don't think any of the Windows systems allow the kind of
> >> unfettered access to the drive that SpinRite needs. Also, as far as I
> >
> > Windows systems not built on the Windows NT kernel (e.g., Win9x and
> > earlier) do allow direct hardware access because at their core they were
> > still 16-bit operating systems. Calls to the Win32 API were largely
> > thunked to 16-bit modules that were preexisting. There were large
> > components of the system that ran in 32-bit mode, but a lot of it did
> > not. There was therefore a heavy cost associated with the constant
> > changing of the CPU mode as part of context switches and so forth, which
> > made Win9x both clunky and relatively unstable.
> >
> > Starting with Windows NT, direct hardware access is prohibited to most
> > applications. The exceptions are those that create kernel drivers that
> > enable an application to bypass such restrictions. I don't know if NT
> > has something like the Linux generic SCSI interface, but if it doesn't
> > and it were necessary for an application to be implemented, it would
> > certainly be possible to do.
> >
> > Building a utility directly on top of a SCSI interface would be far
> > superior to building it on BIOS. Building it to talk directly to
> > hardware would be the only way to get closer to the metal than what SCSI
> > interfaces would allow, but that's unnecessary for most applications,
> > and if it were necessary for an application the preferred way to do it
> > would ideally be a 32-bit extended DOS program that performs direct
> > hardware access. Of course a real-mode program would also work, but
> > there's little to no point to writing real mode code, since it's quite
> > easy to get a 32-bit DOS compiler anyway (GCC, for example, supports
> > MS-DOS via DJGPP).
> >
> >> know, there isn't a way to dismount an internal drive in Windows and
> >> work on it, as you potentially can in Linux. Even if you could dismount
> >> a drive, the system needs to run on the system drive, and the average
> >> user doesn't have any way to boot a Windows machine without doing so
> >> from the system drive. So, the best design choice was to make a product
> >
> > You can absolutely unmount drives in Windows. Use the Disk Manager in
> > the Microsoft Management Console for a GUI way to do it. There are of
> > course API calls that can be used by custom software to do it as well.
> >
> > You are right in that you cannot unmount the system drive, but that
> > problem is common to all operating systems, not just Windows. DOS had a
> > sort of exception, but DOS was also small enough to remain completely
> > resident in RAM when there was more than 1 MB of memory present and
> > usable.
> >
> >> that booted itself. That way, the OS isn't running, all the drives are
> >> dismounted, and he didn't have to wonder whether the user would be able
> >> to boot their pc so they could use his software. It was probably the
> >> best solution to the problems he had to deal with. On my machines where
> >> the bios is new enough to match the hard drive capacity, I can run
> >
> > Most, if not all, modern BIOS firmware provides what are known as the
> > INT 13 extensions, which provide a portable interface for addressing
> > disks up to one of several maximum sizes. There was a series of
> > progressions in the maximum disk size that BIOS could support; 500 MB,
> > 1 GB, 4 or 8 GB, 128 GB and I don't remember the current one but it is
> > pretty large relative to the time period it was introduced in. Maybe 500
> > or 800 GB or something.
> >
> > Anyway, given a disk that fits within the support of the BIOS, you can
> > read and write sectors by identifying the C/H/S (very old API) or LBA
> > address of the first sector and the count of sectors. The BIOS will of
> > course return an error condition in the CPU's registers if it
> > encountered an error while reading one of the sectors.
> >
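
For reference, the usual relationship between a C/H/S address and an LBA
address can be sketched as below. The formula is the standard one; the
default geometry of 16 heads and 63 sectors per track is only an assumed
example, since the real values depend on the geometry the BIOS reports for
a given disk.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int = 16,
               sectors_per_track: int = 63) -> int:
    """Convert a cylinder/head/sector address to a linear block address.

    Cylinders and heads are numbered from 0, sectors from 1.
    """
    if cylinder < 0:
        raise ValueError("cylinder must be non-negative")
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range for this geometry")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range for this geometry")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)
```

With the assumed geometry, C/H/S 0/0/1 is LBA 0, 0/1/1 is LBA 63, and
1/0/1 is LBA 1008.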
> > Just as a side note, DOS called upon BIOS to do the work, but
> > well-behaved programs that were written for DOS and did not require the
> > ability to go beyond DOS' support of storage would simply call DOS
> > interrupts to get the job done, which provided a slightly higher-level
> > API since you did not need, for example, to worry about sectors but
> > instead clusters in named files.
> >
> > SpinRite doesn't need to use any of the DOS filesystem facilities,
> > though, and DOS is inherently a single-tasking operating system, so it
> > is safe for SpinRite to assume that it can have exclusive control of the
> > disk while it is running.
> >
> > Also note that there are DOS implementations that have Windows NT-like
> > restrictions on direct hardware access for certain things. Such
> > versions are usually ones that provide some sort of task switching or
> > multitasking ability, for example taking advantage of the functionality
> > and features of the 386 or newer CPUs and providing DOS applications
> > with a V86 environment instead of a real one.
> >
> >> SpinRite on both the Windows partitions and the Linux partitions. It
> >> doesn't care. It works strictly at the sector level and is non
> >> destructive. Even to use the badblocks command as you and Jim have
> >> suggested on my old PC, I have to shut it down and boot a foreign OS, ie
> >> a live Linux CD, in order to run the test. That's exactly the same thing
> >> SpinRite is doing. It just happens to be booting freedos rather than
> >> Linux.
> >
> > No, you can do it while the system is running, you just cannot use the
> > read-write test mode.
> >
> > The read-write test mode in badblocks is superior because it does not
> > depend on the data currently stored in the sector. Certain data
> > patterns can hide error conditions which may exist on the platter; for
> > example, a particular bit may be stuck "on" or "1", but you'd never know
> > that if the value that is there is legitimately 0xFF. But you'd detect
> > it if you wrote 0x00 there, and when you read it back it was, for
> > example, 0x01 or 0x80, because the stuck bit didn't get cleared.
> >
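
The stuck-bit argument can be tried out with a toy model. This is only a
simulation of one byte of storage, not how badblocks itself is written; the
class and function names are made up, and the four patterns are simply a
convenient set that includes both 0xFF and 0x00.

```python
class StuckBitCell:
    """One byte of storage whose bits in stuck_mask always read back as 1."""

    def __init__(self, stuck_mask: int, initial: int = 0xFF):
        self.stuck_mask = stuck_mask & 0xFF
        self.value = (initial & 0xFF) | self.stuck_mask

    def write(self, value: int) -> None:
        # The stuck bits ignore whatever was written to them.
        self.value = (value & 0xFF) | self.stuck_mask

    def read(self) -> int:
        return self.value


def failing_patterns(cell, patterns=(0xAA, 0x55, 0xFF, 0x00)):
    """Write each pattern, read it back, and return the ones that differ.

    The original contents are saved first and written back afterwards,
    which is the idea behind a non-destructive read-write test.
    """
    original = cell.read()
    failures = []
    for pattern in patterns:
        cell.write(pattern)
        if cell.read() != pattern:
            failures.append(pattern)
    cell.write(original)
    return failures
```

A cell holding 0xFF with bit 0 stuck on reads back fine, so a read-only
pass sees nothing wrong, but failing_patterns() reports 0xAA and 0x00
because those patterns need bit 0 to be cleared.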
> > SpinRite does this at the disk sector level (presumably, since it is an
> > ancient program, with a fixed value of 512 for the sector size). The
> > badblocks command works on blocks, too, but you can specify the size of
> > what it considers a block. A common value is 4,096 bytes for a block
> > when running badblocks, though virtually any block size that is a
> > multiple of 512 will work for older drives; make that a multiple of
> > 4,096 for modern so-called "advanced format" drives.
> >
> >> I may run the non destructive rw test on the old pc using badblocks as
> >> you and Jim suggested in other messages. It already passed the long
> >> smart test and it says the drive is healthy with no bad sectors. I just
> >> have to figure out how much additional time I want to spend on it.
> >
> > For a non-critical (e.g., personal) system, SMART should be sufficient.
> > You of course take regular (monthly or weekly) backups of your ${HOME},
> > right? If that's the case then recovery is possible within hours of new
> > drive installation, and for an individual, particularly one who has
> > multiple computers, that is an acceptable thing. If not, you can employ
> > other steps to try to delay or defer the restoration process, such as
> > using RAID, but backups are backups and still quite necessary.
> >
> > I have one array I manage that it would take approx. 30 hours to restore
> > from backup. In an attempt to avoid that in all but the most
> > devastating situations, I have it on a RAID array. As long as the RAID
> > array's health is maintained, it is possible for me to keep taking and
> > testing backups and knowing that I can, if need be, recreate the
> > configuration as it exists in the office today along with all the data,
> > but I don't have to if only one or two drives fail, because I can
> > replace them almost immediately upon failure. I'm a bit on the paranoid
> > side, too: I will start replacing drives as soon as they stop behaving
> > 100% perfectly, as opposed to waiting for a hard failure.
> >
> > So far that seems to have been a good way to ensure that things stay
> > running... the drives I've removed were all used again in my home
> > (after, of course, being wiped) and failed within a month of removal
> > from the array. One of them failed only two days after removal. Even
> > more than SMART, the Linux kernel ring buffer is a great source to
> > monitor for disk trouble, especially on a system that has many disks
> > with lots of activity on them. The kernel will notice errors as soon as
> > they are encountered.
> >
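
One possible way to watch the kernel ring buffer is to filter the output of
dmesg for disk-related error text. The sketch below is an assumption about
how that might be scripted, not something from Mike's setup. The substrings
in the pattern are examples only; actual kernel messages differ between
kernel versions and drivers, so the list would need tuning.

```python
import re
import subprocess

# Illustrative, non-exhaustive substrings to look for in kernel messages.
DISK_ERROR_RE = re.compile(r"I/O error|Medium Error|failed command", re.IGNORECASE)


def disk_error_lines(log_text: str) -> list:
    """Return the lines of a kernel log that look like disk errors."""
    return [line for line in log_text.splitlines() if DISK_ERROR_RE.search(line)]


def read_kernel_log() -> str:
    """Return the current ring buffer contents by running dmesg.

    Reading the ring buffer may require root privileges on some systems.
    """
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling disk_error_lines(read_kernel_log()) from cron and mailing any
non-empty result would be one simple way to use it.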
> > Additionally, RAID devices get "scrubbed" once per month anyway (at
> > least by default on Debian and Ubuntu). The "scrub" process is
> > *exactly* what SpinRite does, reading everything on all the disks. It
> > doesn't re-write every sector, but it doesn't need to: if a sector
> > cannot be read it is reassembled from parity information and an attempt
> > is made to re-write it. If the drive was able to re-map the sector, the
> > write will work and things continue. If the drive was unable to re-map
> > the sector (say, because it ran out of sectors in the spare sector area)
> > then the write will most likely fail and the disk will be marked
> > "failed" by the virtual RAID controller.
> >
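
The scrub behaviour described above can be modelled in a few lines. This is
a toy simulation under stated assumptions, not the Linux md implementation:
the Disk class, the spare-sector counter, and the use of simple XOR parity
are all invented for illustration, and it assumes at most one unreadable
sector per stripe.

```python
from functools import reduce
from operator import xor


class Disk:
    """Toy disk: a list of integer sectors, some of which cannot be read."""

    def __init__(self, name, sectors, unreadable=(), spares=1):
        self.name = name
        self.sectors = list(sectors)
        self.unreadable = set(unreadable)
        self.spares = spares      # spare sectors left for re-mapping
        self.failed = False

    def read(self, index):
        if index in self.unreadable:
            raise OSError(f"{self.name}: cannot read sector {index}")
        return self.sectors[index]

    def write(self, index, value):
        if index in self.unreadable:
            if self.spares == 0:
                raise OSError(f"{self.name}: no spare sectors left")
            self.spares -= 1              # the drive re-maps the bad sector
            self.unreadable.discard(index)
        self.sectors[index] = value


def scrub(disks):
    """Read every sector; rebuild and re-write any sector that fails to read."""
    for index in range(len(disks[0].sectors)):
        for disk in disks:
            if disk.failed:
                continue
            try:
                disk.read(index)
            except OSError:
                # With XOR parity, a missing sector is the XOR of the others.
                others = [d.read(index) for d in disks if d is not disk]
                rebuilt = reduce(xor, others)
                try:
                    disk.write(index, rebuilt)
                except OSError:
                    disk.failed = True    # the array drops the disk
```

A disk with a spare sector left comes out of scrub() repaired, while a disk
with no spares left ends up marked as failed.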
> > That's robust enough for me, at least with the requirements I have in
> > the environments that I am managing for the moment. I would like,
> > however, to have a beefier system for the RAID. Not because of the CPU,
> > but because it would really do well to have a veritable buttload of RAM.
> > A lot of the operations would be faster if the system had 4 GB of RAM
> > to use for caching and buffers...
> >
> >       --- Mike
> >
> > --
> > A man who reasons deliberately, manages it better after studying Logic
> > than he could before, if he is sincere about it and has common sense.
> > --- Carveth Read, “Logic”
> >
> > _____________________________________________
> >
> > Ale mailing list
> > Ale at ale.org
> > http://mail.ale.org/mailman/listinfo/ale
> > See JOBS, ANNOUNCE and SCHOOLS lists at
> > http://mail.ale.org/mailman/listinfo
> >
> >
>
>
> --
>
> Justin Goldberg
>
> *justgold79 at gmail.com*
> (504) 208-1158
> http://gplus.to/goldberg
> http://twitter.com/justingoldberg
>