[svlug] Files That Will Not Delete

Rick Moen rick at svlug.org
Mon Nov 3 15:33:07 PST 2014

Scott wrote:

> SMART Error Log Version: 1
> No Errors Logged
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed without error       00%     32498

It's vaguely nice that the drive's internal SMART system thinks there are no
errors, and that the last extended self-test, run when the drive had 32498
power-on hours on it, came up clean.  (The '00%' is the portion of that test
still remaining, i.e., it ran to completion; it is not a measure of how much
of the drive's lifetime has been used up.)  However, Greg Lindahl dredged up
the thing that I was trying to remember:

> Last time I got a disk guy drunk and pumped him for secrets, he told
> me that 20% of the disk surface is reserved for bad blocks, and that
> it is completely normal to have lots of bad blocks. They mostly get
> covered up by the drive firmware, but, if you get unlucky you'll end
> up with a bad block that can't be read that the drive thinks has
> important data on it. This will show up as an uncorrectable read
> error.
> [...]
> Running badblocks is useless, starting about 15 years ago. The drive
> should prevent badblocks from ever finding a badblock other than an
> uncorrectable read error. 20%, remember? 


Basically, the drive manufacturer is in a position to hide reality behind
the firmware, and they all do.  The tested bad blocks are a lie, because
the drive firmware is invisibly swapping out blocks for reserved blocks
whenever there's a correctable soft error, i.e., a sector that's failing but
whose data the firmware was able to copy off before it failed.  But you could
have a developing avalanche of block failures right up to, and then past,
exhaustion of the reserved blocks, and never know this until suddenly the
new failures can no longer be remapped, start showing up, and become a big
problem.

Likewise, the firmware controls what data are exposed to SMART monitoring,
so all of that behind-the-scenes remapping to the limit of the reserved
sectors is completely invisible to SMART.

As Michael said, you can optionally download manufacturer-specific software
to test hard drives.  The (very) old versions, many moons ago, used to also
include 'pseudo-low-level formatting' utilities to zero out the basic track
information, but that went out with the dodo when ATA ('EIDE') came in,
except on SCSI.

(Thus, if you see software that purports to do that in 2014, be suspicious
and don't try it.)

Follow Michael and Greg's advice on this.  If the SMART data are clean and 
badblocks is clean, that really doesn't mean much other than that the
firmware has successfully hidden from you any ongoing badness.  At that
point, I'd just zero out the partition table
('dd if=/dev/zero of=/dev/sdd bs=512 count=1'), make a new partition table
with whatever you use, and mkfs yourself a new filesystem.
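
For anyone nervous about pointing dd at the wrong device, here is a minimal
sketch that rehearses the zeroing step on a scratch image file rather than a
real disk.  The IMG path is a made-up placeholder, and conv=notrunc is there
only because the target is a regular file (without it, dd would truncate the
file to 512 bytes); neither appears in the command above.

```shell
#!/bin/sh
# Sketch only: practise "zero the first sector" on a scratch image file.
# IMG is a hypothetical path chosen for illustration, not a real disk.
IMG="${IMG:-/tmp/scratch-disk.img}"

# Build a 1 MiB image (2048 sectors of 512 bytes) filled with 0xFF bytes,
# so that zeroed regions are easy to tell apart from untouched ones.
dd if=/dev/zero bs=512 count=2048 2>/dev/null | tr '\000' '\377' > "$IMG"

# Zero the first 512-byte sector, which is where an MBR partition table lives.
# conv=notrunc keeps dd from truncating the regular file; against a block
# device such as /dev/sdd it would not be needed.
dd if=/dev/zero of="$IMG" bs=512 count=1 conv=notrunc 2>/dev/null

# Count the non-zero bytes in sector 0 (just zeroed) and sector 1 (untouched).
first_nonzero=$(dd if="$IMG" bs=512 count=1 2>/dev/null | tr -d '\000' | wc -c | tr -d ' ')
second_nonzero=$(dd if="$IMG" bs=512 skip=1 count=1 2>/dev/null | tr -d '\000' | wc -c | tr -d ' ')

echo "non-zero bytes: sector 0 = $first_nonzero, sector 1 = $second_nonzero"
```

Only the first sector changes, which is the point: everything past it is left
alone.  Note that this clears an MBR-style table only; a GPT-labelled disk
also keeps partition data in the following sectors and in a backup copy at
the end of the disk, so the partitioning tool may still find it.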

If running smartctl ever _does_ start showing uncorrectable errors, break
all land-speed records making backups.
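
If you'd rather have a script do the watching, the following is a rough
sketch of one way to do it.  It assumes the usual ATA attribute table that
'smartctl -A' prints, with the attribute name in field 2 and the raw value in
field 10, and it assumes the attribute names shown; vendors differ, so treat
both as assumptions to check against your own drive.  The function name
check_smart_attrs and the sample table are invented for illustration.

```shell
#!/bin/sh
# Sketch: warn when error-related SMART attributes have a non-zero raw value.
# Reads `smartctl -A`-style text on stdin. Assumes field 2 is the attribute
# name and field 10 is the raw value; attribute names vary by vendor.
check_smart_attrs() {
    awk '
        $2 ~ /^(Current_Pending_Sector|Offline_Uncorrectable|Reported_Uncorrect)$/ {
            if ($10 + 0 > 0) {
                printf "WARN %s raw=%s\n", $2, $10
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Made-up sample input for illustration; not output from any real drive.
sample='ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   056   056   000    Old_age   Always       -       32498
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8'

# Capture both the warnings and the exit status (1 means something was flagged).
report=$(printf '%s\n' "$sample" | check_smart_attrs) && status=0 || status=$?
echo "$report"
echo "exit status: $status"
```

Against a real drive, you would run something like
'smartctl -A /dev/sdd | check_smart_attrs' as root, and treat a non-zero exit
status as the signal to start those backups.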

So, the first rule of
Haiku Club is you don't talk
about Haiku Club.
