[svlug] Partitioning problem

Jeffrey Siegal jbs at quiotix.com
Tue Feb 7 17:13:43 PST 2006


On Feb 7, 2006, at 16:06 , Rick Moen wrote:

> The drives being SCSI makes a bit of difference, in small part because
> of statistically holding up better to 24x7 service

True.

> in larger part because of the SCSI hotfix area.  (You may be fully
> aware of this; I'm mentioning it for those who don't know.)  There's a
> set of reserved sectors under the drive electronics's control,
> OS-transparently swapped into service during runtime, whenever the
> drive electronics detects failing sectors.  (If this has somehow
> ceased to be the case, I might not have heard, but I doubt it.)

I'm pretty sure that modern ATA drives also do hotfixes.

But in any case, I've had numerous cases of failing drives
(including SCSI drives) deciding to fail in ways that hotfix won't cure:

1. Read errors, because the sector went bad after it was written and
can't be recovered by ECC.  These are not hotfixable, and will
always cause an error to be returned to the driver/OS.  In the case
of RAID, the system will attempt, almost always successfully, to
recover the data from the other mirror.  You can then rebuild the
pair, and the bad sector will be replaced upon write.

2. The drive spontaneously resets and/or spins down.  I can't
entirely explain why this happens, but I've seen it on multiple
systems.  I suspect the causes differ from case to case, encompassing
driver bugs, drive firmware bugs, a bad power supply, and drive circuit
board failure.  But in any case, RAID will often recover from this.
(Although software RAID is not guaranteed to recover from this sort
of failure, in practice it usually does.)
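
To make case 1 concrete, here is a toy Python model of a mirrored pair.
It is only a sketch: the Disk and Mirror classes are hypothetical names
made up for illustration, not anything from the md driver or a real
controller, and it assumes that writing to a bad sector causes the drive
to remap it to a spare.

```python
class Disk:
    """Toy drive: a list of sectors plus a set of unreadable sector numbers."""

    def __init__(self, sectors):
        self.sectors = list(sectors)
        self.bad = set()  # sectors that went bad after being written

    def read(self, n):
        if n in self.bad:
            # ECC could not recover the data, so the driver/OS sees an error.
            raise OSError(f"unrecoverable read error at sector {n}")
        return self.sectors[n]

    def write(self, n, data):
        # Assumption: on write the drive remaps a bad sector to a spare,
        # so the sector is readable again afterwards.
        self.bad.discard(n)
        self.sectors[n] = data


class Mirror:
    """Toy RAID-1 pair (hypothetical, not modelled on any real RAID code)."""

    def __init__(self, first, second):
        self.disks = [first, second]

    def read(self, n):
        # Try each mirror in turn; fail only if neither can supply the data.
        for disk in self.disks:
            try:
                return disk.read(n)
            except OSError:
                continue
        raise OSError(f"sector {n} unreadable on both mirrors")

    def rebuild(self, source, target):
        # Copy every sector from the good disk; the writes replace bad sectors.
        src, dst = self.disks[source], self.disks[target]
        for n in range(len(src.sectors)):
            dst.write(n, src.read(n))


a, b = Disk([b"boot", b"swap"]), Disk([b"boot", b"swap"])
a.bad.add(1)                       # sector 1 goes bad on the first drive
pair = Mirror(a, b)
data = pair.read(1)                # served from the second mirror
pair.rebuild(source=1, target=0)   # rewriting clears the bad sector
```

The read succeeds even though the first drive returns an error, and the
bad sector stays bad until the rebuild rewrites it.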

> In any event, I would estimate the likelihood of swap-partition
> problems caused by flaky drives -- that aren't _also_ part of larger
> system problems -- as really low.

In practice the drives -- even SCSI drives -- are probably the least
reliable part of any system (the power supply being the only potential
challenger for that title).

But like I said, if you're putting critical system partitions on
non-RAID, then I wouldn't worry about putting swap there as well.  I
personally just don't configure systems that way any more -- I find
the ability to repair a broken drive by simply replacing the unit
(and not having to go through a system recovery process) to be worth
the cost of an extra drive.

BTW, one thing to keep in mind is that RAID-1 will reduce write
performance a bit, but will generally *improve* read performance,
since there are two drives across which reads can be scheduled.  For
anything other than database applications, read performance is
_usually_ much more important than write performance (because writes
are usually asynchronous but reads are synchronous).  So in terms of
overall performance, going with RAID-1 will usually give a small
benefit.
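
A back-of-the-envelope way to see the read/write asymmetry, as a Python
sketch.  The busy_time function is a made-up name, and the model is
deliberately crude: every operation costs one time unit, a read goes to
whichever mirror is least busy, and a write goes to every mirror.  It
ignores seeks, caching, and the small extra write cost mentioned above.

```python
def busy_time(ops, drives):
    """Return per-drive busy time for a string of 'r'/'w' operations."""
    busy = [0] * drives
    for op in ops:
        if op == "r":
            # A read can be served by whichever mirror is least busy.
            i = busy.index(min(busy))
            busy[i] += 1
        else:
            # A write has to be performed on every mirror.
            for i in range(drives):
                busy[i] += 1
    return busy


workload = "rrrrrrww"              # read-heavy mix, chosen arbitrarily
single = busy_time(workload, 1)    # one plain drive
mirrored = busy_time(workload, 2)  # RAID-1 pair
```

With this workload the single drive is busy for 8 units, while each
drive of the pair is busy for 5: the reads are split between the
mirrors, but the writes are not.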





