[svlug] Partitioning problem
Jeffrey Siegal
jbs at quiotix.com
Tue Feb 7 17:13:43 PST 2006
On Feb 7, 2006, at 16:06, Rick Moen wrote:
> The drives being SCSI makes a bit of difference, in small part because
> of statistically holding up better to 24x7 service
True.
> in larger part because of the SCSI hotfix area. (You may be fully
> aware of this; I'm mentioning it for those who don't know.) There's a
> set of reserved sectors under the drive electronics' control,
> OS-transparently swapped into service during runtime whenever the
> drive electronics detect failing sectors. (If this has somehow ceased
> to be the case, I might not have heard, but I doubt it.)
I'm pretty sure that modern ATA drives also do hotfixes.
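To make the hotfix idea concrete, here is a toy model of a drive quietly swapping a reserved spare in for a failing sector. The class `ToyDrive` and its fields are hypothetical and do not reflect any real firmware interface. The OS keeps using the same logical block address (LBA); only the drive's internal map changes.

```python
class ToyDrive:
    """Toy model of drive-internal sector remapping ("hotfix").

    Hypothetical sketch: real firmware is far more involved. The OS only
    ever sees LBAs; the drive silently redirects a failing one to a spare.
    """

    def __init__(self, n_sectors: int, n_spares: int) -> None:
        self.media: dict[int, bytes] = {}    # physical sector -> stored data
        self.remap: dict[int, int] = {}      # LBA -> spare physical sector
        # Reserved spares live just past the user-visible sectors.
        self.spares = list(range(n_sectors, n_sectors + n_spares))
        self.weak: set[int] = set()          # physical sectors flagged as failing

    def _physical(self, lba: int) -> int:
        return self.remap.get(lba, lba)

    def write(self, lba: int, data: bytes) -> None:
        phys = self._physical(lba)
        if phys in self.weak and self.spares:
            # Swap a spare into service; the OS never notices.
            phys = self.spares.pop(0)
            self.remap[lba] = phys
        self.media[phys] = data

    def read(self, lba: int) -> bytes:
        return self.media[self._physical(lba)]


d = ToyDrive(n_sectors=100, n_spares=2)
d.weak.add(7)                # drive detects that sector 7 is failing
d.write(7, b"hello")
print(d.read(7), d.remap)    # b'hello' {7: 100}
```

Note that the remap happens on a *write*. Data that has already become unreadable can't be rescued this way, which is the first case below.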
But in any case, I've had numerous cases of failing drives
(including SCSI drives) failing in ways that hotfixing won't cure:
1. Read errors, because the sector went bad after it was written and
can't be recovered via ECC. These are not hotfixable, and will
always cause an error to be returned to the driver/OS. With RAID,
the system will attempt, almost always successfully, to recover the
data from the other half of the mirror. You can then rebuild the
pair, and the bad sector will be remapped when it is rewritten.
2. The drive spontaneously resets and/or spins down. I can't
entirely explain why this happens, but I've seen it on multiple
systems. I suspect the causes vary: driver bugs, drive firmware
bugs, a bad power supply, and drive circuit board failure. In any
case, RAID will often recover from this. (Software RAID is not
guaranteed to recover from this sort of failure, but in practice
it usually does.)
> In any event, I would estimate the likelihood of swap-partition
> problems caused by flaky drives -- that aren't _also_ part of larger
> system problems -- as really low.
In practice, the drives -- even SCSI drives -- are probably the least
reliable part of any system (with the power supply the other likely
contender).
But as I said, if you're putting critical system partitions on
non-RAID storage, then I wouldn't worry about putting swap there as
well. I personally just don't configure systems that way any more --
I find the ability to repair a broken drive by simply replacing the
unit (without having to go through a system recovery process) to be
worth the cost of an extra drive.
BTW, one thing to keep in mind is that RAID-1 will reduce write
performance a bit, since every write must go to both drives, but it
will generally *improve* read performance, since reads can be
scheduled across two drives. For anything other than database
applications, read performance is _usually_ much more important than
write performance (because writes are usually asynchronous but reads
are synchronous). So in overall performance terms, going with RAID-1
will usually yield a small net benefit.
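The following sketch shows why reads speed up and writes don't. It assumes a simplistic least-loaded read policy; real RAID-1 implementations use their own heuristics, for example head position. The `schedule` function is hypothetical. Each write adds one operation to *both* disks' queues, while each read lands on only one disk.

```python
def schedule(ops: str) -> list[int]:
    """Toy RAID-1 scheduler: 'r' = read, 'w' = write.

    Returns how many operations each of the two disks ends up doing.
    """
    load = [0, 0]
    for op in ops:
        if op == "w":
            load[0] += 1          # every write must land on both mirrors
            load[1] += 1
        elif op == "r":
            i = 0 if load[0] <= load[1] else 1
            load[i] += 1          # a read needs only one copy: pick the less-busy disk
        else:
            raise ValueError(f"unknown op {op!r}")
    return load


print(schedule("r" * 10))  # [5, 5]: each disk does half the reads a lone disk would
print(schedule("w" * 10))  # [10, 10]: each disk does as many writes as a lone disk would
```

The per-disk write count matches that of a single drive. However, a mirrored write isn't complete until both drives finish it, and that is roughly where the small write penalty comes from.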