[svlug] New server plans moving forward

Rick Moen rick at linuxmafia.com
Sun Jan 21 19:56:29 PST 2007

Quoting Jeff Frost (jeff at frostconsultingllc.com):

> That also means that you could lose half the array and keep right on 
> running, assuming you only lost 1 drive from each RAID1 pair.  It seems 
> unlikely that would happen, but it does give better odds than keeping a 
> large RAID5 array happy.  RAID5 also tends to perform very poorly when 
> degraded.

Let me get to all of that, below.

> NOTE: I come at this from a performance background, in case you note a bias 
> towards performance.  I understand that sometimes you just need the space. 
> :-)

Yes, and that's a good point -- but this is just a Web and mail server,
so I'd be flabbergasted if it were ever constrained by disk I/O when
running any non-degraded "md"-driver RAID setup of any kind.  My
experience with machine scaling, which has included a heck of a lot of
VA Linux 2200 series Web servers ;-> , suggests that this machine will
bottleneck on network I/O or RAM, in all likely usage scenarios (unless
we suddenly turn it into a POVtrace farm box, or something).

In my experience, typical commodity Linux Web/mail servers in typical
deployments never even feel any net slowdown (bottleneck) from _any_
operating mode of the "md" driver, not even RAID6 double-parity mode,
except during remirroring.

Now:  The existing RAID 1+0 setup _indeed_ can survive loss of half the
drives prior to replacement and rebuild, but only if you're lucky about
which drives fail:  each successive failure has to miss the partner of
every drive already dead, so 8/8 * 6/7 * 4/6 * 2/5 = 16/70, or about a
23% likelihood that the right drives fail to save your ass.  (Any one
_particular_ set of four drives is 4/8 * 3/7 * 2/6 * 1/5 = 1/70, about
1.4%, but 16 different sets leave one drive alive in every pair.)  To
quote Damon Runyon, "That's not the way to bet."

Anyway, you think we have less than a 23% chance of keeping a large
RAID5 array happy?  Man, you must think we're real slackers.

If I were running the 7-drive RAID5 array of 36GB SCSI drives I spoke
of, with one hot-spare in the eighth drive slot, and one drive failed,
(1) mdadm would send me mail about the drive failure, and the array
would rebuild onto the spare.  (Disk I/O and CPU load would suck a bit
during that rebuild -- the price of doing the more complex varieties of
"md" software RAID.)  (2) Within a day or two, I'd scrounge a
replacement drive, cycle over to the colo, bring down the machine,
replace the failed drive, bring the machine back up, and add the
replacement as the new hot spare.

You'd do differently because you're a performance freak?  Bully for you,
and best of luck with those 23% odds.
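
For anyone who wants to check the half-the-array odds by brute force,
here is a minimal Python sketch.  It assumes eight drives arranged as
four two-drive RAID1 pairs (the pairing (0,1), (2,3), (4,5), (6,7) is an
arbitrary illustrative choice), and it assumes every set of failed
drives of a given size is equally likely, which real drives sharing a
chassis and a batch number may not honour.  It counts the failure sets
that leave at least one drive alive in every pair, and for comparison
also prints 4/8 * 3/7 * 2/6 * 1/5, which is the chance that one
_particular_ set of four drives is the set that fails.

```python
from fractions import Fraction
from itertools import combinations

DRIVES = 8
# Assumed layout (illustrative only): drives (0,1), (2,3), (4,5), (6,7)
# are the four RAID1 pairs that the RAID0 stripe sits on top of.
PAIRS = [(i, i + 1) for i in range(0, DRIVES, 2)]


def array_survives(failed):
    """A stripe of mirrors survives as long as no pair has lost both members."""
    return all(not (a in failed and b in failed) for a, b in PAIRS)


def survival_odds(num_failed):
    """Fraction of equally likely failure sets of this size that the array survives."""
    outcomes = [set(c) for c in combinations(range(DRIVES), num_failed)]
    survivors = sum(1 for failed in outcomes if array_survives(failed))
    return Fraction(survivors, len(outcomes))


if __name__ == "__main__":
    for n in range(1, 5):
        odds = survival_odds(n)
        print(f"{n} failed: {odds} = {float(odds):.1%}")

    # Chance that one particular, pre-chosen set of four drives is the set that fails.
    one_specific_set = (
        Fraction(4, 8) * Fraction(3, 7) * Fraction(2, 6) * Fraction(1, 5)
    )
    print(f"one specific set of 4: {one_specific_set} = {float(one_specific_set):.1%}")
```

Run as written, the four-drive case comes out to 8/35 (16 survivable
sets out of 70), about 22.9%, and the single-particular-set product
comes out to 1/70, about 1.4%.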

Cheers,                 We write precisely            We say exactly
Rick Moen               Since such is our habit in    How to do a thing or how
rick at linuxmafia.com  Talking to machines;          Every detail works.
Excerpt from Prof. Touretzky's decss-haiku.txt @ http://www.cs.cmu.edu/~dst/
