[volunteers] Mailman Web pages (but not Mailman SMTP) has been screwed up

Paul Reiber reiber at gmail.com
Wed Nov 21 03:46:39 PST 2007


Rick - I love ya, buddy...  but it's wild how things can look totally
different on the other side of the coin.  You make it look like I blew
off my responsibilities for two days.  Reality is, any one of a half
dozen people could have gotten off their asses and visited via.net and
gotten the server rebooted.

I get people telling me the president's job is everything from making
sure we have pizza at the meeting through ensuring the doors are
unlocked through making sure there are some hot vegetarian breakfast
bagel sandwiches available to go along with the with-meat variety...
so your expectations aren't surprising... but they're certainly not
reasonable.

> Relevant to that, let's review for a second:   Thanks to your complete
> and total inaction, your cancelling the volunteers' ongoing server
> migration plan without either consulting or telling them, and your
> adopting a secret plan in its place, we've been unable to move off
> frighteningly fragile hardware and software -- mitigated only by my and
> Mark's independent measures to make sure we at _least_ have data
> backups.

I've stopped no-one from implementing ANYTHING.  I don't know what
"ongoing server migration plan" volunteers (who?) were working on but
I surely wouldn't stop anything like that from moving forward.  SVLUG
remains a do-ocracy - those who do, will have done.

Go buy and install a new server tomorrow, and we'll all thank you.
Or, help make some of the problems you've identified go away some
other way... and we'll thank you for that as well.  But just sitting
there and pointing out the problems... that I can do without.

>  In the middle of that situation, the server drops offline --
> but still occasionally responds to ping, making your bad guess that
> someone had pulled power to the colo rack particularly laughable.  After
> a day of _further_ inaction on your part (though you said you'd visit
> the evening of the 19th or morning of the 20th, but then didn't bother),
> we still have zero idea what's wrong.  Pretty much all we know is that
> _your_ idea is the one explanation that cannot be correct.  We have no
> idea whether, e.g., there's been a motherboard hardware failure (e.g.,
> bad ethernet chip), and for all we know, the server might end up being
> offline for weeks.

You make it sound like the "we" above is a paying customer.  You
know better than to honestly say we could be offline for weeks -
Mark's got backups, and worst case, we could fire up a new SVLUG
server using Google's services or any of a number of other sponsor
servers, in a matter of days, if not hours.

You also seem to be saying that you knew the machine was down for the
entire two days, yet didn't once email me at paul at reiber.org to let me
know?

Also - I'm not your answer man.  If I make a guess as to what might be
wrong, that guess may or may not be spot on.  If it's NOT, that does
NOT give you license to rip into me and show me where I didn't get
things just right.  And if/when you do know the answer, you damned
well shouldn't be asking as if you don't - just to see what others
will answer.  Instead, just pony up that right answer and move on.

> Against that background:  You take NO measures to tell the membership
> what's going on (except the nine members of this Google Group, to whom
> you told your bad guess).  None.  Hell, you don't even bother to visit
> the Freenode #svlug and say anything, let alone set the channel's /topic
> to let members know what was happening.

I don't use IRC much.  Deal with that.  And again it's not my job to
keep everyone "posted"... no-one keeps _me_ posted, so WTF?

> So, I discuss the situation with VP Mark Weisler and _attempt_ to do so
> with you, via attempts to call your cellular number (which you've
> disconnected) and finally, that not reaching you, a posting here.  I
> then work hard to quickly bring online an exact replica of the
> production Web pages, and point the DNS to it with carefully shortened
> TTL values so that it can be quickly repointed if/when the old server
> gets revived.

Neither one of you guys considered going over to via.net as a
reasonable first course of action???  Mark has my blackberry number -
as does Heather - and others... none of this makes much sense in light
of these things.

You see, the current static website's not all that useful (sans
information about, say, this week's events) - the mailing lists are
considerably _more_ useful, at least in that we communicate using them
- but your "exact replica" wasn't doing mailman, was it?

So your heart was in the right place, but the effort was misguided and
unnecessary.  I'm sorry, but from my perspective it created more
problems than it solved, plain and simple.

> So, finally, _no_ thanks to you, that measure prevents, for the duration
> of the outage -- however long that might be -- our members from thinking
> we've just dropped off into a black hole.  They're able to see the
> explanation that I post and keep updating at the top of the front page's
> SVLUG News column.  Also, while I'm in the middle of taking care of
> basics that you can't be bothered with, I adjust the #svlug channel's
> /topic header twice to reflect the latest news, and keep our members
> informed.

So who really cares if some of our members think we "dropped into a black hole"?
You keep treating the membership like a customer or client - it's not!
It's SO SO irrelevant if even the most important LUG in the world
goes offline for days.  It really is.  A LUG is not a business!  It
can go offline, and no-one should get worried!

All of the above is the best evidence I've seen so far for migrating
SVLUG off of its own servers and onto Google or Untangle or Linode or
wherever else we decide to host things.


> Eventually, after blowing off the problem -- and our membership -- for
> two days, you finally mosey down to Via.net.  You "fix" the problem.
> And what's your idea of "fixing" it?  You do NO root-cause analysis
> whatsoever, but merely reboot the box!

That's SO misleading it's not even funny.

-> no blow-off occurred; SVLUG got prioritized, that was all.
-> "mosey" == stop-and-go 101 traffic
-> I had to get in, get it fixed, and get over to the Googleplex to
round up lost SVLUGgers for this week's kernel walk-thru session since
the building changed from last week.

I wish I could have spent all afternoon figuring it out... but I have
paying clients.  Like I said, the magic sysreq keystrokes were but
moments away.  BUT, I took the shorter, more reasonable route - get it
working again, and move on.

Heather's been busting hump on Brie, and we'll be enjoying the fruits
of her labor soon enough.  I'm not planning on firing up the JBOD -
too much power consumption; we'll get a few .5T IDE drives instead,
and have more space in the end.  Virtualization and MoinMoin (and
probably using Untangle as well) are all right around the corner.

-pbr
-Paul



