[svlug] Inits (was: On the process of picking up systemd)

Steve Litt slitt at troubleshooters.com
Thu Jan 15 22:52:22 PST 2015


On Thu, 15 Jan 2015 20:52:52 -0800
Rick Moen <rick at svlug.org> wrote:

> Steve Litt wrote:
> 
> > That 95% will be perfectly happy with systemd. For those of us who
> > aren't, there are plenty of reasonable alternatives.
> 
> I hope you won't be offended at my saying this, but I'm a little
> disappointed at seeing _no engagement with the point I raised_ about
> what we agree is (borrowing your well-phrased characterisation) 'the
> kernel's increasingly event-driven nature'.
> 
> To review:  Up until about four years ago, everyone on Linux was
> relatively happy with sequential-code init systems, particularly
> those that included dependency checking - including but not limited
> to SysVInit, runit, BSD init, nosh, s6, and the like.  
> 
> What I, probably you, and almost everyone else other than a few
> specialists were late to realise are the escalating problems (spiking
> in prominence starting around the beginning of this decade) caused by
> kernel behaviour becoming variable over time and across reboots,
> hardware status becoming more dynamic, and all of these state changes
> being tracked by kernel 'uevent' data accessible via the netlink
> socket and userspace hotplug processes.
> 
> Even though I, and perhaps also you, prefer to use no hotplug code
> (and to avoid problem-causing scenarios like mounting /home from an
> external USB drive), the problem remains:  Some very intelligent and
> experienced people have advised me that the dynamic nature of kernel
> events can now be ignored only at one's peril, and that
> event-handling code is necessary or else one day I'll find that eth0
> and eth1 have swapped on reboot for mysterious reasons, or worse
> things will happen.
> 
> The point is:  Are they right, or are they wrong?
> 
> Everything I've been reading from relevant technical sources, as I
> catch up on this subject, suggests they're right.  Which then would
> mean that SysVInit, runit, BSD init (on Linux), nosh, s6, and the
> like are _not_ quite sufficient even to reliably boot systems with
> all hardware initialised and services requiring that hardware running.
> 
> 
> Here's a 2012 posting by one of those people who really can speak
> authoritatively on the subject:  Petter Reinholdtsen, one of the
> maintainers of Debian's SysVInit.  Note his comment that _only_ the
> early portion of init, in which hardware and filesystems get set up,
> needs to be handled by event-based code.  Which is a logical alternative
> to recent approaches that to my knowledge has not yet been explored.  
> 
> 
> I'm going to quote Reinholdtsen's full text, as I think he's spot-on:
> 
> https://lists.debian.org/debian-devel/2012/02/msg01043.html

Hi Rick,

You have me at a disadvantage because I don't think I've ever seen a
system boot indeterminately, or at least not in the last 10 years. I've
never seen eth0 and eth1 switch, and the machine I've been doing all
this testing on has both.
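
For anyone who wants to check rather than guess, a swap of eth0 and eth1 shows up as the two names trading MAC addresses between boots. Below is a minimal Python sketch, assuming the usual sysfs layout where each interface has an `address` file under /sys/class/net. The function names `read_iface_macs` and `changed_ifaces` are made up for illustration, and saving the snapshot between boots is left out.

```python
import os


def read_iface_macs(sysfs_net="/sys/class/net"):
    """Return {interface name: MAC address} as reported by sysfs."""
    macs = {}
    for name in sorted(os.listdir(sysfs_net)):
        addr_file = os.path.join(sysfs_net, name, "address")
        try:
            with open(addr_file) as f:
                macs[name] = f.read().strip()
        except OSError:
            # Entries without a readable address file are skipped.
            continue
    return macs


def changed_ifaces(previous, current):
    """Return {name: (old MAC, new MAC)} for every name whose MAC differs."""
    return {
        name: (previous.get(name), current.get(name))
        for name in set(previous) | set(current)
        if previous.get(name) != current.get(name)
    }
```

Run `read_iface_macs()` on one boot, keep the result somewhere, and pass it with the next boot's result to `changed_ifaces()`. An empty dict means the names stayed put.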

If all this kernel event stuff is that indeterminate, we have bigger
fish to fry, because the initramfs does a lot of stuff long before it
knows whether /sbin/init is Epoch, runit, systemd, or /bin/bash. It
takes my Epoch/CentOS box 4 seconds to go from Grub to the beginning of
Epoch (and another 4 to boot up to CLI).

I can't really test this, because I can't reproduce an indeterminate
situation (admittedly, it never occurred to me to have /home on a thumb
drive). If I ever *did* see something like this, my first test would be
to put a 10 second sleep early in the init. I'm thinking my kernel
should have settled down by then. And with my use case (and a heck of a
lot of other people's use cases), an 18 second boot is no less
convenient than an 8 second boot.
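
The flat 10 second sleep can also be made conditional: poll for whatever the boot is waiting on, continue as soon as it appears, and give up after the same 10 seconds. This is a minimal Python sketch of that idea, not anything an existing init ships. The name `wait_for_path` and the example path are assumptions for illustration.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.1):
    """Poll until path exists; return True if it appeared, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical use: hold off network setup until eth1 is visible in sysfs.
    if not wait_for_path("/sys/class/net/eth1"):
        print("eth1 did not appear within 10 seconds")
```

In the normal case this costs a fraction of a second instead of the full 10. It is still polling, not event handling, so it only covers things that are expected to show up at a known path.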

If the kernel took *really* long to get something stabilized, I'd start
asking myself why, and treat that as a root cause, not a fact of life
that must be engineered for. For instance, I've had cases where certain
daemons took minutes to start, and the root cause was bad reverse DNS,
and the fix was to fix my bad reverse DNS.

So in other words, OK, I'll stipulate the kernel is indeterminate, and
race conditions abound, even though I personally have seen no evidence.
But races take a finite time to run, and in most use cases I'm thinking
that race is over before the initramfs is done. And if anybody still
has a distro that boots straight up without an initramfs (just gotta
love /sbin being a symlink, right?), well, an early sleep will probably
do the trick.

Now of course, if I were designing a system to go onto a flight to
Mars, I wouldn't leave this stuff to chance, and would gravitate toward
an event-driven init. But in the end, it's a cost/benefit thing, with
consideration given to likelihood of something going wrong, and how
much havoc that wronggoing would create.
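
For contrast with the sleep approach, the event-driven side means listening to the kernel's uevent messages on the netlink socket Rick mentions. Below is a rough Python sketch, assuming the kernel message layout of a header followed by NUL-separated KEY=value fields, and assuming protocol number 15 (NETLINK_KOBJECT_UEVENT in the kernel headers) with multicast group 1. The names `parse_uevent` and `listen` are made up for illustration, and `listen` is Linux-only.

```python
import socket

NETLINK_KOBJECT_UEVENT = 15  # assumed value, taken from <linux/netlink.h>


def parse_uevent(data):
    """Turn one raw kernel uevent message into a dict of its KEY=value fields."""
    fields = {}
    # The first NUL-terminated chunk is a header like "add@/devices/...".
    for part in data.split(b"\0")[1:]:
        key, sep, value = part.partition(b"=")
        if sep:
            fields[key.decode(errors="replace")] = value.decode(errors="replace")
    return fields


def listen(handle):
    """Call handle(fields) for every kernel uevent. Runs until interrupted."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM,
                         NETLINK_KOBJECT_UEVENT)
    sock.bind((0, 1))  # pid 0: kernel picks an id; group 1: kernel uevents
    while True:
        handle(parse_uevent(sock.recv(65536)))
```

Calling `listen(print)` and plugging in a USB device should print one dict per event. Whether that is worth building an init around is the same cost/benefit question as above.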

SteveT

Steve Litt                *  http://www.troubleshooters.com/
Troubleshooting Training  *  Human Performance



