[svlug] System Panic Makes My Life Easier

Ivan Sergio Borgonovo mail at webthatworks.it
Thu Jul 28 03:12:22 PDT 2016

On 07/26/2016 07:56 PM, Joseph Brenner wrote:
> I've been puzzling over a sick linux box for a little while lately.
> It's a dual-Opteron box (over ten years old now? wow...) that I've been
> upgrading off-and-on (bigger disks, a new video card...),
> but after a recent round of software upgrades it had been
> acting incredibly flaky, with uptimes of only a few days.  It would
> totally lock-up and require a hard reboot... couldn't even ssh into it.
> I was trying to get an idea of what software change could've caused
> this problem-- the list of possibilities was long-- but of late the
> problem has gotten far worse, and it throws system panics and won't
> boot at all, so it's almost certainly a hardware problem. The cpu fan
> has been in bad shape for some time... it's got some cooling now, but
> I can easily imagine its lifespan was shortened by overheating in the
> past.

If it can't even start grub, it is most probably a hardware problem.

If it can run grub but freezes before the boot has finished, and it also
hangs at random moments after boot, chances are still high that it is a
hardware problem, a kernel problem, or at least something that starts
very early and keeps running all the time.

Cleaning out the dust and just checking that everything is still in the
right place can work miracles. Every miracle should be followed by a
backup ex-voto ;)

If you suspect a software regression you have to look at the
intersection of recently updated packages, things that run most of the
time and things that can hang the whole system.
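
For the "recently updated packages" part, on a Debian box one place to
look is the dpkg log. What follows is only a rough sketch, assuming the
usual /var/log/dpkg.log line layout (date, time, action, package, old
version, new version). The function name and the cutoff date are made up
for illustration.

```python
from datetime import datetime


def upgraded_since(lines, cutoff):
    """Return {package: (old_version, new_version)} for upgrades at or after cutoff.

    If a package was upgraded more than once, the last entry wins.
    """
    found = {}
    for line in lines:
        parts = line.split()
        # Assumed layout: DATE TIME upgrade PACKAGE OLD_VERSION NEW_VERSION
        if len(parts) < 6 or parts[2] != "upgrade":
            continue
        try:
            when = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if when >= cutoff:
            found[parts[3]] = (parts[4], parts[5])
    return found


# Example use (the path is the Debian default, adjust as needed):
# with open("/var/log/dpkg.log") as f:
#     recent = upgraded_since(f, datetime(2016, 7, 1))
# for pkg, (old, new) in sorted(recent.items()):
#     print(pkg, old, "->", new)
```

The resulting list still has to be narrowed down by hand to the things
that run most of the time and could plausibly hang the whole system.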

Generally the most probable culprits are:
- things that capture your input (e.g. X; try to ssh into the box)
- things that go crazy and use all your CPU (top -b > log or any better
- things that use hardware

If you're in X, try to switch to a console.
If you're on Debian, kdump-tools could help diagnose the problem.
Read dmesg, syslog and the X logs.
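
Reading those logs by eye gets tedious, so a small filter can help. This
is a minimal sketch: the list of message fragments is my own selection of
things the kernel commonly prints when something is badly wrong, not a
complete or authoritative list, and the function name is made up.

```python
import re

# Illustrative selection of kernel message fragments; extend to taste.
# Matching is case-sensitive on purpose so "debug:" does not match "BUG:".
SUSPICIOUS = re.compile(
    r"Kernel panic|Oops|\bBUG:|soft lockup|Hardware Error"
    r"|[Mm]achine [Cc]heck|blocked for more than|Out of memory|I/O error"
)


def suspicious_lines(lines):
    """Return the log lines that contain one of the fragments above."""
    return [line.rstrip("\n") for line in lines if SUSPICIOUS.search(line)]


# Example use, feeding it a saved copy of the kernel log:
# with open("dmesg.txt") as f:
#     for hit in suspicious_lines(f):
#         print(hit)
```

An empty result proves nothing, of course: a box that locks up hard may
never get the chance to write the interesting line to disk.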

Since what you're experiencing happens at random moments over a pretty
long interval, picking the right thing to log, and how frequently, is
going to be challenging. A good start could be to log program load so
you can check what was running immediately before the crash and whether
it was taking too many resources or waiting for input.
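
One way to log program load could look like the sketch below. It assumes
a procps-style ps (for the -eo and --sort options). The file name, the
interval and all function names are arbitrary choices of mine. Each
record is forced to disk with fsync so that the last lines have a better
chance of surviving a hard lock-up.

```python
import os
import subprocess
import time


def top_processes(n=5):
    """Ask ps (procps syntax, an assumption) for the n biggest CPU users."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,pmem,stat,comm", "--sort=-pcpu"],
        capture_output=True, text=True, check=False,
    ).stdout
    # Drop the header line and squeeze the column padding.
    return [" ".join(row.split()) for row in out.splitlines()[1:n + 1]]


def format_snapshot(now, load, procs):
    """Build one log line: timestamp, load averages, busiest processes."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    loads = " ".join("%.2f" % x for x in load)
    return "%s load=%s | %s\n" % (stamp, loads, " ; ".join(procs))


def append_snapshot(path, line):
    """Append a line and push it to disk before returning."""
    with open(path, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())


def run(path, interval=10, count=None):
    """Log a snapshot every `interval` seconds; forever if count is None."""
    done = 0
    while count is None or done < count:
        line = format_snapshot(time.time(), os.getloadavg(), top_processes())
        append_snapshot(path, line)
        done += 1
        time.sleep(interval)


# Example use: run("/var/tmp/load.log", interval=10)
```

The STAT column from ps is included because a process stuck in state D
(uninterruptible sleep) just before the freeze would be a useful hint.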

>   btrfs
>   systemd

>   Debian amd64 binary packages (the same for intel and amd? Um...)

^^^ ???

> And we might throw in firefox, which I've got running pretty often
> and loves to torture its users with automatic upgrades.

Possibly automatic upgrades of extensions/plugins.

> In the old days, my first guess actually would've been that linux
> itself is rock solid, but I'm afraid linux has seemed increasingly
> flaky of late.  I've seen this hard lock-up symptom on a number of
> thinkpads, particularly when running a media player like vlc or totem.

I suspect that's because they now rely more and more on GPU drivers...
but no, not here anyway.

I'm surprised how rare issues with sid have become in the past years;
most of them could be fixed with just a downgrade and a couple of days'
wait for the fixed version. And I'm a compulsive updater.

Ivan Sergio Borgonovo
http://www.webthatworks.it http://www.borgonovo.net
