[svlug] CPU load and real world value

Michael Eager eager at eagercon.com
Wed Mar 23 16:53:02 PST 2016


Interesting question.

Is this something which happens periodically or at the same time every day?
If so, I'd look for something which is started by cron.  I have rsnapshot
run every four hours and I sometimes notice slower response time when it
is backing up files.  Mdadm, the RAID disk manager, will periodically check
RAID disk systems, using a fair amount of CPU.  You can see whether a check
is in progress with "cat /proc/mdstat".
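
As a rough sketch of the /proc/mdstat check above, the Python below scans
that file for a progress line (check, resync, recovery, or reshape) and
reports the array and percentage.  The function names are made up for
illustration, and the regular expressions assume the usual Linux md
progress-line layout (for example "check = 12.6%"); adjust them if your
kernel prints something different.

```python
import re

# Assumed layout: an array header such as "md0 : active raid1 ...",
# followed by an indented progress line such as
# "[==>....]  check = 12.6% (...) finish=100.5min speed=12345K/sec".
ARRAY_RE = re.compile(r"^(md\S+)\s*:")
PROGRESS_RE = re.compile(r"\b(check|resync|recovery|reshape)\s*=\s*([\d.]+)%")


def active_md_operations(mdstat_text):
    """Return (array, operation, percent_done) for each array that is busy."""
    results = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        progress = PROGRESS_RE.search(line)
        if progress and current is not None:
            results.append((current, progress.group(1), float(progress.group(2))))
    return results


def read_mdstat(path="/proc/mdstat"):
    """Read the md status file; return an empty string if it is not there."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    for name, operation, percent in active_md_operations(read_mdstat()):
        print(f"{name}: {operation} {percent:.1f}% done")
```

Run from cron or by hand during a slow period, this prints nothing when no
array is being checked or rebuilt, which makes it easy to rule RAID
activity in or out.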

You don't mention whether this is a problem on a server or a desktop system,
or what the system is running.  Some desktop systems have programs which
monitor files for changes so that they can be searched more easily.  Some are
buggy and don't stop looking when they don't find any changed files.  Some
servers or database managers clean up databases periodically.  There are a
few programs which are known to use a lot of CPU, generally by looping, such
as the Firefox flash plugin.

Can you see different programs running when this happens, or one program
which is suddenly using an unexpected amount of CPU?

You might run ps every five seconds to list the top CPU user, then look
for a pattern.
http://stackoverflow.com/questions/15103662/command-for-finding-process-using-too-much-cpu
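
One possible way to do the ps sampling is sketched below in Python.  It is
only an illustration: the function names, the five-second interval, and
the sample count are arbitrary choices, and it assumes a ps that accepts
"-eo pid,pcpu,comm" (procps on Linux does).  Keep in mind that procps
computes %CPU as CPU time divided by how long the process has been
running, so a long-lived process that has only just started spinning may
rank lower here than it does in top.

```python
import subprocess
import time
from collections import Counter


def parse_ps(output):
    """Turn 'ps -eo pid,pcpu,comm' output into (pid, pcpu, command) tuples."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        pid, pcpu, comm = parts
        try:
            rows.append((int(pid), float(pcpu), comm.strip()))
        except ValueError:
            continue  # ignore anything that does not parse as numbers
    return rows


def top_cpu_user(rows):
    """Return the row with the highest %CPU, or None if there are no rows."""
    return max(rows, key=lambda row: row[1]) if rows else None


def sample(interval=5.0, count=12):
    """Print the top CPU user every `interval` seconds; tally the winners."""
    winners = Counter()
    for i in range(count):
        result = subprocess.run(
            ["ps", "-eo", "pid,pcpu,comm"],
            capture_output=True, text=True, check=True,
        )
        top = top_cpu_user(parse_ps(result.stdout))
        if top is not None:
            pid, pcpu, comm = top
            print(f"{time.strftime('%H:%M:%S')} {pid:>7} {pcpu:5.1f} {comm}")
            winners[comm] += 1
        if i + 1 < count:
            time.sleep(interval)
    return winners
```

Calling sample() with the defaults watches for about a minute; the
returned Counter shows which command was the top CPU user most often.
Leaving it running across one of the slow periods should show whether a
single program is responsible.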

On 03/23/2016 04:36 PM, Robert Freiberger wrote:
> Hello Michael,
>
> Thanks for the reply and I have used top plus the other variants in the past. The issue I should
> have explained clearly in the first e-mail is finding the root cause for high CPU load and if it's
> really a valid measurement of system performance. At my job, we have a majority of alerts that are
> focused on application metrics (latency and api calls), system level (free memory, CPU load, disk
> space), and finally synthetic checks (user simulation).
>
> We will semi-frequently find CPU load spikes on the host, but nothing else is alerting. Taking a
> look at the load trending, through sar, it's jumping from 1~4 to 20~30 within minutes, holds this
> for about an hour, then drops back to the previous levels. While I check the various metrics from
> our monitoring, I do not see anything that is obvious to the rise.
>
> My knowledge within the cpu calls and how the system works is very limited, wonder if there is a
> good reference I should look for as a starting guide?
>
> Thanks,
> Robert
>
> On Sun, Mar 20, 2016 at 1:05 PM Michael Eager <eager at eagercon.com> wrote:
>
>     On 03/18/2016 04:48 PM, Robert Freiberger wrote:
>      > Hello,
>      >
>      > I still consider myself pretty new to the world of UNIX/Linux, and find that when I investigate an
>      > issue with CPU load, it's very difficult to trace the issues. Unlike performance problems with
>      > network or NFS, where I can test latency with simple commands, load appears to be much harder to
>      > test in real time.
>
>     Are you familiar with "top" or the graphical variant "htop"?
>     Both will give a real-time display of current system load
>     and activity.  KDE and Gnome have graphical monitors for CPU
>     utilization and other activity, such as memory or network use.
>
>     Tecmint.com has an article about 20 different tools which can
>     be used to monitor Linux performance.
>
>      > Is there any recommendations how to really investigate this and how effective is CPU load to a
>      > systems health?
>
>     It's not clear what you are investigating.  Do you have a problem
>     or are you simply curious?
>
>     High or low CPU utilization is not a cause or solution to poor
>     performance.  If you are running at 80% CPU, that means that the
>     CPU is idle 20% of the time.  Any program which is ready to run
>     while the CPU is idle will be dispatched.  On most systems, reducing
>     CPU load to 60% will not make programs run faster, it will just increase
>     your idle time.  (This may not be true if you are running CPU-intensive
>     programs like transcoders or video editors.)
>
>     More interesting is load averages displayed by top/htop and
>     uptime.  This is the average number of runnable processes which
>     are waiting at any particular time.  High load averages result
>     in poor responsiveness.
>
>     --
>     Michael Eager eager at eagercon.com
>     1960 Park Blvd., Palo Alto, CA 94306  650-325-8077
>


-- 
Michael Eager	 eager at eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


