After hunting around for an easy way to track the server's state, I came across a simple script from Craig Edmonds that did the job. It generates an email containing a variety of status information, most importantly the process list from Top, and sends it to you. By scheduling the script to run every minute you get a snapshot of the server's state at regular intervals. Since the subject line includes the load average, you can easily look through the messages, spot the times with a high load, and see what the server was doing.
In my case I adjusted the script slightly to include $todaydate in the subject line, since the issue with Exim described above meant I couldn't always rely on the messages arriving in the correct order.
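The idea of a subject line that carries both the load average and a timestamp can be sketched in shell. This is only an illustration and not the original PHP script: the function name build_subject and the subject wording are made up here, and reading /proc/loadavg assumes a Linux server.

```shell
# Hypothetical helper: format a subject line from a load average and a timestamp.
build_subject() {
    # $1 = load average, $2 = timestamp
    printf 'Server status: load %s at %s\n' "$1" "$2"
}

# The first field of /proc/loadavg is the 1-minute load average (Linux only).
load=$(cut -d ' ' -f 1 /proc/loadavg 2>/dev/null)

# A year-first timestamp sorts correctly even as plain text.
stamp=$(date '+%Y-%m-%d %H:%M:%S')

build_subject "${load:-unknown}" "$stamp"
```

With the timestamp in the subject, messages that arrive late or out of order can still be put back in sequence by sorting on the subject line.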
There was one problem I found with this solution. The script runs a single iteration of Top and inserts its output into the email. However, as you can see from the man page for Top:
The top command calculates Cpu(s) by looking at the change in CPU time values between samples. When you first run it, it has no previous sample to compare to, so these initial values are the percentages since boot. It means you need at least two loops or you have to ignore summary output from the first loop. This is problem for example for batch mode. There is a possible workaround if you define the CPULOOP=1 environment variable. The top command will be run one extra hidden loop for CPU data before standard output.
As a result, each email I received had identical CPU data, because a single iteration only ever reports the percentages since boot. While I could tell the server's load was high, I couldn't see what the state of the processor was at that time. I didn't fancy messing around with environment variables, so I instead opted for a solution I found online (see References), and adjusted the line calling Top as follows:
$process_list = shell_exec('top -b -n2 | awk "/^top/{i++}i==2"');
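To see what the awk filter in that line does, here is a small sketch that runs the same filter over canned text instead of live Top output. The two sample iterations and their values are invented for illustration; real output has many more lines.

```shell
# Two fake "top -b -n2" iterations; the numbers are made up.
sample='top - 10:00:00 up 1 day, load average: 0.10
Cpu(s):  1.0%us, 99.0%id
top - 10:00:03 up 1 day, load average: 2.50
Cpu(s): 85.0%us, 15.0%id'

# Each line starting with "top" marks a new iteration and bumps the counter i.
# The bare condition i==2 prints every line once the second iteration has begun.
second=$(printf '%s\n' "$sample" | awk '/^top/{i++} i==2')

printf '%s\n' "$second"
```

Only the second block is kept, which is the one whose Cpu(s) figures are calculated from the change between two samples rather than from the totals since boot.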
So I found a simple and easy way to track what's happening on the server, though of course with an email a minute it's not something I'll be running long term.
I wish I could say that was the end of it, but unfortunately this turned out to be the beginning of my struggle and confusion, due in no small part to the number of confused explanations of how load average works. I'll discuss that in my next post.
References :
http://www.unix.com/gentoo/77494-top-batch-mode-cpu-info-wrong.html