Wednesday, January 13, 2010

Troubleshooting a high system CPU usage issue on Linux/Solaris

A Linux server has high %system CPU usage, following are steps to find the root cause of the issue and how to resolve it.
Vmstat show %system CPU usage is high.
# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st

1  0      0 191420   8688  35780    0     0     0     0 1006   31  1  4 96  0  0
1  0      0 124468   9208  98020    0    0 15626  2074 1195  188  0 76  0 24  0
0  1      0 110716   9316 110996    0    0  3268  4144 1366   84  0 94  0  7  0
0  3      0  97048   9416 122272    0    0  2818 11855 1314  109  1 80  0 20  0
0  4      0  80476   9544 137888    0    0  3908  2786 1272  172  0 54  0 46  0
2  1      0  72860   9612 145848    0    0  1930     0 1193  141  0 42  0 58  0
0  1      0  74300   9620 145860    0    0     0     6 1208   67  0 38  0 62  0
0  0      0  75680   9620 145860    0    0     0  6929 1364  101  0 70  6 24  0

Let’s run mpstat to show more detailed CPU usage,it showed CPU was busy with interruptions.

# mpstat 2
Linux 2.6.18-92.el5 (centos-ks)         01/14/2010

02:03:50 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
02:04:04 AM  all    1.33    0.00   41.78    0.00    0.44    3.56    0.00   52.89   1015.56
02:04:06 AM  all    0.00    0.00    8.04   38.69   29.65   23.62    0.00    0.00   1326.63
02:04:08 AM  all    0.00    0.00    8.70   30.43   27.54   28.50    0.00    4.83   1327.54
02:04:10 AM  all    0.00    0.00    5.47   46.77   27.36   20.40    0.00    0.00   1280.10
02:04:12 AM  all    0.50    0.00    6.47   63.18   19.40   10.45    0.00    0.00   1183.08
02:04:14 AM  all    1.01    0.00    6.53   62.31   21.11    9.05    0.00    0.00   1190.95
02:04:16 AM  all    0.00    0.00    8.04   26.63   43.72   21.61    0.00    0.00   1365.83
02:04:18 AM  all    0.00    0.00    1.50    0.00    0.00    0.50    0.00   98.00   1006.50
Use sar to find out which interrupt number was culprit. #9 was the highest excluding system interrupt #0.
# sar -I XALL 2 10
02:07:10 AM      INTR    intr/s
02:07:12 AM         0    992.57
02:07:12 AM         1      0.00
02:07:12 AM         2      0.00
02:07:12 AM         3      0.00
02:07:12 AM         4      0.00
02:07:12 AM         5      0.00
02:07:12 AM         6      0.00
02:07:12 AM         7      0.00
02:07:12 AM         8      0.00
02:07:12 AM         9    350.50

[ Solaris equivalent command]
Solaris# intrstat 2 

device |      cpu0 %tim      cpu1 %tim 
-------------+------------------------------ 
bge#0 |         0  0.0       128  0.6 
cpqary3#0 |         0  0.0        14  0.0
# cat /proc/interrupts
CPU0
0:     702980          XT-PIC  timer
1:        439          XT-PIC  i8042
2:          0          XT-PIC  cascade
6:          2          XT-PIC  floppy
8:          1          XT-PIC  rtc
  9:      14464          XT-PIC  acpi, eth2
11:         12          XT-PIC  eth0
12:        400          XT-PIC  i8042
14:       6091          XT-PIC  ide0
15:         22          XT-PIC  ide1
NMI:          0
LOC:     700623
ERR:          0
MIS:          0
[ OpenSolaris equivalent command ]
Solaris#echo ::interrupts | mdb –k
Native Solaris has to search the interrupt from output of prtconf -v
Solution:
When the card transmits or receives a frame, the system must be notified of the event. If the card interrupts the system for each transmitted and received frame, the result is a high degree of processor overhead. To prevent that, Gigabit Ethernet provides a feature called Interrupt Coalescence. Effective use of this feature can reduce system overhead and improve performance.

Interrupt Coalescence essentially means that the card interrupts the system after sending or receiving batch of frames. 

you can enable adaptive moderation ( Adaptive RX: off  TX: off) to let system choose value automatically or set individual values manually.

A interrupt is generated by the card to the host when either frame counter or timer counter is met. Values 0 means disabled.

RX for example:
Timer counter in microseconds: rx-usecs/rx-usecs-irq
Frames counter:rx-frames/rx-frames-irq

# A sample output with default values.
# ethtool -c eth1
Coalesce parameters for eth1:
Adaptive RX: off  TX: off
stats-block-usecs: 999936
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 18
rx-frames: 6
rx-usecs-irq: 18
rx-frames-irq: 6

tx-usecs: 80
tx-frames: 20
tx-usecs-irq: 80
tx-frames-irq: 20

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
<>
[ Solaris equivalent command]
Varying on driver, Find out the driver's capability. 
Solaris#ndd -get /dev/e1000g0 \? | egrep ‘interrupt |intr’
The Value shoud be set in driver conf file:
Solaris#/platform/`uname -m`/kernel/drv/*.conf
Alternative Workaround:
I couldn't config Interrupt Coalescence because virtual machine NIC didn't support it. but as workaround, Increasing mtu can also decrease interrupt, ifconfig eth2 mtu 9000 resolved the issue. It needs to set on both hosts peer, if they are not directly connected, make sure the switch supports jumbo frames. 
You don't need to care Interrupt Coalescence if your CPU resource is abundant, But for high load NFS/CIFS/ISCSI/ NAS servers, it is very useful.

3 comments:

  1. What about the case when %sys is high and the others not(except %user which is also high in my case)?

    ReplyDelete
    Replies
    1. +1
      mpstat 2
      02:43:19 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
      02:43:21 PM all 4.24 0.00 50.12 0.50 0.00 0.00 0.00 45.14 1050.99
      02:43:23 PM all 4.24 0.00 49.63 0.00 0.00 0.00 0.00 46.13 1089.45
      02:43:25 PM all 4.51 0.00 49.62 0.50 0.00 0.00 0.00 45.36 1066.00

      Delete