A Linux server has high %system CPU usage, following are steps to find the root cause of the issue and how to resolve it.
Vmstat show %system CPU usage is high.# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 1 0 0 191420 8688 35780 0 0 0 0 1006 31 1 4 96 0 0 1 0 0 124468 9208 98020 0 0 15626 2074 1195 188 0 76 0 24 0 0 1 0 110716 9316 110996 0 0 3268 4144 1366 84 0 94 0 7 0 0 3 0 97048 9416 122272 0 0 2818 11855 1314 109 1 80 0 20 0 0 4 0 80476 9544 137888 0 0 3908 2786 1272 172 0 54 0 46 0 2 1 0 72860 9612 145848 0 0 1930 0 1193 141 0 42 0 58 0 0 1 0 74300 9620 145860 0 0 0 6 1208 67 0 38 0 62 0 0 0 0 75680 9620 145860 0 0 0 6929 1364 101 0 70 6 24 0
Let’s run mpstat to show more detailed CPU usage,it showed CPU was busy with interruptions.
# mpstat 2 Linux 2.6.18-92.el5 (centos-ks) 01/14/2010 02:03:50 AM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s 02:04:04 AM all 1.33 0.00 41.78 0.00 0.44 3.56 0.00 52.89 1015.56 02:04:06 AM all 0.00 0.00 8.04 38.69 29.65 23.62 0.00 0.00 1326.63 02:04:08 AM all 0.00 0.00 8.70 30.43 27.54 28.50 0.00 4.83 1327.54 02:04:10 AM all 0.00 0.00 5.47 46.77 27.36 20.40 0.00 0.00 1280.10 02:04:12 AM all 0.50 0.00 6.47 63.18 19.40 10.45 0.00 0.00 1183.08 02:04:14 AM all 1.01 0.00 6.53 62.31 21.11 9.05 0.00 0.00 1190.95 02:04:16 AM all 0.00 0.00 8.04 26.63 43.72 21.61 0.00 0.00 1365.83 02:04:18 AM all 0.00 0.00 1.50 0.00 0.00 0.50 0.00 98.00 1006.50Use sar to find out which interrupt number was culprit. #9 was the highest excluding system interrupt #0.
# sar -I XALL 2 10
02:07:10 AM INTR intr/s
02:07:12 AM 0 992.57
02:07:12 AM 1 0.00
02:07:12 AM 2 0.00
02:07:12 AM 3 0.00
02:07:12 AM 4 0.00
02:07:12 AM 5 0.00
02:07:12 AM 6 0.00
02:07:12 AM 7 0.00
02:07:12 AM 8 0.00
02:07:12 AM 9 350.50
[ Solaris equivalent command] Solaris# intrstat 2 device | cpu0 %tim cpu1 %tim -------------+------------------------------ bge#0 | 0 0.0 128 0.6 cpqary3#0 | 0 0.0 14 0.0# cat /proc/interrupts
CPU0
0: 702980 XT-PIC timer
1: 439 XT-PIC i8042
2: 0 XT-PIC cascade
6: 2 XT-PIC floppy
8: 1 XT-PIC rtc
9: 14464 XT-PIC acpi, eth2
11: 12 XT-PIC eth0
12: 400 XT-PIC i8042
14: 6091 XT-PIC ide0
15: 22 XT-PIC ide1
NMI: 0
LOC: 700623
ERR: 0
MIS: 0
[ OpenSolaris equivalent command ]Solaris#echo ::interrupts | mdb –k Native Solaris has to search the interrupt from output of prtconf -vSolution:
When the card transmits or receives a frame, the system must be notified of the event. If the card interrupts the system for each transmitted and received frame, the result is a high degree of processor overhead. To prevent that, Gigabit Ethernet provides a feature called Interrupt Coalescence. Effective use of this feature can reduce system overhead and improve performance.
Interrupt Coalescence essentially means that the card interrupts the system after sending or receiving batch of frames.
you can enable adaptive moderation ( Adaptive RX: off TX: off) to let system choose value automatically or set individual values manually.
A interrupt is generated by the card to the host when either frame counter or timer counter is met. Values 0 means disabled.
RX for example:
Timer counter in microseconds: rx-usecs/rx-usecs-irq
Frames counter:rx-frames/rx-frames-irq
# A sample output with default values. # ethtool -c eth1 Coalesce parameters for eth1: Adaptive RX: off TX: off stats-block-usecs: 999936 sample-interval: 0 pkt-rate-low: 0 pkt-rate-high: 0 rx-usecs: 18 rx-frames: 6 rx-usecs-irq: 18 rx-frames-irq: 6 tx-usecs: 80 tx-frames: 20 tx-usecs-irq: 80 tx-frames-irq: 20 rx-usecs-low: 0 rx-frame-low: 0 tx-usecs-low: 0 tx-frame-low: 0 rx-usecs-high: 0 rx-frame-high: 0 tx-usecs-high: 0 tx-frame-high: 0 <>[ Solaris equivalent command]
Varying on driver, Find out the driver's capability. Solaris#ndd -get /dev/e1000g0 \? | egrep ‘interrupt |intr’ The Value shoud be set in driver conf file: Solaris#/platform/`uname -m`/kernel/drv/*.confAlternative Workaround:
I couldn't config Interrupt Coalescence because virtual machine NIC didn't support it. but as workaround, Increasing mtu can also decrease interrupt, ifconfig eth2 mtu 9000 resolved the issue. It needs to set on both hosts peer, if they are not directly connected, make sure the switch supports jumbo frames. You don't need to care Interrupt Coalescence if your CPU resource is abundant, But for high load NFS/CIFS/ISCSI/ NAS servers, it is very useful.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.