Friday, July 16, 2010

Debugging issues with strace in Linux.

strace runs the specified command until it exits.  It intercepts and records the system calls. -T option shows the time spent in system calls. It is particularly useful to troubleshoot slow response issues, because you can pinpoint the step taking the longest time. (-r option  Print  a  relative timestamp upon entry to each system call, which is actually  the time spent in last system call, It is easier to read than –T output because it is displayed in first column)

#The following Telnet command took 20 secs to response, Was it issue with DNS or web server?  (It is an obvious DNS issue, but I just want to demonstrate how can strace pinpoint the issue.)
$time strace -f -F -i -r -t -T -v -o /tmp/trace.log telnet  www.google.com 80
telnet: could not resolve www.google.com/80: Name or service not known
real    0m20.080s
user    0m0.010s
sys     0m0.050s
#List line number and time spent by sorting time

$awk '{ print "LINE#"NR, $1}'  /tmp/trace.log | sort -nk2 | tail -5
LINE#55 0.010000
LINE#140 5.000075
LINE#154 5.000075
LINE#136 5.000076
LINE#150 5.000076
 #Print out the lines in question. it is clear that DNS timed out on waiting response from DNS server 100.0.0.23, it tried four times(the remaining 3 timeout were not included here) each time took 5 secs.
$awk '{  if ( NR > 125  &&  NR <= 136 ) {print "LINE#"NR, $0 } }' /tmp/trace.log
LINE#126      0.000000 [b7e601d1] stat64("/etc/resolv.conf", {st_dev=makedev(117, 0), st_ino=50235, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=83, st_atime=2010/07/16-09:47:25, st_mtime=2010/07/16-09:45:02, st_ctime=2010/07/16-09:45:02}) = 0 <0.000000>
LINE#127      0.000000 [b7e2a0f1] gettimeofday({1279237645, 625155}, NULL) = 0 <0.000000>
LINE#128      0.000000 [b7e72402] socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4 <0.000000>
LINE#129      0.000000 [b7e71f0c] connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("100.0.2.3")}, 28) = 0 <0.000000>
LINE#130      0.000000 [b7e61e88] fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR) <0.000000>
LINE#131      0.000000 [b7e61e88] fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000000>
LINE#132      0.000000 [b7e2a0f1] gettimeofday({1279237645, 625155}, NULL) = 0 <0.000000>
LINE#133      0.000000 [b7e67296] poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}]) <0.000000>
LINE#134      0.000000 [b7e7220c] send(4, "B\262\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1"..., 32, MSG_NOSIGNAL) = 32 <0.000000>
LINE#135      0.000000 [b7e67296] poll([{fd=4, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.000076>
LINE#136      5.000076 [b7e2a0f1] gettimeofday({1279237650, 625231}, NULL) = 0 <0.000000>

No comments:

Post a Comment