How to decode a kernel oops

When the board stops responding, the cause is often a misbehaving userspace application. Sometimes, however, the kernel itself cannot continue executing, for example because of a faulty driver. When the kernel finds itself in this state, it prints information on the console port that can be used to track down the origin of the problem. This information is commonly referred to as “a kernel oops”.

A typical oops may look something like this:

Oops: bitten by watchdog
IRP: c00aa4f4 SRP: c00ab200 DCCR: 00000400 USP: 9ffffe84 MOF: 00000000
 r0: 00000013  r1: c0157284   r2: c0157284  r3: 00000013
 r4: c0170c5e  r5: c0398000   r6: 356204a8  r7: 00000000
 r8: c0157444  r9: 00001210  r10: 00000000 r11: 00000000
r12: c0170c5e r13: 00000013 oR10: 00000000
R_MMU_CAUSE: 355d102d
Process klogd (pid: 55, stackpage=c0c58000)

Stack from 9ffffe84:
       00081072 00081c58 0008181e 00000002 00080acc 35567388 9ffffed4 35632838 
       35582c3e 00000000 00000000 00000000 00000000 00000000 00080ecc 35566e14 
       00080ef8 000821c4 3555e9ac 9ffffed0 9fffff77 9fffff87 00000000 9fffff8a 
Call Trace: 
Stack from c0c59de0:
       c006f11c c0c59e18 c006d0a8 c006d204 00000000 00000000 c0170c5e 00000013 
       c0157284 00000000 c0c59e18 c006d2be 00000013 c006cf6a 00000000 00000000 
       00000013 c0170c5e 00000000 00000000 00001210 c0157444 00000000 356204a8 
Call Trace: [<c006f11c>] [<c006d0a8>] [<c006d204>] [<c006d2be>] [<c006cf6a>] [<c00ab200>] [<c00aa4f4>] 
       [<c00ab200>] [<c006f1ce>] [<c000821e>] [<c0008352>] [<c00085b8>] [<c000837a>] [<c0008514>] [<c00083e0>] 
       [<c00b200a>] [<c0005b54>] [<c0007fcc>] [<c00235c6>] [<c006cd66>] 
Code: c0 01 5f 9c ff 1f 5f 1d c0 01 a9 9a (5f) 1d c4 01 2d 9a 2f df ff 1f 00 00
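
The bracketed values on the Call Trace lines are the addresses that later get translated into function names. As a rough illustration only (this is not part of ksymoops), the following Python sketch pulls those addresses out of a report. The function name trace_addresses and the shortened OOPS excerpt are assumptions made for this example.

```python
import re

# Shortened excerpt of the Call Trace lines from the oops above.
OOPS = """\
Call Trace: [<c006f11c>] [<c006d0a8>] [<c006d204>] [<c006d2be>] [<c006cf6a>] [<c00ab200>] [<c00aa4f4>]
       [<c00ab200>] [<c006f1ce>] [<c000821e>]
"""

def trace_addresses(oops_text):
    """Return every [<address>] entry in the report as an integer, in order."""
    return [int(m, 16) for m in re.findall(r"\[<([0-9a-fA-F]+)>\]", oops_text)]

addresses = trace_addresses(OOPS)
# Print the addresses in the same 8-digit hex form the oops uses.
print([f"{a:08x}" for a in addresses])
```

The addresses come out in the order the kernel printed them, so the first entries belong to the code that produced the oops report itself.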

On its own the oops is not very useful; it has to be decoded first. This is done with the “ksymoops” program, which needs the oops itself as well as the System.map and vmlinux files. Note that you must use the System.map and vmlinux files belonging to the kernel running on the target (the system that crashed), not the ones belonging to your host. To decode the oops, run the following command in a shell on your Linux host:

ksymoops -K -L -O -m path/to/System.map -v path/to/vmlinux

This starts ksymoops, which prints the options in use and then waits for you to paste the oops report. The output will look something like this:

ksymoops 2.4.9 on i686 2.6.8-2-k7.  Options used
     -v os/linux/vmlinux (specified)
     -K (specified)
     -L (specified)
     -O (specified)
     -m os/linux/System.map (specified)

Reading Oops report from the terminal

Now paste the oops, and ksymoops will output something similar to this:

>>EIP; c00aa4f4 <rs_write+210/302>   <=====

>>IRP; c00aa4f4 <rs_write+210/302>
>>SRP; c00ab200 <rs_debug_write_function+48/7a>
>>IRP; c00aa4f4 <rs_write+210/302>
>>SRP; c00ab200 <rs_debug_write_function+48/7a>
>>r1; c0157284 <rs_table+0/a78>
>>r2; c0157284 <rs_table+0/a78>
>>r4; c0170c5e <log_buf+3f82/4000>
>>r5; c0398000 <_end+1821e0/dea1e0>
>>r8; c0157444 <rs_table+1c0/a78>
>>r12; c0170c5e <log_buf+3f82/4000>

Trace; c006f11c <raw_printk+0/74>
Trace; c006d0a8 <show_stack+0/88>
Trace; c006d204 <show_registers+d4/146>
Trace; c006d2be <watchdog_bite_hook+1a/1e>
Trace; c006cf6a <Watchdog_bite+1a/1c>
Trace; c00ab200 <rs_debug_write_function+48/7a>
Trace; c00aa4f4 <rs_write+210/302>
Trace; c00ab200 <rs_debug_write_function+48/7a>
Trace; c006f1ce <console_write+3e/64>
Trace; c000821e <__call_console_drivers+3c/48>
Trace; c0008352 <call_console_drivers+cc/f4>
Trace; c00085b8 <release_console_sem+38/90>
Trace; c000837a <emit_log_char+0/66>
Trace; c0008514 <printk+134/146>
Trace; c00083e0 <printk+0/146>
Trace; c00b200a <sock_sendmsg+70/88>
Trace; c0005b54 <schedule+5e/2b2>
Trace; c0007fcc <do_syslog+cc/2ac>
Trace; c00235c6 <sys_read+52/b4>
Trace; c006cd66 <system_call+50/58>

Code;  c00aa4e8 <rs_write+204/302>
00000000 <_EIP>:
Code;  c00aa4e8 <rs_write+204/302>
   0:   c0 01 5f                  rolb   $0x5f,(%ecx)
Code;  c00aa4eb <rs_write+207/302>
   3:   9c                        pushf  
Code;  c00aa4ec <rs_write+208/302>
   4:   ff 1f                     lcall  *(%edi)
Code;  c00aa4ee <rs_write+20a/302>
   6:   5f                        pop    %edi
Code;  c00aa4ef <rs_write+20b/302>
   7:   1d c0 01 a9 9a            sbb    $0x9aa901c0,%eax
Code;  c00aa4f4 <rs_write+210/302>   <=====
   c:   5f                        pop    %edi   <=====
Code;  c00aa4f5 <rs_write+211/302>
   d:   1d c4 01 2d 9a            sbb    $0x9a2d01c4,%eax
Code;  c00aa4fa <rs_write+216/302>
  12:   2f                        das    
Code;  c00aa4fb <rs_write+217/302>
  13:   df ff                     (bad)  
Code;  c00aa4fd <rs_write+219/302>
  15:   1f                        pop    %ds

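This report shows what the kernel was executing when the oops occurred. Each address is shown as a symbol name followed by the offset into that symbol and the symbol's size, both in hexadecimal. In this case the IRP register points into rs_write (part of the serial.c driver), so that function appears to be the root of the problem. Note that the Code section here appears to have been disassembled as x86 instructions by the i686 host, so its mnemonics do not reflect the target's real instructions; the symbol names and offsets are the useful part.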
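
To make the symbol+offset/size notation concrete, here is a minimal Python sketch of the kind of lookup a decoder performs against System.map. It is an illustration under stated assumptions, not the actual ksymoops implementation. The two function start addresses are derived from the offsets in the output above (for example c00aa4f4 minus 0x210 gives c00aa2e4); the after_* entries are hypothetical placeholders for whatever symbols follow in the real map; and the size is assumed to be the distance to the next symbol.

```python
import bisect

# Hypothetical System.map excerpt. The start addresses of rs_write and
# rs_debug_write_function are derived from the offsets in the decoded
# output; the after_* names are placeholders, not real kernel symbols.
SYSTEM_MAP = """\
c00aa2e4 T rs_write
c00aa5e6 T after_rs_write
c00ab1b8 T rs_debug_write_function
c00ab232 T after_rs_debug_write_function
"""

def load_symbols(text):
    """Parse 'address type name' lines into a list sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols

def resolve(symbols, address):
    """Return 'name+offset/size' for the symbol that contains address."""
    starts = [start for start, _ in symbols]
    # Index of the last symbol whose start address is <= address.
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    start, name = symbols[index]
    offset = address - start
    if index + 1 < len(symbols):
        # Assumption: size is the distance to the next symbol.
        size = symbols[index + 1][0] - start
        return f"{name}+{offset:x}/{size:x}"
    return f"{name}+{offset:x}"

symbols = load_symbols(SYSTEM_MAP)
for addr in (0xc00aa4f4, 0xc00ab200):
    print(f"{addr:08x} <{resolve(symbols, addr)}>")
```

With the excerpt above, the two addresses resolve to rs_write+210/302 and rs_debug_write_function+48/7a, matching the IRP and SRP lines in the decoded report.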

 
oops.txt · Last modified: 2011/11/24 08:31 by admin
 