Oops

I am finally getting the hang of debugging kernel crashes. None too soon as I got my first OOPS report from the -rc kernel with OMFS, from a gentleman who is intentionally corrupting his FS (“fuzzing” in the infosec lingo). After a frustrating weekend in which I had inadvertantly fixed the bug but didn’t realize it because I was testing the wrong module, I can now claim success. One down, several more to go.

Detective work after the jump if you care for the nerdy stuff.
Oops report:


BUG: unable to handle kernel paging request at c978e004
IP: [(c032298e)] omfs_readdir+0x18e/0x32f
Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[...]
EIP: 0060:[(c032298e)] EFLAGS: 00010287 CPU: 0
EIP is at omfs_readdir+0x18e/0x32f
EAX: c978d000 EBX: 00000000 ECX: cbfcfaf8 EDX: cb2cf100
ESI: 00001000 EDI: 00000800 EBP: cb2d3f68 ESP: cb2d3f0c
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[...]
[(c018a820)] ? filldir64+0x0/0xcd
[(c018a9f2)] ? vfs_readdir+0x56/0x82
[(c018a820)] ? filldir64+0x0/0xcd
[(c018aa7c)] ? sys_getdents64+0x5e/0xa0
[(c01038bd)] ? sysenter_do_call+0x12/0x31
=======================
Code: 00 89 f0 89 f3 0f ac f8 14 81 e3 ff ff 0f 00 48 8d
14 c5 b8 01 00 00 89 45 cc 89 55 f0 e9 8c 01 00 00 8b 4d c8 8b 75 f0 8b
41 18 (8b) 54 30 04 8b 04 30 31 f6 89 5d dc 89 d1 8b 55 b8 0f c8 0f c9

First step is to look at the faulting instruction. Running the “Code:” part through ~/linux/scripts/decodecode yields the disassembly:


8b 4d c8             	mov    -0x38(%ebp),%ecx
8b 75 f0             	mov    -0x10(%ebp),%esi
8b 41 18             	mov    0x18(%ecx),%eax
8b 54 30 04          	mov    0x4(%eax,%esi,1),%edx <=== here
8b 04 30             	mov    (%eax,%esi,1),%eax
31 f6                	xor    %esi,%esi

So the instruction is dereferencing the address [(eax+esi)*1+4]. From the register dump, EAX=c978d000. That looks like a pointer. ESI is 00001000, which is probably the index to an array. 0x1000 happens to be PAGE_SIZE which explains the page fault (kernel paging request) at the top of the oops.

Next, let’s look at the C code. There are two ways:


$ gdb omfs.ko
(gdb) l *(omfs_readdir+0x18e)

Or (and I find this a little more obvious since it has mixed C and assembly):


$ objdump -S omfs.ko > foo.S
# now look for instruction opcodes in foo.S: "8b 54 30 04"

From the output of the above commands, it’s apparent that the +4 index in the instruction comes from be64_to_cpu() converting a 64-bit big-endian number to little-endian. And we do that when reading directory pointers in omfs_readdir, specifically:


fsblock = be64_to_cpu(*((__be64 *) &bh->b_data[offset]));

EAX is bh->b_data so ESI must be offset. I happen to know it should never be above 2048, but it is 4096 in the register dump. Since the range is ultimately controlled by the directory inode size, I immediately suspected that that size got corrupted. For some reason I chased a bunch of other dead ends until I finally did look at the disk image and saw that the directory size was all wrong. Rule one of debugging: go with your gut.

Oh well. I guess all that assembly coding from years ago was useful after all.