I am finally getting the hang of debugging kernel crashes. None too soon, as I got my first OOPS report from the -rc kernel with OMFS, from a gentleman who is intentionally corrupting his FS (“fuzzing” in the infosec lingo). After a frustrating weekend in which I had inadvertently fixed the bug but didn’t realize it because I was testing the wrong module, I can now claim success. One down, several more to go.
Detective work after the jump if you care for the nerdy stuff.
Oops report:
BUG: unable to handle kernel paging request at c978e004
IP: [<c032298e>] omfs_readdir+0x18e/0x32f
Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[...]
EIP: 0060:[<c032298e>] EFLAGS: 00010287 CPU: 0
EIP is at omfs_readdir+0x18e/0x32f
EAX: c978d000 EBX: 00000000 ECX: cbfcfaf8 EDX: cb2cf100
ESI: 00001000 EDI: 00000800 EBP: cb2d3f68 ESP: cb2d3f0c
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[...]
[<c018a820>] ? filldir64+0x0/0xcd
[<c018a9f2>] ? vfs_readdir+0x56/0x82
[<c018a820>] ? filldir64+0x0/0xcd
[<c018aa7c>] ? sys_getdents64+0x5e/0xa0
[<c01038bd>] ? sysenter_do_call+0x12/0x31
=======================
Code: 00 89 f0 89 f3 0f ac f8 14 81 e3 ff ff 0f 00 48 8d 14 c5 b8 01 00 00 89 45 cc 89 55 f0 e9 8c 01 00 00 8b 4d c8 8b 75 f0 8b 41 18 <8b> 54 30 04 8b 04 30 31 f6 89 5d dc 89 d1 8b 55 b8 0f c8 0f c9
The first step is to look at the faulting instruction. Running the “Code:” line through ~/linux/scripts/decodecode yields the disassembly:
8b 4d c8       mov    -0x38(%ebp),%ecx
8b 75 f0       mov    -0x10(%ebp),%esi
8b 41 18       mov    0x18(%ecx),%eax
8b 54 30 04    mov    0x4(%eax,%esi,1),%edx   <=== here
8b 04 30       mov    (%eax,%esi,1),%eax
31 f6          xor    %esi,%esi
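So the instruction reads from the address EAX + ESI*1 + 4 (in AT&T syntax, disp(base,index,scale) means base + index*scale + disp). From the register dump, EAX is c978d000, which looks like a pointer. ESI is 00001000, which is probably an index into an array. That gives c978d000 + 1000 + 4 = c978e004, exactly the address in the “unable to handle kernel paging request” line. 0x1000 also happens to be PAGE_SIZE, so the read lands on the page after the one EAX points into, which explains the page fault at the top of the oops.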
Next, let’s map the faulting instruction back to the C code. There are two ways:
$ gdb omfs.ko
(gdb) l *(omfs_readdir+0x18e)
Or (and I find this a little more obvious since it interleaves C and assembly; like the gdb method, it needs omfs.ko built with CONFIG_DEBUG_INFO):
$ objdump -S omfs.ko > foo.S
# now look for the faulting instruction's opcode bytes in foo.S: "8b 54 30 04"
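From the output of either command, it’s apparent that the +4 displacement in the instruction comes from be64_to_cpu() converting a 64-bit big-endian number to the CPU’s little-endian order. On 32-bit x86 the compiler loads the value as two 32-bit words, at offset+4 and offset+0, byte-swaps each one with bswap (the “0f c8” and “0f c9” at the end of the Code: line), and exchanges the halves. We do that when reading directory pointers in omfs_readdir, specifically: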
fsblock = be64_to_cpu(*((__be64 *) &bh->b_data[offset]));
EAX is bh->b_data, so ESI must be offset. I happen to know it should never be above 2048, but the register dump shows 4096. Since its range is ultimately controlled by the directory inode size, I immediately suspected that the size had been corrupted. For some reason I chased a bunch of other dead ends until I finally looked at the disk image and saw that the directory size was all wrong. Rule one of debugging: go with your gut.
Oh well. I guess all that assembly coding from years ago was useful after all.