Oops

I am finally getting the hang of debugging kernel crashes. None too soon, as I got my first OOPS report from the -rc kernel with OMFS, from a gentleman who is intentionally corrupting his FS (“fuzzing” in the infosec lingo). After a frustrating weekend in which I had inadvertently fixed the bug but didn’t realize it because I was testing the wrong module, I can now claim success. One down, several more to go.

Detective work after the jump if you care for the nerdy stuff.
Oops report:


BUG: unable to handle kernel paging request at c978e004
IP: [(c032298e)] omfs_readdir+0x18e/0x32f
Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[...]
EIP: 0060:[(c032298e)] EFLAGS: 00010287 CPU: 0
EIP is at omfs_readdir+0x18e/0x32f
EAX: c978d000 EBX: 00000000 ECX: cbfcfaf8 EDX: cb2cf100
ESI: 00001000 EDI: 00000800 EBP: cb2d3f68 ESP: cb2d3f0c
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[...]
[(c018a820)] ? filldir64+0x0/0xcd
[(c018a9f2)] ? vfs_readdir+0x56/0x82
[(c018a820)] ? filldir64+0x0/0xcd
[(c018aa7c)] ? sys_getdents64+0x5e/0xa0
[(c01038bd)] ? sysenter_do_call+0x12/0x31
=======================
Code: 00 89 f0 89 f3 0f ac f8 14 81 e3 ff ff 0f 00 48 8d
14 c5 b8 01 00 00 89 45 cc 89 55 f0 e9 8c 01 00 00 8b 4d c8 8b 75 f0 8b
41 18 (8b) 54 30 04 8b 04 30 31 f6 89 5d dc 89 d1 8b 55 b8 0f c8 0f c9

First step is to look at the faulting instruction. Running the “Code:” part through ~/linux/scripts/decodecode yields the disassembly:


8b 4d c8             	mov    -0x38(%ebp),%ecx
8b 75 f0             	mov    -0x10(%ebp),%esi
8b 41 18             	mov    0x18(%ecx),%eax
8b 54 30 04          	mov    0x4(%eax,%esi,1),%edx <=== here
8b 04 30             	mov    (%eax,%esi,1),%eax
31 f6                	xor    %esi,%esi

So the instruction is dereferencing the address eax + esi*1 + 4. From the register dump, EAX=c978d000. That looks like a pointer. ESI is 00001000, which is probably the index to an array. Added up, c978d000 + 1000 + 4 = c978e004, which is exactly the faulting address at the top of the oops. 0x1000 happens to be PAGE_SIZE, so the read lands just past the page that EAX points to, which explains the page fault (kernel paging request).
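The address arithmetic is easy to double-check in a few lines of C. This is only an illustrative sketch: effective_address() is a made-up helper, not anything from the kernel, and it simply mirrors how an AT&T-syntax operand of the form disp(base,index,scale) is evaluated.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective address of an x86 memory operand written in AT&T syntax as
 * disp(base,index,scale), i.e. base + index*scale + disp.
 * Hypothetical helper for illustration only; not a kernel function.
 */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  uint32_t scale, uint32_t disp)
{
    /* x86 only encodes scale factors of 1, 2, 4 or 8 */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + disp;
}
```

Plugging in the register dump, effective_address(0xc978d000, 0x1000, 1, 0x4) comes out to 0xc978e004, the address in the "unable to handle kernel paging request" line.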

Next, let’s look at the C code. There are two ways:


$ gdb omfs.ko
(gdb) l *(omfs_readdir+0x18e)

Or (and I find this a little more obvious since it has mixed C and assembly):


$ objdump -S omfs.ko > foo.S
# now look for instruction opcodes in foo.S: "8b 54 30 04"

From the output of the above commands, it’s apparent that the +4 index in the instruction comes from be64_to_cpu() converting a 64-bit big-endian number to little-endian: on 32-bit x86 the 64-bit value is loaded as two 32-bit halves, one at offset+4 and one at offset, and then byte-swapped. And we do that when reading directory pointers in omfs_readdir, specifically:


fsblock = be64_to_cpu(*((__be64 *) &bh->b_data[offset]));

EAX is bh->b_data so ESI must be offset. I happen to know it should never be above 2048, but it is 4096 in the register dump. Since the range is ultimately controlled by the directory inode size, I immediately suspected that that size got corrupted. For some reason I chased a bunch of other dead ends until I finally did look at the disk image and saw that the directory size was all wrong. Rule one of debugging: go with your gut.
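For the curious, the general defense against this kind of corruption is to validate the offset before dereferencing it. What follows is a userspace sketch under stated assumptions, not the actual omfs fix: read_be64_checked() and its parameters are invented for illustration, and it assembles the big-endian value byte by byte instead of calling be64_to_cpu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read a 64-bit big-endian value from buf at the given offset, refusing
 * any offset that would run past the end of the buffer.
 * Hypothetical userspace stand-in for
 *     be64_to_cpu(*((__be64 *) &bh->b_data[offset]))
 * Returns 0 on success, -1 if the 8-byte read would not fit in the buffer.
 */
static int read_be64_checked(const unsigned char *buf, size_t buf_len,
                             size_t offset, uint64_t *out)
{
    assert(buf != NULL && out != NULL);

    /* Written to avoid overflow: never compute offset + 8 directly */
    if (buf_len < sizeof(uint64_t) || offset > buf_len - sizeof(uint64_t))
        return -1;

    uint64_t value = 0;
    for (size_t i = 0; i < sizeof(uint64_t); i++)
        value = (value << 8) | buf[offset + i];  /* most significant byte first */

    *out = value;
    return 0;
}
```

With a buffer length of 2048, an offset of 4096 like the one sitting in ESI is rejected up front instead of being read from whatever lies past the buffer.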

Oh well. I guess all that assembly coding from years ago was useful after all.

Employment

I found your resume on Dice and believe based on your background you would be a good fit for the Jr. Technical Project Manager/XML Position on a government project with one of our Fortune 100 partners in Crystal City, VA. This position is critical and we are seeking to fill this immediately.

Heh, I’m pretty sure this is my job, or at least the part of it that I have been hoping to pawn off on some Jr. Technical PM/XML type.

My lackadaisical job hunt continues, with the following results so far:

  • Motorola was promising, but ultimately it was too tech supporty and not programmy enough
  • No reply from the tech shop on the same street as my apartment
  • There are a boatload of J2EE/SQL jobs. Too bad that is boring as hell.
  • Too many defense/gov’t jobs. Ick. No, I do not want to be sponsored to work for NSA.
  • There aren’t that many “work from home hacking on the kernel” jobs.
  • Stop calling me, recruiters. Please let me ignore you via email.


I’m still being super-picky, obviously. Unless you have a “work from home hacking on the kernel” opening…

Book

Having seen it mentioned on Dave Jones’ blog, I picked up The Soul of a New Machine and read it in 3 days. I think it pairs nicely with The Mythical Man-Month as a cautionary tale for would-be computer nerds. Other than the amusing and all-too-true metaphor of ‘mushroom management,’ the interesting parts for me were the computer architecture digressions, with mostly reasonable layman’s explanations of i-cache, addresses, and the like. And the extensive discussion of Adventure: I had to break out TADS and try Colossal Cave myself. Doom was still better.

Merged


$ git-log --author="Bob Copeland" v2.6.26..master  | git-shortlog

Bob Copeland (10):
ath5k: Fix loop variable initializations
ath5k: convert LED code to use mac80211 triggers
omfs: add filesystem documentation
omfs: define filesystem structures
omfs: add inode routines
omfs: add directory routines
omfs: add file routines
omfs: add bitmap routines
omfs: update kbuild to include OMFS
omfs: add MAINTAINERS entry

Woot! I had an 11th patch, for ath5k, but the maintainer fixed it independently. Very nice to finally get omfs in and not have to maintain that sucker out of tree.

Pegged

I moved my main server from the old house to the apartment this weekend, which immediately presented the problem of too many wires and not enough legroom under the desk. Taking a cue from lifehacker.com, I’m jumping on the pegboard organizer bandwagon. The picture to the right is the underside of the desk, with a section of pegboard attached via hanger bolts and wing nuts. Right now, only a couple of power strips and a router are zip-tied to it, but I plan to add the Vonage router, the wireless router, various power bricks, and a few loops for cable runs. Pro tip: affix the pegboard with temporary screws while you add the hanger bolts, or else the bolts might be just crooked enough to drive you nuts, so to speak.

Train haxored

I was wondering why the $7 trade-in limit for SmartTrip farecards went into effect, along with numerous signs imploring us to only buy cards from the vending machines. I thought maybe it was a MetroChek scam, but no, it was the result of scissors-and-tape hackers.

Residences updated

My wife is officially a permanent US resident, yay!! Also, I am officially a Marylander. As I was sitting in the DMV the other day getting plates, they had this moronic LED ticker thing going. With earth-shattering up-to-the-minute news such as: “Did you know? The moon is the Earth’s only natural satellite. There is no life on the moon. The moon is the brightest object in the night sky — however it does not give off its own light. The moon reflects the sun’s light.”

Meanwhile, I’d like to take a second to thank the internets for finally updating neomail, and for fixing ‘In-Reply-To’, adding S/MIME, and being able to handle user-defined folders. Now I can realize my dream of replying to mailing list stuff (filtered by procmail into separate mboxes) from work without breaking the threading like a noob.

New Banshee Plugin

Thanks to a few hours of hacking during the holiday weekend, I have a new version of the Karma plugin for Banshee (and a newer version of omfs too). I’m too lazy to write my own release notes so read someone else’s! Thanks, Ben.