Oops

I am finally getting the hang of debugging kernel crashes. None too soon as I got my first OOPS report from the -rc kernel with OMFS, from a gentleman who is intentionally corrupting his FS (“fuzzing” in the infosec lingo). After a frustrating weekend in which I had inadvertently fixed the bug but didn’t realize it because I was testing the wrong module, I can now claim success. One down, several more to go.

Detective work after the jump if you care for the nerdy stuff.
Oops report:


BUG: unable to handle kernel paging request at c978e004
IP: [<c032298e>] omfs_readdir+0x18e/0x32f
Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[...]
EIP: 0060:[<c032298e>] EFLAGS: 00010287 CPU: 0
EIP is at omfs_readdir+0x18e/0x32f
EAX: c978d000 EBX: 00000000 ECX: cbfcfaf8 EDX: cb2cf100
ESI: 00001000 EDI: 00000800 EBP: cb2d3f68 ESP: cb2d3f0c
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[...]
[<c018a820>] ? filldir64+0x0/0xcd
[<c018a9f2>] ? vfs_readdir+0x56/0x82
[<c018a820>] ? filldir64+0x0/0xcd
[<c018aa7c>] ? sys_getdents64+0x5e/0xa0
[<c01038bd>] ? sysenter_do_call+0x12/0x31
=======================
Code: 00 89 f0 89 f3 0f ac f8 14 81 e3 ff ff 0f 00 48 8d
14 c5 b8 01 00 00 89 45 cc 89 55 f0 e9 8c 01 00 00 8b 4d c8 8b 75 f0 8b
41 18 <8b> 54 30 04 8b 04 30 31 f6 89 5d dc 89 d1 8b 55 b8 0f c8 0f c9

First step is to look at the faulting instruction. Running the “Code:” part through ~/linux/scripts/decodecode yields the disassembly:


8b 4d c8             	mov    -0x38(%ebp),%ecx
8b 75 f0             	mov    -0x10(%ebp),%esi
8b 41 18             	mov    0x18(%ecx),%eax
8b 54 30 04          	mov    0x4(%eax,%esi,1),%edx <=== here
8b 04 30             	mov    (%eax,%esi,1),%eax
31 f6                	xor    %esi,%esi

So the instruction is loading from the address eax + esi*1 + 4. From the register dump, EAX=c978d000, which looks like a pointer. ESI is 00001000, which is probably an index into an array. 0x1000 happens to be PAGE_SIZE, so the access lands a full page past the start of the buffer: c978d000 + 1000 + 4 = c978e004, exactly the address in the “kernel paging request” line at the top of the oops.
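The address arithmetic can be checked mechanically. This is a small userspace sketch of the x86 base + index*scale + displacement calculation; effective_address() is a name made up for illustration, not anything from the kernel or from decodecode.

```c
#include <stdint.h>

/*
 * Effective address of an x86 memory operand written in AT&T syntax as
 * disp(%base,%index,scale): base + index * scale + disp.
 * Hypothetical helper, for illustration only.
 */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}
```

Plugging in the dump values, effective_address(0xc978d000, 0x1000, 1, 4) comes out to 0xc978e004, which matches the faulting address reported by the oops.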

Next, let’s look at the C code. There are two ways:


$ gdb omfs.ko
(gdb) l *(omfs_readdir+0x18e)

Or (and I find this a little more obvious since it has mixed C and assembly):


$ objdump -S omfs.ko > foo.S
# now look for instruction opcodes in foo.S: "8b 54 30 04"

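From the output of the above commands, it’s apparent that the +4 displacement in the instruction comes from be64_to_cpu() converting a 64-bit big-endian number to little-endian: on 32-bit x86 the 64-bit read is split into two 32-bit loads, one at +4 and one at +0, each followed by a byte swap. And we do that when reading directory pointers in omfs_readdir, specifically: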


fsblock = be64_to_cpu(*((__be64 *) &bh->b_data[offset]));
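For anyone who has not met the byte-order helpers, here is a rough userspace sketch of what a big-endian 64-bit read amounts to. read_be64() is a hypothetical stand-in written for this example; it is not how the kernel implements be64_to_cpu(), which on x86 compiles down to loads plus bswap.

```c
#include <stdint.h>

/*
 * Assemble a 64-bit value from 8 bytes stored most-significant byte first.
 * Hypothetical userspace stand-in for a be64_to_cpu()-style read; the
 * caller must guarantee that p points at 8 readable bytes.
 */
static uint64_t read_be64(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}
```

Note that the helper trusts its caller completely about those 8 bytes, which is the same assumption the oops above ran into.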

EAX is bh->b_data so ESI must be offset. I happen to know it should never be above 2048, but it is 4096 in the register dump. Since the range is ultimately controlled by the directory inode size, I immediately suspected that that size got corrupted. For some reason I chased a bunch of other dead ends until I finally did look at the disk image and saw that the directory size was all wrong. Rule one of debugging: go with your gut.
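The general defense against this class of bug is to validate an on-disk offset before using it as an index. Below is a minimal sketch of such a check; be64_read_in_bounds() and its parameters are invented for illustration and this is not the actual fix that went into OMFS.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if an 8-byte read starting at `offset` stays inside a
 * buffer of `buf_size` bytes. Hypothetical helper, not OMFS code.
 * Written to avoid overflow in offset + 8.
 */
static bool be64_read_in_bounds(size_t offset, size_t buf_size)
{
    return buf_size >= sizeof(uint64_t) &&
           offset <= buf_size - sizeof(uint64_t);
}
```

With a 2048-byte limit, an offset of 2040 passes and the 4096 seen in the register dump is rejected.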

Oh well. I guess all that assembly coding from years ago was useful after all.

Merged


$ git-log --author="Bob Copeland" v2.6.26..master  | git-shortlog

Bob Copeland (10):
ath5k: Fix loop variable initializations
ath5k: convert LED code to use mac80211 triggers
omfs: add filesystem documentation
omfs: define filesystem structures
omfs: add inode routines
omfs: add directory routines
omfs: add file routines
omfs: add bitmap routines
omfs: update kbuild to include OMFS
omfs: add MAINTAINERS entry

Woot! I had an 11th patch, for ath5k, but the maintainer fixed it independently. Very nice to finally get omfs in and not have to maintain that sucker out of tree.

Pegged

I moved my main server from the old house to the apartment this weekend, which immediately presented the problem of too many wires and not enough legroom under the desk. Taking a cue from lifehacker.com, I’m jumping on the pegboard organizer bandwagon. The picture to the right is the underside of the desk, with a section of pegboard attached via hanger bolts and wing nuts. Right now, only a couple of power strips and a router are zip-tied to it, but I plan to add the Vonage router, the wireless router, various power bricks, and a few loops for cable runs. Pro tip: affix the pegboard with temporary screws while you add the hanger bolts, or else the bolts might be just crooked enough to drive you nuts, so to speak.

Train haxored

I was wondering why the $7 trade-in limit for SmarTrip farecards went into effect, along with numerous signs imploring us to only buy cards from the vending machines. I thought maybe it was a MetroChek scam, but no, it was the result of scissors-and-tape hackers.

New Banshee Plugin

Thanks to a few hours of hacking during the holiday weekend, I have a new version of the Karma plugin for Banshee (and a newer version of omfs too). I’m too lazy to write my own release notes so read someone else’s! Thanks, Ben.

swing rant

Never ever write a Java GUI application with the requirement that you check fields before letting the user tab off of them. What a nightmare! Your only two tools, without rewriting large parts of Swing yourself, are FocusListener and InputVerifier. FocusListener is great for the case when you have two such fields. Set up bad data in each field, then watch the focus traversal war as a focusLost() method reclaims the focus for one component, causing focusLost() in the other component to fire. Fun. Then you have InputVerifier, ostensibly designed for this very purpose. Ignoring the fact that buttons still fire without the verifier getting called, now you have the awesomeness of not knowing what the target component would be. Want to build a view with multiple fields that get validated as one? Good luck with that.

Recommend fail

Amazon:

We recommend: Pony 8510BP Cabinet Claw (2-Pack)

by Pony
http://www.amazon.com/dp/B0000224BN/ref=pe_ar_x1

List Price: $73.77
Price: $54.39
You Save: $19.38 (26%)

Recommended because you purchased or rated:
* Align-Rite DG-101 Drill Guide with 3/16-Inch Holes for 12-Inch Drawers and Doors


New business idea: a collaborative filtering engine that lets you, the user, customize the idiocy away. Actually, eTantrum’s music recommendation app (Linux version only) had sliders to control the weights of various things. Still an awesome idea.

Update: yeah, Amazon lets you customize it too. I still want sliders though!

resize wtf

I get amused whenever I see another open-coded version of this in our codebase, a method for determining the resolution of a scaled-down image while maintaining aspect ratio:


while (x > targetx || y > targety) {
x *= 0.99;
y *= 0.99;
}

It’s pretty easy to see that this will take a lot of iterations. Each pass shrinks the image by 1%, so the count is ln(x/x') / -ln(0.99), or about 230 * log10(x/x'), where x’ is the target width. At two FP multiplies per pass, that’s roughly 460 multiplies for every factor of ten between the source and target sizes.

I guess computing the smallest scale factor and using it once was just too hard…
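For the record, here is one way the single-step version could look. It is a sketch, not code from our codebase: fit_within() and struct size are made-up names, it assumes all dimensions are positive, and it uses integer cross-multiplication in place of a floating-point scale factor to sidestep rounding surprises.

```c
/* Width and height in pixels. Hypothetical type for this example. */
struct size {
    int w;
    int h;
};

/*
 * Scale (w, h) down to fit inside (max_w, max_h), keeping aspect ratio.
 * Images that already fit are returned unchanged. Assumes all arguments
 * are positive. The binding dimension is set exactly to its limit and
 * the other one is scaled by the same ratio, rounding down.
 */
static struct size fit_within(int w, int h, int max_w, int max_h)
{
    struct size out = { w, h };

    if (w <= max_w && h <= max_h)
        return out;

    /* max_w / w <= max_h / h, rewritten without division */
    if ((long long)max_w * h <= (long long)max_h * w) {
        out.w = max_w;
        out.h = (int)((long long)h * max_w / w);
    } else {
        out.h = max_h;
        out.w = (int)((long long)w * max_h / h);
    }
    return out;
}
```

One comparison and one multiply-divide, no loop. Unlike the 0.99 loop, which can overshoot by up to 1%, this returns the largest size that fits. A very extreme aspect ratio can round the smaller dimension down to 0, so a real version would probably clamp it to 1.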

Meh

I’m playing with date conversions today, and again I’m struck by how much the Java Calendar should be held up as an example of the over-engineered API. Has anyone ever used anything besides the Gregorian calendar? They were so proud of it when it hit 1.1.

I should have two patches hitting kernel 2.6.26, one entirely cosmetic and one that fixes a real bug on Atheros wireless cards. Akpm did pick up the OMFS patchset so hopefully that will go in during the .27 timeframe, though the jury is still out on whether it hits mainline.

In other news, take that, Skype!

XMLization

The libpam-mount configuration file has changed to a new XML format.

Aaaaaaghhh, no!!!!