Constantly folded

I was poking around with the disassembler the other day, which is never a good idea. One of the things I looked at was the following bit in the RX hot path for ath5k:

rxs->qual = rs.rs_rssi * 100 / 35;

Now, that’s not a very significant calculation, and I doubt it will show up in profiles, but divides automatically trigger my “can we do better?” reflex. I was interested to see how gcc compiled this, because we have division by a constant, but we first have to multiply by a variable due to order of operations.

We can generally replace a division by a constant with a multiplication by the constant’s reciprocal, scaled up to a fixed-point integer. Interestingly, gcc already does that. I couldn’t quite puzzle out all of the steps, but here’s the disassembly with my best guess:

; multiply eax by 100, store in ecx
 80483d0:	6b c8 64             	imul   $0x64,%eax,%ecx

; load (32 / 35) * 2**32 into edx
 80483d3:	ba eb a0 0e ea       	mov    $0xea0ea0eb,%edx

; multiply 100 * argc by 32/35 and store in edx:eax
 80483d8:	89 c8                	mov    %ecx,%eax
 80483da:	f7 ea                	imul   %edx

; take the top 32 bits of the product (edx). imul treats 0xea0ea0eb
; as negative (it is >= 2**31), so edx comes out short by exactly the
; numerator; adding the numerator back gives floor(ecx * 0xea0ea0eb / 2**32)
 80483dc:	8d 04 0a             	lea    (%edx,%ecx,1),%eax
 80483df:	89 c2                	mov    %eax,%edx

; divide by 32 to remove pre-multiply
 80483e1:	c1 fa 05             	sar    $0x5,%edx

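; C division truncates toward zero: the sign mask (numerator >> 31) is -1
; for a negative numerator, so subtracting it adds one in that case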
 80483e4:	89 c8                	mov    %ecx,%eax
 80483e6:	c1 f8 1f             	sar    $0x1f,%eax
 80483e9:	29 c2                	sub    %eax,%edx
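To check that reading of the steps, here is a small C sketch that replays the same sequence with 64-bit intermediates and compares it against the compiler’s own / 35. The names div35 and verify_div35 are made up for this example, and it assumes >> on a negative value is an arithmetic shift, which is what gcc and clang do (strictly, C leaves it implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

/* 0xea0ea0eb == ceil(2^37 / 35), i.e. roughly (32 / 35) * 2^32 */
#define MAGIC_35 0xea0ea0ebLL

/*
 * Hypothetical helper: replays the compiled sequence for n / 35.
 * Assumes >> on negative values is an arithmetic shift (gcc/clang).
 */
int32_t div35(int32_t n)
{
    /* imul sees the constant as signed: MAGIC_35 - 2^32, a negative number */
    int64_t m = MAGIC_35 - 0x100000000LL;

    /* imul %edx: keep the high 32 bits of the 64-bit product */
    int64_t hi = ((int64_t)n * m) >> 32;

    /* lea: add n back, then sar $0x5 for a total shift of 37 */
    int64_t q = (hi + n) >> 5;

    /* sar $0x1f / sub: add one for negative n to truncate toward zero */
    q -= n >> 31;

    return (int32_t)q;
}

/* Brute-force check against the compiler's own division over [lo, hi]. */
void verify_div35(int32_t lo, int32_t hi)
{
    for (int64_t n = lo; n <= hi; n++)
        assert(div35((int32_t)n) == (int32_t)n / 35);
}
```

For the ath5k line, div35(rs_rssi * 100) gives the same qual value as the original expression, so an RSSI of 35 maps to 100.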


So then I went hunting for the code in gcc that turns division by a constant into a multiply, and there are all kinds of tricks like this. Very neat. Along the way I also found a link to the book Hacker’s Delight, now wish-listed.
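The 0xea0ea0eb and the shift of 5 in the disassembly can be rederived with a simple search. This is a simplified sketch, not a copy of gcc’s code (gcc’s choose_multiplier handles more cases). For shifts s = 0, 1, 2, ..., take M = ceil(2^(32+s) / d) and stop at the first s where the rounding error e = M*d - 2^(32+s) is at most 2^(s+1). That bound is enough for floor(n * M / 2^(32+s)), plus one for negative n, to equal C’s truncating n / d for every 32-bit numerator. The function name magic_for_divisor is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest shift s and multiplier
 * M = ceil(2^(32+s) / d) such that, for every 32-bit signed n,
 *     n / d == floor(n * M / 2^(32+s)) + (n < 0 ? 1 : 0)
 * Sufficient condition used here: 0 < e <= 2^(s+1), where
 * e = M*d - 2^(32+s) is the rounding error of the ceiling.
 * Returns 0 for unsupported divisors: d < 3, d > INT32_MAX,
 * or a power of two (those have e == 0 and need a plain shift).
 */
uint32_t magic_for_divisor(uint32_t d, int *shift)
{
    if (d < 3 || d > INT32_MAX || (d & (d - 1)) == 0)
        return 0;

    for (int s = 0; s <= 30; s++) {
        uint64_t p = (uint64_t)1 << (32 + s);  /* 2^(32+s) */
        uint64_t m = (p + d - 1) / d;          /* ceil(p / d) */
        uint64_t e = m * d - p;                /* 0 < e < d */

        if (e <= ((uint64_t)1 << (s + 1))) {
            assert(m < ((uint64_t)1 << 32));   /* fits in 32 bits */
            *shift = s;
            return (uint32_t)m;
        }
    }
    return 0;  /* not reached for a supported d */
}
```

For d = 35 the search stops at s = 5 with M = 0xea0ea0eb, matching the disassembly. For d = 7 it gives 0x92492493 with s = 2, and for d = 3 it gives 0x55555556 with s = 0.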

In the original code, the denominator is a bit arbitrary; we could pick a different number that is more amenable to shifts and adds and save a multiply, but it’s hardly worth it.
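For illustration only, suppose the denominator were 32 instead of 35. That changes the scale (an RSSI of 32 rather than 35 would map to 100), but the division becomes a plain shift and the multiply by 100 can be split into shifts and adds. Here is a sketch using the hypothetical name qual_div32, assuming a non-negative RSSI so that >> 5 agrees with C’s truncating division:

```c
#include <assert.h>

/*
 * Hypothetical variant of the ath5k line with a denominator of 32:
 * qual = rssi * 100 / 32, using only shifts and adds.
 * Assumes rssi >= 0 so the right shift matches truncating division.
 */
int qual_div32(int rssi)
{
    assert(rssi >= 0);
    int scaled = (rssi << 6) + (rssi << 5) + (rssi << 2);  /* 64 + 32 + 4 = 100 */
    return scaled >> 5;                                    /* divide by 32 */
}
```

In practice the compiler already turns a divide by 32 into a shift on its own (plus a small fix-up for negative values when the type is signed), so writing rssi * 100 / 32 directly would get the same benefit.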

wl1251 performance

After fixing the remaining ifup bug (as expected, it was easy), I have some initial numbers on the new kernel driver versus the stock vendor driver on the G1:

driver   avg ping (ms)   netperf (Mbit/s)
tiwlan   65.231          7.53
wl1251   8.565           3.82

So, better on latency, worse on throughput. wl1251 is also quite a lot larger when taking all of cfg80211/mac80211 into account, though I didn’t spend any time trying to tweak the size in the build. Well, at least the code doesn’t make you want to poke your own eyes out.

Hang fixed

These sorts of bugs can ruin your weekend; just ask Ange, who had to listen to me mope yesterday. Of course, I spotted it right away when I looked at it with fresh eyes during this morning’s commute. Now, running “ifconfig wlan0 up; ifconfig wlan0 down; ifconfig wlan0 up” still fails with wl1251, but it doesn’t hang, and the rest looks tractable.

wl1251_sdio merged

The SDIO patches for the TI 1251 (an Android wifi chipset) are finally merged into wireless-testing, so they should be a lot easier to hack on now. That means the driver should make it into 2.6.32, though at a rather experimental stage. I did fix some crashes on ifup/ifdown since my last post, but there’s always more work to do. The current todo list includes better behavior for non-polling controllers (giving the IRQ a top half), tracking down a device hang on reinitialization, pushing the msm_wifi.ko module, and on and on.

But I need to spend spare cycles on ath5k in the near term. John Linville recently remarked that he was sick of seeing bug reports that say “it works fine in madwifi,” and frankly, I agree. There’s little excuse for having a substandard driver given that we have had two fully open HALs for almost a year. Of course, that can be laid at my feet as much as anyone’s, so my plan is to install madwifi side by side with ath5k and do a lot of performance testing to see where we stand. ANI (adaptive noise immunity) is the big missing feature; it will be useful to see how madwifi performs with and without it.

In other nerdy news, yesterday I scored a copy of Kernighan and Pike’s The Practice of Programming at the local used book store for $3. I’ve read the first five chapters so far. While I’ve been at this long enough to have already learned the book’s best practices (some the hard way), I really wish it were required reading at many of the places I’ve worked. You could do away with a lot of stupid coding standards documents by instead saying “read tpop, oh and please no studly caps.”