Numerology

I’m technically on break from school now, although I’m still putting the final polish on one last project. But things have slowed down enough that I can bore you with technical crap again.

There’s an interesting section of TPOP wherein the authors discuss studying the “numerology” of a bug; that is, look for patterns in the data instead of just looking at the code. I’ve had a lot of practice with this idea lately when trying to decipher kernel crashes with nothing but an Oops message to go on (these being much easier than infrequent lockups with no output at all!). It’s quite useful for userspace as well, if you are using a language where memory corruption is a fact of life. To show why, here’s a lengthy anecdote about how I recently used the power of man ascii to find and fix a memory corruption bug in about five minutes while racing to meet a project deadline.

First, some background: in the kernel, linked lists are implemented by embedding the next and previous pointers directly into the linked object, by way of the struct list_node. For example:

struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* one item in a list */
struct item {
    int id;
    struct list_node list;
};

/* some other object that has a list of items */
struct bar {
    struct list_node item_list_head;
};

This is a curious construction, but once you get used to it, it’s quite handy — you can embed an object in multiple simultaneous lists without having to worry about the lifetime of void pointers, and without having to malloc container structures all the time. The list code is object-agnostic, operating only on the list_node structures, but of course you need a way to get from a struct list_node back to the object that contains it. For that, there is the container_of macro and friends. These are just C macros that, given the address of the struct list_node, the variable name of the structure, and the type of the item in which the struct is embedded, finds the start of the containing structure with a little pointer arithmetic.

I had implemented a similar set of list routines for a userspace program and started getting weird crashes, so I fired up gdb to see something like the following:

gdb > p *item
$1 = {id = 134269966, item_list = {next = 0x804b008, prev = 0x804b008},
      global_list = {next = 0x72657375, prev = 0x40313030},
      src = "machine.org", '00' <repeats 69 times>}

Had I taken a minute to look at this in detail, I would have noticed that the value of id was quite higher than it should be, and that the leading part of src was truncated. But what jumped out at me immediately was that global_list.next and global_list.prev were obviously wrong pointer values. In Linux on x86, heap addresses are usually above 0x0804b000, and stack addresses are below 0xbfffc000. Clearly, the global_list pointers don’t meet these ranges. However, the item_list.next and item_list.prev addresses looked fine. So either the global list pointers weren’t being initialized, or they were being corrupted somehow. Sometimes that kind of corruption happens if the list pointers aren’t updated properly, causing the code to follow a next pointer into random memory, but then it would be unlikely that the item_list pointers and src had meaningful values. Plus, I unit-tested the list routines and was pretty confident they were working correctly.

My initial theory was that something was overwriting the global_list struct, and the byte values looked to be in the ascii range. So I turned to the ascii(7) man page, which gives you a nice little table of hex ordinal to character value mappings. Looking at the bytes in turn gives: 0x72 = ‘r’, 0x65 = ‘e’, 0x73 = ‘s’, 0x75 = ‘u’, 0x40 = ‘@’, 0x31 = ‘1’, 0x30 = ‘0’, 0x30 = ‘0’. Since x86 is little-endian, this works out to the string “user001@”, which should have been the first part of src. The bug must be where the code filled in the src string. However, that code looked fine. I then decided that the pointer to the entire structure must be offset somehow. Sure enough, when I looked at the surrounding code, I saw something like this:

    struct list_node *n;
    struct item *item;

    for (n = list_start(&item_list_head); n;
         n = list_next(&item_list_head, n))
    {
        item = container_of(n, struct item, global_list);
        /* strncpy and other stuff ... */
    }

Oops. I had passed the wrong list name to container_of(), with the effect that the item pointer was actually 8 bytes prior to the real start of the structure. As a result, the string copy overwrote the pointer values of just one of the list structures, a bug which didn’t actually have a negative effect until later.

There are morals a-plenty, including “macros are evil” and “this is why templates were invented.” Yet, I’ve seen Linus use this technique more than once when others were stuck on a kernel memory corruption bug, so I’ll go with “Kernighan and Pike are always right[1].”

[1] Except, I’m still not sold on using capital letters in Go. Perhaps in time I can be convinced.

job

I love getting stuff like this in my inbox:

Must have skills:
Expert in C or C++ programming
(Windows || Mac || Linux) system internals.
(Windows || Mac || Linux) kernel and device driver development
Experienced in assembly and debugging

Desired Skills: Experience writing/integrating MS SQL Stored
Procedures into C/C++ software. C (especially object oriented
like abstraction with pointers).

lolz.

Ath5k AP Mode

People seem to keep asking about this, despite there being a quality page on kernel.org on how to run an AP with any mac80211 driver. So for what it’s worth, here’s how my setup works. Note if you are seeing flakiness with certain clients, e.g. it works fine with a computer but not with your cell phone, it is likely there is some bug with the power saving handling. I’m currently working on a few such issues, so it may be fixed soon enough.

To get started, you need the following:

  • some sort of network uplink, like a wired ethernet port
  • a kernel with ap mode support for ath5k (2.6.31+) and netfilter (for NAT)
  • dnsmasq
  • hostapd

You have two basic options for interfacing the wireless network with your wired network. One is by using a bridge directly to the wired network. Another is to use NAT so the wireless network is on its own subnet. The former is more typical of an embedded device, but I prefer the latter on my home LAN, so that’s what I’ll describe.

Turn off any wireless daemon (such as NetworkManager) while you experiment with hostapd to ensure that nothing else has the device open.

The hostapd.conf is large but self-explanatory. One can get away with a rather small config file if using the defaults. This example sets up an AP on wlan0 with the SSID “hostapd_ath5k”. It has a wlan-facing IP address 192.168.10.1, it’s on channel 11, supports 802.11g, and has the WPA pre-shared key “my_password”. I have also set up EAP, it works but requires making a lot of certs and such, so Google is your friend here.

hostapd.conf:

interface=wlan0
driver=nl80211
ssid=hostapd_ath5k
hw_mode=g
channel=11
auth_algs=3
own_ip_addr=192.168.10.1
wpa=1
wpa_passphrase=my_password
wpa_key_mgmt=WPA-PSK
wpa_pairwise=CCMP

For DHCP and DNS forwarding, I use dnsmasq. This couldn’t be easier. Just put something like the following in /etc/dnsmasq.conf:

   dhcp-range=192.168.10.50,192.168.10.150,12h

This will hand out addresses in the 192.168.10.X subnet. Then I use a small script to enable IP masquerading before launching hostapd (note, it will flush all iptables rules, which may not be what you want, so use with caution).

run.sh:

#!/bin/sh

DEV=wlan0
GW_DEV=eth0

# set IPs
ifconfig $DEV down
ifconfig $DEV up
ifconfig $DEV 192.168.10.1

# setup ip masq
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -F
iptables -t nat -F
iptables -A FORWARD -i $DEV -j ACCEPT
iptables -A FORWARD -o $DEV -j ACCEPT
iptables -t nat -A POSTROUTING -o $GW_DEV -j MASQUERADE

./hostapd hostapd.conf

Running the script will launch hostapd, and if all goes well, you’ll see it show up in scans from other computers.

If anything goes wrong, make sure:

  • you’ve started dnsmasq
  • you have a valid backhaul connection
  • you aren’t using power save mode on client (iwconfig wlan0 power off — see previous comment about PS bugs)
  • you haven’t found a bug

me |= grad_school

I let the month of September elapse with nary a post. So in case you’re wondering, I’m still around and I just haven’t had time to return your emails yet. Instead, I’m busy getting destroyed by grad school, where I’ll be for the next seven {months/years} of my life.

PS Happy birthday, Karen!

Constantly folded

I was poking around with the disassembler the other day, which is never a good idea. One of the things I looked at was the following bit in the RX hotpath for ath5k:

rxs->qual = rs.rs_rssi * 100 / 35;

Now, that’s not a very significant calculation, and I doubt it will show up in profiles, but divides automatically trigger my “can we do better?” reflex. I was interested to see how gcc compiled this, because we have division by a constant, but we first have to multiply by a variable due to order of operations.

We can generally remove a division by multiplying by its reciprocal. Interestingly, gcc already does that. I couldn’t quite puzzle out all of the steps, but here’s the disassembly with my best guess:

; multiply eax by 100, store in ecx
 80483d0:	6b c8 64             	imul   $0x64,%eax,%ecx

; load (32 / 35) * 2**32 into edx
 80483d3:	ba eb a0 0e ea       	mov    $0xea0ea0eb,%edx

; multiply 100 * argc by 32/35 and store in edx:eax
 80483d8:	89 c8                	mov    %ecx,%eax
 80483da:	f7 ea                	imul   %edx

; take top 32 bits of result in edx (when sign extended, the
; complement of the final answer?) and add it back to the numerator
 80483dc:	8d 04 0a             	lea    (%edx,%ecx,1),%eax
 80483df:	89 c2                	mov    %eax,%edx

; divide by 32 to remove pre-multiply
 80483e1:	c1 fa 05             	sar    $0x5,%edx

; subtract one if we need to round
 80483e4:	89 c8                	mov    %ecx,%eax
 80483e6:	c1 f8 1f             	sar    $0x1f,%eax
 80483e9:	29 c2                	sub    %eax,%edx


So then I went hunting for the constant folding code in gcc, and there are all kinds of tricks like this. Very neat. Along the way I also found a link to the book Hacker’s Delight, now wish-listed.

In the original code, the denominator is a bit arbitrary, we could pick a different number that is more amenable to shifts and adds and save a multiply, but it’s hardly worth it.

wl1251 performance

After fixing the remaining ifup bug (as expected, it was easy), I have some initial numbers on the new kernel driver versus the stock vendor driver on the G1:

driver avg ping ms netperf mbit/s
tiwlan 65.231 7.53
wl1251 8.565 3.82

So, better on latency, worse on throughput. wl1251 is also quite a lot larger when taking all of cfg80211/mac80211 into account, though I didn’t spend any time trying to tweak the size in the build. Well, at least the code doesn’t make you want to poke your own eyes out.

Hang fixed

These sorts of bugs can ruin your weekend, just ask Ange who had to listen to me mope yesterday. Of course, I spotted it right away when looking at it freshly during this morning’s commute. Now, ifconfig wlan0 up; ifconfig wlan0 down; ifconfig wlan0 up still fails with wl1251, but it doesn’t hang and the rest looks tractable.

wl1251_sdio merged

The SDIO patches for TI 1251 (Android wifi chipset) are finally merged into wireless-testing, so they should be a lot easier to hack on now. That means the driver should make it into 2.6.32, though at a rather experimental stage. I did fix some crashes on ifup/ifdown since last posting, but there’s always more work to do. Current todo list includes better behavior for non-polling controllers (make the irq have a top-half), tracking down a device hang on reinitialization, pushing the msm_wifi.ko module, and on and on.

But I need to spend spare cycles on ath5k in the near term. John Linville recently remarked that he was sick of seeing bug reports that say “it works fine in madwifi,” and frankly, I agree. There’s little excuse for having a sub-standard driver given that we have had two fully open HALs for almost a year. Of course, that can be laid at my feet as much as anyone’s, so my plan is to install madwifi side-by-side with ath5k and do a lot of performance testing to see where we stand. ANI is the big missing feature; it will be useful to see how madwifi performs with and without it.

In other nerdy news, yesterday I scored a copy of Kernighan and Pike’s The Practice of Programming at the local used book store for $3. I’ve read the first five chapters so far. While I’ve been at this long enough to have already learned the book’s best practices (some the hard way), I really wish it was required reading at many of the places I’ve worked. You could do away with a lot of stupid coding standards documents by instead saying “read tpop, oh and please no studly caps.”

Sausage fest

5000 Calories of AwesomeWhenever I have some occasion at which I am to provide a side item or hors d’Å“uvres, I consider making the quintessential party treat: sausage balls. I think these are a Southern US mainstay, as no one around here seems to have encountered them before, whereas in my youth they made an all-too-brief appearance at many a gathering. Take it from me and my waistline, these things are full of win.

The classic recipe goes like this:

2 c Bisquick brand baking mix
10 oz medium chedder cheese, grated*
1 lb breakfast sausage**

* Sharp tends to be too dry. I also freshly grate it with the food processor since block cheese usually has more moisture than packaged grated cheese.
** Jimmy Dean “Hot”. Spicier is better.

Preheat oven to 350°. Mix everything together with your hands until it forms one big lump. Roll into 1.5-2″ diameter balls. Place on a cookie sheet and bake for 20 minutes or until golden brown. Cool before serving.

It doesn’t get much easier than that. However, I’m not crazy about relying on Bisquick; if we ever buy it, it stays in our fridge forever. I make biscuits frequently, but I find Bisquick biscuits to have a very chemical, baking soda taste that reminds me of pancakes. Maybe because I’m too lazy to make pancakes from scratch, so if I ever do, I use Bisquick — guess how often we have pancakes.

Anyway, I always have the components for home-made biscuits on hand, so I thought I would make an extra batch without Bisquick and see how it went. I took my normal biscuit recipe (stolen from Alton Brown), replaced the butter with shortening, as I thought the butter would make it too greasy, and supplanted the Bisquick:

2 c flour
4 tsp baking powder
3/4 tsp salt
1/4 tsp baking soda
4 tbsp buttermilk mix (my one cheat with biscuits, but it works great)
4 tbsp vegetable shortening
1.5 tbsp water
[rest same as above]

As indicated, I added a little water, because I found the Bisquick mixture to be a bit more moist than the flour mixture (I guess it has a little more fat in it). If the mixture is too dry, the ingredients will crumble rather than form a cohesive whole.

The result? The cheese and sausage really overwhelm the flavor of the dough, so there’s not much difference in taste. In texture, the Bisquick balls are chewier, while the from-scratch version is a bit lighter. It’s rather subtle, so while I somewhat prefer the latter, it’s not generally worth the extra effort. But since it saves me from buying biscuit mix, it’s a winner in my book.

Next time, I’ll try using chorizo.

Now bionic

Kalle has posted a rebased version of my SDIO patches on top of the latest wireless-testing. I’ve done a little bit of testing with it — the driver starts up and loads fine, but once the interface goes down, it’s busted:


<6>[ 980.884302] wl1251_sdio mmc0:0001:1: firmware: requesting wl1251-fw.bin
<6>[ 981.036448] wl1251_sdio mmc0:0001:1: firmware: requesting wl1251-nvs.bin
<3>[ 981.045449] init: untracked pid 354 exited
<3>[ 981.074408] init: untracked pid 355 exited
<7>[ 981.331853] wl1251: 151 tx blocks at 0x3b788, 35 rx blocks at 0x3a780
<7>[ 981.339300] wl1251: firmware booted (Rev 4.0.4.3.5)
<7>[ 984.702740] wlan0: direct probe to AP 00:1a:70:da:a9:cd (try 1)
<7>[ 984.901621] wlan0: direct probe to AP 00:1a:70:da:a9:cd (try 2)
<7>[ 985.101583] wlan0: direct probe to AP 00:1a:70:da:a9:cd (try 3)
<7>[ 985.301617] wlan0: direct probe to AP 00:1a:70:da:a9:cd timed out
<7>[ 1024.802837] wl1251: down
<3>[ 1044.856640] init: untracked pid 358 exited
<3>[ 1050.033886] mmc0: Data timeout
<3>[ 1050.037701] wl1251: ERROR sdio write failed (-110)
<3>[ 1051.045674] mmc0: Data timeout
<3>[ 1051.049519] wl1251: ERROR sdio write failed (-110)
<3>[ 1052.057673] mmc0: Data timeout
<3>[ 1052.061518] wl1251: ERROR sdio write failed (-110)

Exploring this further turned into a rather long yak shaving session because I wanted to get iw on the phone to do more manual tests, and I wanted to link it dynamically against bionic, the Android’s fork of a BSD libc. Here were a few interesting discoveries found along the way:

$ ./iw
iw: not found

Right away, I intuited this was a problem finding libnl.so, but I was setting LD_LIBRARY_PATH. So I looked at the bionic linker source to find:


/* TODO: Need to add support for initializing the so search path with
* LD_LIBRARY_PATH env variable for non-setuid programs. */

So libnl.so had to go into the hard-coded library search path of /system/lib. That alone didn’t help, however. I eventually remembered that dynamically linked executables include the name of the dynamic linker in ELF headers, and on Android this is called “/system/bin/linker” instead of “/usr/lib/ld.so.1.” After fixing that (-Wl,-dynamic-linker,/system/bin/linker), the program just SEGVed at startup, so I added Android linker scripts and other random crap to the linker command line (much of it furnished by the very cool agcc wrapper script).


# ./iw
Usage: ./iw [options] command
Options:
--debug enable netlink debugging
--version show version (0.9.14-2-g5286851)
Commands:
help

Almost there, but no available commands? A glance at the iw source revealed that iw sticks all that stuff into an ELF section which mostly disappears when you link with –gc-sections. So with that exorcised from my linker command line, I finally have a functioning Android iw that is dynamically linked against bionic libc and my own cross-compiled libnl.

One wonders if the pain of bionic is worth its benefits, unless a benefit was curing ennui: “I’m bored, let’s write our own libc!” Anyway, I should be able to produce the various wireless tools natively compiled for Android in short order now. After that I’ll pop my Android TODO stack to the task, “make it easier to build custom images in my build environment.” I’d like to customize the filesystem images to have all of these utilities and relevant drivers in the normal places rather than hanging out in weird locations on the sdcard.