memory µ-optimizing

While looking at some kmalloc statistics for my little corner of wireless, I found a new tool for the toolbox: perf kmem. To test it out, I ran a simulated connected 25-node mesh in mac80211_hwsim.

Here’s what perf kmem stat --caller had to say:

---------------------------------------------------------------------
 Callsite            | Total_alloc/Per | Total_req/Per  | Hit   | ...
---------------------------------------------------------------------
 mesh_table_alloc+3e |      1600/32    |       400/8    |    50 |
 mesh_rmc_init+23    |    204800/8192  |    102600/4104 |    25 |
 mesh_path_new.const |    315904/512   |    241864/392  |   617 |
[...]

The number of mesh nodes shows up in the ‘Hit’ column in various ways: we can pretty easily see that mesh_table_alloc is called twice per mesh device, mesh_rmc_init is called once, and mesh_path_new is called at least once for every peer (617 ~= 25 * 24). We might scan the hit column to see if there are any cases in our code that perform unexpectedly high numbers of allocations, signifying a bug or poor design.

The first two columns are also interesting because they show potentially wasteful allocations. Taking mesh_table_alloc as an example:

(gdb) l *mesh_table_alloc+0x3e
0x749fe is in mesh_table_alloc (net/mac80211/mesh_pathtbl.c:66).
61		newtbl = kmalloc(sizeof(struct mesh_table), GFP_ATOMIC);
62		if (!newtbl)
63			return NULL;
64
65		newtbl->known_gates = kzalloc(sizeof(struct hlist_head), GFP_ATOMIC);
66		if (!newtbl->known_gates) {
67			kfree(newtbl);
68			return NULL;
69		}
70		INIT_HLIST_HEAD(newtbl->known_gates);

We are allocating the size of an hlist_head, which is one pointer. This is kind of unusual: typically the list pointer is embedded directly into the structure that holds it (here, struct mesh_table). The reason this was originally done is that the table pointer itself could be switched out using RCU, and a level of indirection for the gates list ensured that the same head pointer is seen by iterators with both the old and new table. With my rhashtable rework, this indirection is no longer necessary.

To track allocations with 8-byte granularity would impose an unacceptable level of overhead, so these 8-byte allocations all get rounded up to 32 bytes (first column). Because we allocated 50 of them (25 devices, two tables each), in all we asked for 400 bytes, and actually got 1600, a waste of 1200 bytes. Forty-eight wasted bytes per device is generally nothing worth worrying about, considering we usually only have one or two devices — but as this kmalloc is no longer needed, we can just drop it and also shed a bit of error-handling code.

In the case of mesh_rmc_init, we see an opportunity for tuning. The requested allocation is the unfortunate size of 4104 bytes, just a bit larger than the page size, 4096. Allocations of a page (so-called order-0 allocations) are much easier to fulfill than the order-1 allocations that actually happen here. There are a few options here: we could remove or move the u32 in this struct that pushes it over the page size, or reduce the number of buckets in the hashtable, or switch the internal lists to hlists in order to get this to fit into 4k, and thus free up a page per device. The latter is the simplest approach, and the one I chose.

I have patches for these and a few other minor things in my local tree and will send them out soon.

HSTS applied

Now that I’ve had a few months’ experience with Let’s Encrypt, and I have cert renewals completely automated, bobcopeland.com is now fully https and serving HSTS headers. I’m now using acme-tiny to renew the certs, thanks to their dedication to making the code easily auditable.

On mac vi and error exits

Sometimes I have to use a Mac, and various things drive me nuts about it. The worst is git commit aborting because I typoed the writeout command after having written the best commit log ever. Here’s a google crumb for others experiencing this, and ideally for someone who wants to track down and fix the problem upstream.

Synopsis: do the below, and type the following once vim starts up: :Wq[enter]:q[enter]

$ vi || echo broken
broken
$ vim || echo broken
$ vim --version
VIM - Vi IMproved 7.4 (2010 Aug 15, compiled Jul  9 2015 23:58:42)
Compiled by root@apple.com
[...]

/usr/bin/vi is of course a symlink to /usr/bin/vim that makes it act worse. Now, there are some folks who have various theories about why vi is different in this respect, or what vimrc stuff might cause vi to exit with an error if some error happened once upon a time in the editing session, but honestly I can’t find myself caring. Instead I just set my $EDITOR to pretend vi doesn’t exist anymore.

Carbs

My brother-in-law gave me a copy of the bread cookbook Flour Water Salt Yeast by Ken Forkish (as far as I know, his actual name) for Christmas, and so I made five pizzas. The one pictured below is a touch over-baked on top, but pretty close. It was a little too chewy owing to having used bread flour, yet had a nice airy crumb. I think this replaces Peter Reinhart’s recipe as my go-to from now on — both are about 70% hydration but this one had more bubbles.

The book has some tips for baking it in a home oven, which were pretty useful, though I found his suggestion of temporarily using the broiler caused my oven temp to drop 30 degrees in the process. Still working on the heat aspect, but I’m getting better at dealing with wet doughs.

Incidentally, this one is for the kids: my own favorite toppings, stolen from a pizza joint in Crystal City, VA, are prosciutto and arugula.

Homemade pizza

Also cooked last week: pot roast with homemade egg noodles, and butter chicken.

Homemade noodles

Whew, I think it’s going to be takeout all this week.

wherein I did a git push

As I wrote somewhere in the Metaverse, I’m running the wireless-testing tree now, which is about as minimal a contribution as one could make. It essentially means running a script once daily; cron could do it all, if not for the part where I enter my ssh passphrase to actually push out the updates. For the curious, here’s the script.

One of the killer features of git used here is git rerere, which remembers conflict resolutions and reapplies them when reencountered. A human (me) still needs to resolve the conflict once, but after that it’s pretty much on autopilot for the rest of that kernel series. Normally one should reinspect the automatic merge made by git-rerere, but at least for this testing tree, throwing caution to the wind is warranted and thus the script has some kludges around that.

Every pushed tag had a successful build test with whatever (not claimed to be exhaustive) kernel config I am using locally. I’ve considered some more extensive tests here, such as running the hostapd test suite (takes an hour with 8 VMs on my hardware) or boot testing on my mesh APs (requires a giant stack of out-of-tree patches), but, given the parentheticals, I’m keeping it simple for now.

Observant persons may have noted some missing dates in the tags. Generally, this means either nothing actually changed in the trees I’m pulling, or it is during the merge window when things are too in flux to be useful. Or maybe I forgot.

sae.py

For a while now, I’ve wanted an easy way to decrypt a mesh packet capture when I know the SAE passphrase. This would be quite handy for debugging.

I reasoned that, if I knew the shared secret, and had captured the over-the-air exchange, it should be possible to make such a tool, right? We know everything each station knows in that case, don’t we?

So I spent a bit of time last week reimplementing SAE in python to come to grips with its various steps. And I found my reasoning was flawed.

Similarly to Diffie-Hellman, SAE exchanges values that include the composition of a random value (we’ll call them r1 and r2) with some public part P, in such a way that it is hard to extract r1 even if you know P (e.g. r1*P and r2*P with ECC, although this is not exactly how SAE is specified). These values can be used by each peer to arrive at a shared secret provided they know the original random number (e.g., something like r1 * r2 * P). The crucial point is that r1 and r2 cannot be determined from the exchange alone. They exist only in memory on each peer during the exchange.

So if the secrecy doesn’t actually depend on the password, what is it there for? The answer is authentication: SAE is a password authenticated key exchange (PAKE). The password ensures that the party we are talking to is who we think it is, or at least someone who also knows the password. In the case of SAE, P is derived from the password.

As for the original goal, what we can do instead is use the CCMP keys from a wpa_supplicant log and decrypt frames with those. That is not nearly as convenient, though.

And thus my postings for 2015 are at an end; happy new year, all!

RFID reader

In the previous installment, I was messing with the serial library on Arduino to get it to work on the older board. The purpose was this:

Pictured is an RFID reader from SparkFun that I picked up years ago and sat in my parts box since then. This RFID reader is intended to read Mifare tags and as such predates the whole NFC thing by a year or three. The programming interface is to send commands over serial to the device such as “Authenticate sector N”, “Read sector N”, etc. I’ve connected D7 to D2 so that I could use D2 as the interrupt pin to suit my Arduino.

It appears to work, in that I can dump sectors from my yubikey and a tap-and-go credit card. The sector trailer parts look valid, and the first few bytes of sector 0 have the device ID (verified by the NFC reader on my phone). I can’t seem to read any NDEF information; whether that is due to limitations in the reader, bugs in my code, or my almost zero knowledge about how all this stuff works, I’m not sure. Sketch code is over here.

As to possible applications, for that I have no real idea either, which is probably why it sat in the parts bin all this time.

Software Serial on the Arduino NG

As mentioned before, I have a really old Arduino, the Arduino NG with an Atmega8 on board. No fancy embedded Linux wifi-enabled SoC on that thing.

Recently, I found myself doing a bit of yak shaving when I was trying to use a code that was meant for Arduinos of a more recent vintage. And here is what I learned:

The Atmega8 does not have the PCICR (which I believe stands for “pin change interrupt control”) register. This feature allows multiplexing of interrupts so that you can have multiple pins signal the same external interrupt, and the pins that trigger interrupts are programmable. On the Atmega8, you can only use Arduino pins D2 and D3 to directly signal interrupts 0 and 1 respectively.

So the device I was trying to use operates over serial, and the supplied code uses the built-in SoftwareSerial bit-banging serial library to talk to it.
The SoftwareSerial library attaches an interrupt to the configured Arduino RX pin (device TX) to trigger when it goes low, indicating the start of a transmission from the device. This library was only written with the more configurable chips in mind, so didn’t support direct interrupts like on the Atmega8. Google didn’t turn up any obvious fixes other than people running into the same compiler error I initially hit:

error: 'PCICR' was not declared in this scope.

In the end a few short changes are needed to make it work: one to disable digitalPinToPCICR() and related macros (as there is no such register on this device); and another change to attach the proper interrupt inside SoftwareSerial. A patch is here.

To debug this, I hooked up the TX side of a USB-serial adapter to the RX pin on the Arduino, ran screen(1) on the computer to control it, and started mashing keys. Until I got the irq triggering working properly, I also monitored the output with a DMM just to make sure the output was going low when I held down a key. A test sketch for that looks like this:

SoftwareSerial port2(2, 3);

void setup()
{
  Serial.begin(9600);
  port2.begin(9600);
  Serial.println("Listening...");
}

void loop()
{
  while (port2.available()) {
    Serial.write(port2.read());
  }
}

It works fine. I’ll write about what I was doing with it in some future installment.

Now (mostly) encrypted

I signed up for the let’s encrypt beta and got my cert setup today. I let my old CA-purchased cert lapse when it came for re-up a couple of years ago, so happy there’s soon to be a free alternative.

Enjoy the HTTPS! (Except for some stray mixed content I need to kill — sorry for that.)

Arduino pumpkin lights

Lacking creativity this year, I kind of phoned in the pumpkin carving. We went with regular old triangle-eyed Jack-o-lanterns. It turned out this suited Alex fine: he has decided lately that a certain level of traditionalism is called for. Watching keenly as I cut away bits of pumpkin, he was quick to point out whenever I was doing it the wrong way.

Anyhow, not wanting Halloween to pass buy without doing something new, I decided I’d write a pumpkin light control sketch for Arduino. I have the original board with the Atmega8, and the last time I touched it, the SDK was arduino-0011, which was released about seven years, three domiciles, and one country ago. Since then, I’m happy to see that arduino-mk landed in Debian, and the SDK hasn’t changed APIs in any noticeable way, so it was easy to get back up and running.

The concept is pretty simple: write PWM patterns to an output pin to control the brightness or on/off state of the pumpkin light. I came up with three patterns: fade up and down; random toggle; and sequential stepping. The sketch code is over here.

Arduino light controllerOn the hardware side, I used 3 LEDs for each pumpkin, snipped from a roll that I had left over from my cabinet lighting project. The LEDs expect a +12V supply. On the spool, they are wired in series with a 150 ohm resistor and each LED has a 2.7V voltage drop. Powering that would stretch the Arduino, so I used a separate 12V power supply for those, and used the Arduino to switch an NPN transistor corresponding to each light. I soldered the LEDs to some speaker wire and put the rest of the components on an Arduino protoshield inside a box. This was essentially a lunch hour’s worth of work to get the basic functionality going, and then a little more time to make it neat.

This year Alex and Ian went as Bumblebee and Iron Man respectively (-ENOTPICTURED). Sadly, with store bought costumes — homemade ones will have to wait until one of us learns to sew something other than kites.