Category Archives: computers


As a sometime wireless hacker, it’s a bit embarrassing to admit that I’ve had the factory firmware on my wifi router all this time, but when I first tried OpenWrt on it, ath9k was only a few months old and dropped connections all the time. Thus, I made do with the factory install but ran many of the essential services (dns, dhcp, tftp, etc) from my Linux workstation. And life continued apace.

After a recent network upgrade, I found I could no longer make my router understand ipv6, and so it was time to put the original firmware to pasture. In the intervening years, ath9k grew up, so I took another try with OpenWrt. The install took about 20 minutes, most of which was configuring the firewall and copying my existing dnsmasq config into a uci-friendly format. Everything works great and my ipv6 is back. Nice job, all involved!

I suppose I could now eat even more dogfood by running a mesh interface on one of the radios. In the past, I’ve tinkered with mesh as a wireless distribution system, but I don’t have much of a use for that currently with every room in the new place being wired. Perhaps my backyard could use expanded coverage?

vim cheat codes

Or, how I read parts of the fine manual.

Yesterday, after spending way too much time trying to get find(1) to exclude vim swapfiles, I finally had it with them cluttering up my work directories. As is usually the case when vim triggers an itch, I thought, “there must be a setting for that,” and lo, there was.

set directory=~/.vimswap//

Make that directory and all the swap files go there instead. The trailing double slashes mean the swap files get named in such a way as to avoid conflicts.

Here are some other things added to my vimrc over the last year or so.

Show whitespace issues as spelling mistakes:

match SpellBad /s+$| +zet/

I used to always set my windows to 80 columns, but then I started using ‘set number’ and then all of that went out the window, so to speak. So now I do this to show where wrapping needs to happen:

set colorcolumn=80

This hack is kind of neat, it shows +/- change markers on the edge, based on git changes in your working copy:

(To make it less intrusive, I added highlight clear SignColumn.)

Once upon a time, I had a really complicated macro to search up the directory hierarchy looking for tags files. It turns out vim already does that if you add a semicolon in there (:help file-searching):

set tags=./tags;

Phone bugs

If you want to experience what it is like to be the first person to ever do something, might I suggest turning on all the debug knobs on an Android vendor kernel? Some speculative fixes over here, but there is still at least one other RCU bug on boot, and this use-after-free in the usb controller driver:

[   34.520080] msm_hsic_host msm_hsic_host: remove, state 1
[   34.525329] usb usb1: USB disconnect, device number 1
[   34.529602] usb 1-1: USB disconnect, device number 2
[   34.637023] msm_hsic_host msm_hsic_host: USB bus 1 deregistered
[   34.668945] Unable to handle kernel paging request at virtual address aaaaaae6
[   34.675201] pgd = c0004000
[   34.678497] [aaaaaae6] *pgd=00000000
[   34.681762] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[   34.686737] Modules linked in: wcn36xx_msm(O) wcn36xx(O) mac80211(O) cfg80211(O) compat(O)
[   34.694976] CPU: 1    Tainted: G        W  O  (3.4.0-g4a73a1d-00005-g311eaee-dirty #2)
[   34.702972] PC is at __gpio_get_value+0x28/0x1cc
[   34.707489] LR is at do_restart+0x24/0xd8
[   34.711486] pc : []    lr : []    psr: 60000013
[   34.711486] sp : ebcd9ed8  ip : c032a858  fp : ebcd9ef4
[   34.722930] r10: 00000000  r9 : c04dd67c  r8 : 00000000
[   34.728210] r7 : c4d81f00  r6 : 6b6b6b6b  r5 : c10099cc  r4 : aaaaaaaa
[   34.734680] r3 : 09090904  r2 : c1672f80  r1 : 00000010  r0 : 6b6b6b6b
                                                                 ^^^^^^^^ whoops
[   34.741241] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[   34.748504] Control: 10c5787d  Table: acfb006a  DAC: 00000015

I guess I should try to find out where to report bugs for these (predictably) not upstream drivers, but that seems like a total pain.

Footprint, Part 2

My recent posting of ASCII art was intentionally subtle, perhaps overly so. If you haven’t figured it out by now, it is a C program that announces the birth of our second child. When compiled and executed, it prints:

Ian Yit-Sing Copeland
Born 5 August, 2013 at 02:49
Weight 6 lbs 11 oz

In this post, I will explain how I created it.

Like many a C practitioner, I’ve always found International Obfuscated C Code Contest entries to be at once entertaining and mystifying. While I don’t consider this effort to be near the level of IOCCC entries, I thought it might be fun to try my own IOCCC-lite ASCII art program.

I knew I wanted the program to simply print an announcement string, in a non-obvious fashion. It also had to be easy to modify the program with the actual details on the day of the delivery. Indeed, I wrote this program several weeks in advance and modified the output in only a few minutes.

Thanks to the Can You Crack It challenge, I already had an RC4-like algorithm sitting on my disk. The plan then was simple: embed the key and ciphertext in the program, and just decrypt and print it. Of course, the ciphertext would be all binary, which is hard to encode in a compact manner in the source. Thus, I decided to store the ciphertext in base32 encoding.

I actually didn’t put much effort into obfuscating the code; the main issue was getting the size of the source code into around 600 characters, and doing that involved using typedefs and one-letter variable names already. By the time that was done, I didn’t change much. Apart from variable naming, the main obfuscation step was to change the base32 mapping table to consist only of numbers and symbols so that the embedded string would itself look like code. The code otherwise is pretty straight-forward.

My starting point, which base32-decoded and decrypted some placeholder text, looked like this:

#include <string.h>
#include <stdio.h>

char str[] = "43(:7&!#3&@>$%^|]:&6;<7*-$}9{;!*!$5{<;-=^8]#:5<@#%#&!5!@40207#9($3&)7<$1";

int b32dec(char *in, char *out, int len)
    int i;
    char alpha[] = "3:5[9&2^]{7}*<-8@=4(6#!>|)0+;1$%";
    for (i=0; i < len; i++)
        unsigned char bits = strchr(alpha, in[i])-alpha;
        int j = (i / 8) * 5;
        switch (i % 8) {
            case 0:
                out[j + 0] |= bits << 3;
            case 1:
                out[j + 0] |= bits >> 2;
                out[j + 1] |= bits << 6;
            case 2:
                out[j + 1] |= bits << 1;
            case 3:
                out[j + 1] |= bits >> 4;
                out[j + 2] |= bits << 4;
            case 4:
                out[j + 2] |= bits >> 1;
                out[j + 3] |= bits << 7;
            case 5:
                out[j + 3] |= bits << 2;
            case 6:
                out[j + 3] |= bits >> 3;
                out[j + 4] |= bits << 5;
            case 7:
                out[j + 4] |= bits;
    return i/8 * 5;

int keysched(unsigned char *x, unsigned char *key, int keylen)
    int i;
    unsigned char tmp, a = 0;

    for (i=0; i < 256; i++)
        x[i] = i;

    for (i=0; i < 256; i++)
        a += x[i] + key[i % keylen];
        tmp = x[i];
        x[i] = x[a];
        x[a] = tmp;

int crypt(unsigned char *x, unsigned char *y, int len)
    unsigned char a;
    unsigned char b = 0;
    unsigned char tmp;
    int i;

    for (i=0; i < len; i++)
        a = i+1;
        b += x[a];
        tmp = x[a];
        x[a] = x[b];
        x[b] = tmp;
        y[i] ^= x[(x[a] + x[b]) & 0xff];

int main()
    unsigned char x[256];
    unsigned char key[] = "abcd";
    unsigned char crypt_text[sizeof(str)] = {};
    int len;

    len = b32dec(str, crypt_text, strlen(str));
    keysched(x, key, strlen(key));
    crypt(x, crypt_text, len);
    printf("%sn", crypt_text);

Micro-optimizing for source code size is unusual, and in some ways backwards to optimizing for generated code. For example, this change saved a couple of characters, but in the opposite way frequently done:

-            *o++ = d[0] << 3 | d[1] >> 2;
+            *o++ = d[0] * 8 | d[1] / 4;

Similarly, I found that combining unrelated functions was useful in eliminating the character waste of function definitions. There are likely a good deal more space-savers to be found; I quit when I got it small enough.

I experimented with a few different formats for the code. It turns out that I'm not terribly good at drawing ASCII. I abandoned a baby bottle as unrecognizable and went with a footprint after seeing this motif on some birth announcements. I hand-drew it, scanned it in, loaded as a background in an html document and then put dollar signs in the right places. Yes, cheating, but I am no artist.

                              $$$      $$$$$$$$$
                        $$   $$$$$    $$$$$$$$$
                       $$$$   $$$     $$$$$$$$$
                       $$$$             $$$$$$
                  $$   $$
                 $$$$            $$$$$$
             $  $$$$$       $$$$$$$$$$$$$
           $$$$  $$$$    $$$$$$$$$$$$$$$$$
           $$$$       $$$$$$$$$$$$$$$$$$$$$
           $$$$     $$$$$$$$$$$$$$$$$$$$$$$
            $$    $$$$$$$$$$$$$$$$$$$$$$$$$

I wrote a python script to encrypt and encode the string to be embedded in the code. A second script tokenized the original C source and replaced dollar signs from the text template with code. The latter script is not perfect, but worked well enough that I could hand-edit the output in a few seconds to get something reasonable.

The gory details are available at github.

Finally, on delivery day, I discovered that WordPress mangled C code included in posts. So I simply took a screenshot of the output and posted that instead, with a link to the original text. A picture of the newborn was hidden as a hyperlink from the footprint image.

Overall, the entire process took a couple of evenings from concept to completion, and I'm quite happy with the way it turned out. And, of course, I'm ecstatic about the inspiration: our newly arrived baby boy. We are very fortunate to have both Alex and now Ian in our lives, and silly C programs and fake man pages cannot come close to capturing our joy.

It’s faster because it’s async!

Some Javascript developers, when presented with an API, think, “I know, I’ll make it async!” Now their users have a giant mess of a state machine implemented with callbacks and recursion, in a GC runtime that doesn’t eliminate tail calls, and two orders of magnitude worse performance. Normally, I wouldn’t care, because you get what you ask for when you write something in Javascript, but this time it happened to a project that I occasionally maintain for $work. And so I was sad.

So, far from expert in the ways of JS, I looked for a way out of this mire, and stumbled across task.js, which is what the kids are doing these days. It is a neat little hack, although it only works if your JS engine supports generators (most do not). Still, it seemed like a reasonable approach for my problem, so I decided to understand how task.js works by deriving my own.

I should note that if one’s JS engine doesn’t have generators, one can kind of emulate them, in the same sense that one can emulate goto with switch statements. Consider this simple example:

var square_gen = function() {
    var obj = {
        state: 0,

        next: function() {
            var ret = this.state * this.state;
            return ret;

        send: function(v) {
            this.state = v;
    return obj;

var gen = square_gen();
for (var i=0; i < 4; i++) {
    console.log("The next square is: " +;
The next square is: 0
The next square is: 1
The next square is: 4
The next square is: 9

In other words, square_gen() returns an object that has a next() method which returns the next value, based on some internal state. The next() method could be a full-fledged state machine instead of a simple variable.

Certainly this is less tidy than the equivalent with language support:

var square_gen = function*() {
    var state = 0;
    while (1) {
        yield state * state;

(I'll assume the existence of generators in examples henceforth.)

The send() function is interesting -- it allows you to set the state from which the generator continues.

square_gen.send(16); /* => 256 */

The key idea is that generators will only do one iteration of a possibly large computation. When yield is reached, the function exits, and the computation is started up again after the yield statement when the caller performs another next() or a send().

Now, suppose you need to do something that takes a while, without waiting, like fetching a resource over the network. When the resource is available, you need to continue with that data. In the meantime you should try to do something else, like allow the UI to render itself.

The normal way one does this is by writing one's code in the (horrible) continuation-passing-style, which is fancy talk for passing callbacks to all functions. The task.js way is better: write functions that can block (generators) and switch to other tasks when they do block. No, I'm not making this up -- it is normal for a program in JS land to include a half-baked cooperative multitasker.

You can turn callbacks into promises, as in "I promise to give you value X when Y is done." Promise objects are getting baked into various runtimes, but rolling your own is also easy:

function MyPromise() {
    this.todo = function(){};

MyPromise.prototype.then = function(f) {
    this.todo = f;

MyPromise.prototype.resolve = function(value) {

That is: register a callback with then(), to be called when a value is known. The value is set by calling resolve().

Now, suppose you have an async library function that takes a callback:

function example(call_when_done) {
    /* do something asynchronously */

Instead of calling it like this:

    example(function(value) { /* my continuation */ }); can give it a callback that resolves a promise, which, when resolved, calls your continuation:

    var p = new MyPromise();
    example(function(value) { p.resolve(value); });
    p.then(function(value) { /* my continuation */ });

This is just obfuscation so far. The usefulness comes when you rewrite your functions to return promises, so that:

    function my_fn(params, callback) { /* pyramid of doom */ }


    function my_fn(params) { /* ... */ return promise; }

Now you can have generators yield promises, which will make them exit until another next() or send() call. And then you can arrange to call send() with the promise's value when it is resolved, which gives you blocking functionality.

    /* An async function that returns a promise */
    function async() {
        var p = new MyPromise();
        example(function(v) { p.resolve(v); });
        return p;

    /* A generator which blocks */
    var gen = function*() {

        /* does some async call and receives a promise */
        var p = async();

        /* blocks until async call is done */
        var v = yield p;

        /* now free to do something with v... */

     * Run one iteration of the generator.  If the
     * generator yields a promise, set up another
     * iteration when that promise is resolved.
    function iterate(gen, state) {
        var p = gen.send(state);
        p.then(function(v) {
            iterate(gen, v);

    /* Start a task */
    var p =;
    p.then(function(v) {
        iterate(gen, v);

     * You can do something else here, like start another
     * generator.  However, control needs to eventually
     * return to runtime main loop so that async() can
     * make progress.

A lot of details are omitted here like StopIteration and error handling, but that's the gist. Task.js generalizes a lot of this and has a decent interface, so, do that.

In summary, Javascript is terrible (only slightly less so without c-p-s), and I'm glad I only have to use it occasionally.

AWS precursor

Let’s say it’s 1975, and you have a mountain of data (2 megs) to process, and a heap of cash. Whom do you call to run your Cobol? Your local airline, of course!


IBM 360/195 for only 50 cents a SECOND

Guaranteed Turnaround! 2meg; 2314′s – 3330′s – 3420′s
Ans Cobol, Fortran G, G1, H, Assembler F & H, PL/1 F and PL/1 Optimizing and Checkout Compilers.

Our typical customer is knowledgeable in OS; has good working knowledge of JCL, Utilities and the functions of the compilers/assemblers he uses.
Call or Write

Courtesy of a yellowed copy of Computerworld my dad found among his things. I noticed Google has scanned a lot of old issues but I couldn’t find this one in their archive.

No, I don’t remember those days.

Grepping 300 gigs

It’s fun when a problem is simple, yet the program to solve it takes long enough to run that you can write a second, or third, much faster version while waiting on the first to complete. Today I had a 1.8 billion line, 300 GB sorted input file of the form “keytvalue”, and another file of 20k interesting keys. I wanted lines matching the keys sent to a third file, where the same key might appear multiple times in the input.

What doesn’t work: any version of grep -f, which I believe is something like O(N*M).

What could have worked: a one-off program to load the keys into a hash, and output the data in a single pass. Actually, I had written this one-off program before for a different task. In this case, the input file doesn’t need to be sorted, but the key set must be small and well-defined. It is O(N+M).

What worked: a series of sorted grep invocations (sgrep, but not the one in ubuntu). That is O(lgN * M), which can be a win if M is small compared to N. See also comm(1) and join(1), but I don’t know off the top of my head whether they start with a binary search. I had a fairly low selectivity in my input set so the binary search helps immensely.

What also could have worked: a streaming map-reduce or gnu parallel job (the input file was map-reduce output, but my cluster was busy at the time I was processing this, and I am lazy). That would still be O(N+M), but distributing across P machines in the cluster would reduce it by around the factor P. P would need to be large here to be competitive with the previous solution from a complexity standpoint.

Making up numbers, these take about 0, 5, 1, and 20 minutes to write, respectively.

Of course, algorithmic complexity isn’t the whole picture. The last option, or otherwise splitting the input/output across disks on a single machine, would have gone a long way to address this problem:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda              10.50     0.00  361.00    9.75 83682.00  4992.00   478.35   139.42  174.20    6.43 6385.95   2.70 100.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

You can’t go any faster when blocked on I/O.

Note the similarities with join algorithms in databases — variations on nested loop, hash, and sort-merge joins are all present above.


So, I’m applying a one line patch to a Java package, and oh yes, it needs maven. Hooray!

$ apt-cache depends --recurse maven | grep Depends | 
     grep -v '<' | egrep lib.*java | sort | uniq | wc -l
[... start build and watch maven download another pile ...]
$ find ~/.m2 -name *pom | wc -l

I find it hard to believe that there are that many useful libraries in the world.

goto in bash

If you are as old and nerdy as I, you may have spent your grade school days hacking in the BASIC computer language. One of the (mostly hated) features of the (mostly hated) language was that any statement required a line number; this provided both the ability to edit individual lines of the program without a screen editor, as well as de facto labels for the (mostly hated) GOTO and GOSUB commands. But you could also use line numbers to run your program starting from any random point: “RUN 250″ might start in the middle of a program, typically after line 250 exited with some syntax error and was subsequently fixed.

Today, in bash, we have no such facility. Why on earth would anyone want it, with the presence of actual flow control constructs? Who knows, but asking Google about “bash goto” shows that I am not the first.

For my part, at $work, I have a particular script which takes several days to run, each part of which may take many hours, and, due to moon phases, may fail haphazardly. If a command fails, the state up to that point is preserved, so I just need to continue where that left off. Each major part of the job is already factored into individual scripts, so I could cut-and-paste commands from the failure point onward, but I’m lazy.

Thus, I present bash goto. It runs sed on itself to strip out any parts of the script that shouldn’t run, and then evals it all. Prepare to cringe.

# include this boilerplate
function jumpto
    cmd=$(sed -n "/$label:/{:a;n;p;ba};" $0 | grep -v ':$')
    eval "$cmd"


jumpto $start

# your script goes here...
jumpto foo

echo "This is not printed!"

echo x is $x

results in:

$ ./
x is 100
$ ./ foo
x is 10
$ ./ mid
This is not printed!
x is 101

My quest to make bash look like assembly language draws ever nearer to completion.

Fake Wireless Errors

When I did my previous work with mac0211_hwsim, I wrote the channel model in matlab and pre-generated a huge lookup table of frame error rates for different SNR values and transmission rates so that the simulator didn’t have to do any thinking for each packet. Obviously that’s a bit limiting and not in any way upstreamable in something like wmediumd.

So, I sat down today and rewrote it in C to see how bad the computation is. Actually, it’s not awful: I didn’t carefully benchmark it, but it sits at around 30 µsec per calculation, and there is probably a good deal of low hanging fruit such as making factorial() cache its computations, or fitting the output curves with cubics. I stuck the initial code on a wmediumd topic branch over here.

I verified the output matched my matlab code and charted it as above. Careful observers will note the 9 mbps rate is always worse than 12 mbps; this was a finding of D. Qiao and S. Choi in “Goodput enhancement of IEEE 802.11a wireless LAN via link adaptation,” in Proc. IEEE ICC01, from which most of my math was appropriated.