Updated to version 6

I bought a new wireless router the other day, a D-Link DIR-825 dual-band a/g/b/n. This is a step up in several regards from my previous 2.4 GHz 11g Linksys, not least of which is the D-Link’s ability to run OpenWRT with ath9k. Installing OpenWRT with prebuilt binaries is dirt simple; it took all of about 5 minutes to set up.

One of my motivations was to do some IPv6 testing in advance of World IPv6 Day. Comcast doesn’t yet offer IPv6 addresses to the public at large, so I set up a Hurricane Electric 6in4 tunnel. This is quite easy to configure in OpenWRT.

So now I can load ipv6.google.com in my browser. It’s approximately 2 better than the normal site.

On static in Java

In C, scoping rules aside, the following are pretty much the same:

static int x[] = { /* ... */ };
int foo()
{
    /* do something with x */
}

and

int bar()
{
    static int x[] = { /* ... */ };
    /* do something with x */
}

The static array x will go into the .data section of the executable. When the OS loads the executable into memory, it will ensure that the values for x are set in the appropriate place (e.g. by mapping its disk page to the virtual address in the ELF headers). In the second case, x is given some alias to avoid symbol conflicts.

There seems to be no way to do the second version in Java. Here’s how I tried:

static int xyzzy()
{
    final int x[] = { /* ... */ };
    /* do something with x */
}

This doesn’t do what you might think it does. While x cannot be changed, it is the reference to an array; the array elements themselves can be changed at will. Thus javac is unlikely to do anything smart here.

Indeed, in the above case, javac will initialize x with stack pushes every time xyzzy() is called. I had a real method like this, and making x a static member variable gave me an easy 2x speedup.

C++ error fail

Note to self: double-check the template arguments before asking the compiler to do so.

(WordPress hates to wrap, so I did it, badly.)

mpicxx -Wno-deprecated   -c -o pagerank.o pagerank.cc
In file included from /usr/include/boost/graph/distributed/adjacency_list
.hpp:3959,
                 from pagerank.cc:3:
/usr/include/boost/graph/distributed/adjlist/initialize.hpp: In member fu
nction ‘void boost::adjacency_list<OutEdgeListS, boost::distributedS<Pr
ocessGroup, InVertexListS, InDistribution>, DirectedS, VertexProperty, Ed
geProperty, GraphProperty, EdgeListS>::initialize(EdgeIterator, EdgeItera
tor, EdgePropertyIterator, typename boost::detail::parallel::adjacency_li
st_config<OutEdgeListS, ProcessGroup, InVertexListS, InDistribution, Dire
ctedS, VertexProperty, EdgeProperty, GraphProperty, EdgeListS>::inherited
::vertices_size_type, const typename boost::graph::distributed::select_di
stribution<InDistribution, VertexProperty, typename boost::detail::parall
el::adjacency_list_config<OutEdgeListS, ProcessGroup, InVertexListS, InDi
stribution, DirectedS, VertexProperty, EdgeProperty, GraphProperty, EdgeL
istS>::inherited::vertices_size_type, ProcessGroup, typename boost::graph
::internal_vertex_name<VertexProperty>::type>::type&, boost::vecS) [with
EdgeIterator = boost::graph::metis_reader::edge_iterator, EdgePropertyIte
rator = boost::graph::metis_reader::edge_weight_iterator, OutEdgeListS =
boost::vecS, ProcessGroup = boost::graph::distributed::mpi_process_group,
 InVertexListS = boost::vecS, InDistribution = boost::defaultS, DirectedS
 = boost::directedS, VertexProperty = boost::no_property, EdgeProperty =
boost::no_property, GraphProperty = boost::no_property, EdgeListS = boost
::listS]’:
/usr/include/boost/graph/distributed/adjacency_list.hpp:1644:   instantia
ted from ‘boost::adjacency_list<OutEdgeListS, boost::distributedS<Proce
ssGroup, InVertexListS, InDistribution>, DirectedS, VertexProperty, EdgeP
roperty, GraphProperty, EdgeListS>::adjacency_list(EdgeIterator, EdgeIter
ator, EdgePropertyIterator, typename boost::detail::parallel::adjacency_l
ist_config<OutEdgeListS, ProcessGroup, InVertexListS, InDistribution, Dir
ectedS, VertexProperty, EdgeProperty, GraphProperty, EdgeListS>::inherite
d::vertices_size_type, const ProcessGroup&, const typename boost::graph::
distributed::select_distribution<InDistribution, VertexProperty, typename
 boost::detail::parallel::adjacency_list_config<OutEdgeListS, ProcessGrou
p, InVertexListS, InDistribution, DirectedS, VertexProperty, EdgeProperty
, GraphProperty, EdgeListS>::inherited::vertices_size_type, ProcessGroup,
 typename boost::graph::internal_vertex_name<VertexProperty>::type>::type
&, const GraphProperty&) [with EdgeIterator = boost::graph::metis_reader:
:edge_iterator, EdgePropertyIterator = boost::graph::metis_reader::edge_w
eight_iterator, OutEdgeListS = boost::vecS, ProcessGroup = boost::graph::
distributed::mpi_process_group, InVertexListS = boost::vecS, InDistributi
on = boost::defaultS, DirectedS = boost::directedS, VertexProperty = boos
t::no_property, EdgeProperty = boost::no_property, GraphProperty = boost:
:no_property, EdgeListS = boost::listS]’
pagerank.cc:37:   instantiated from here
/usr/include/boost/graph/distributed/adjlist/initialize.hpp:56: error: no
 matching function for call to ‘add_edge(boost::detail::parallel::globa
l_descriptor<unsigned int>&, boost::detail::parallel::global_descriptor<u
nsigned int>&, const double&, boost::adjacency_list<boost::vecS, boost::d
istributedS<boost::graph::distributed::mpi_process_group, boost::vecS, bo
ost::defaultS>, boost::directedS, boost::no_property, boost::no_property,
 boost::no_property, boost::listS>&)’
/usr/include/boost/graph/distributed/adjacency_list.hpp:2973: note: candi
dates are: typename boost::adjacency_list<OutEdgeListS, boost::distribute
dS<ProcessGroup, InVertexListS, InDistribution>, DirectedS, VertexPropert
y, EdgeProperty, GraphProperty, EdgeListS>::lazy_add_edge_with_property b
oost::add_edge(typename boost::adjacency_list<OutEdgeListS, boost::distri
butedS<ProcessGroup, InVertexListS, InDistribution>, DirectedS, VertexPro
perty, EdgeProperty, GraphProperty, EdgeListS>::vertex_descriptor, typena
me boost::adjacency_list<OutEdgeListS, boost::distributedS<ProcessGroup,
InVertexListS, InDistribution>, DirectedS, VertexProperty, EdgeProperty,
GraphProperty, EdgeListS>::vertex_descriptor, const typename boost::adjac
ency_list<OutEdgeListS, boost::distributedS<ProcessGroup, InVertexListS,
InDistribution>, DirectedS, VertexProperty, EdgeProperty, GraphProperty,
EdgeListS>::edge_property_type&, boost::adjacency_list<OutEdgeListS, boos
t::distributedS<ProcessGroup, InVertexListS, InDistribution>, DirectedS,
VertexProperty, EdgeProperty, GraphProperty, EdgeListS>&) [with OutEdgeLi
stS = boost::vecS, ProcessGroup = boost::graph::distributed::mpi_process_
group, InVertexListS = boost::vecS, InDistribution = boost::defaultS, Dir
ectedS = boost::directedS, VertexProperty = boost::no_property, EdgePrope
rty = boost::no_property, GraphProperty = boost::no_property, EdgeListS =
 boost::listS]
make: *** [pagerank.o] Error 1


Well, obviously I called the wrong ctor, duh!

PageRank in Matlab

For a school project, I’ve been reading lots of papers on good old PageRank. We’re doing something similar to this venerable algorithm in a streaming context. For a basis of comparison, I needed a simple, reliable implementation of the classic algorithm, and Matlab (octave) happens to excel at this kind of thing. So here’s my version. Initialized appropriately, this gives the same results as the diagram on the appropriate Wikipedia page. You probably wouldn’t want to run this on the web graph though: it wants to be run in a cluster at that scale.


function[R] = pagerank(A, R, d, epsilon)
N=size(A)(1);
E=ones(N);
R_last = R;
delta=epsilon;
while (delta >= epsilon)
    R = (d * A + (1-d)/N * E) * R_last;
    dist=R-R_last;
    delta = sqrt(dot(dist,dist));
    R_last = R;
end

R is the initial ranking vector. Typically this is just 1/N for each entry, but it can be some other probability distribution to arrive at personalized rankings. A is the column-oriented adjacency matrix divided by out-degree (with a plain adjacency matrix T, K=diag(sum(T)); A=K^-1 * T). Nodes with out-degree zero should be marked adjacent to everyone. d is the dampening factor, and epsilon is the convergence threshold.

I’m pretty sure my Matlab coding skills are quite lackluster, but it works.

I’d like the shed in blue

In addition to the idea that the least interesting and most trivial patches get the most scrutiny, I think the bike-shedding effect is doubled if someone claims their implementation is extra clever. Then Linus unleashes his Thunderbolt of Clarity and sanity is restored.

To exploit this phenomenon, one seeking reviews should claim that one’s code is better than any Knuth himself could have dreamed up, uses the minimal number of clock cycles on any present or future architecture, and solves Clique in P. Then nerds will come at you like zombies.

log2 in bash

Turns out bash supports enough arithmetic operators to implement lg() just as you might in some real language:


function log2 {
    local x=0
    for (( y=$1-1 ; $y > 0; y >>= 1 )) ; do
        let x=$x+1
    done
    echo $x
}

z=$(log2 64)


I had a somewhat valid reason to want to do this. The one thing that continually gets me when I try to do dumb things like this is when to use the raw variable name and when to use the dollar sign. Yes, the semantics are well-defined, but my perl habits run deep. Similarly, in ruby, all my variables are global.

Is there a patch for bureaucracy?

So I’ve now had experience fixing other peoples’ software in several different domains. Here’s how it plays out:

Open-source project:

Thanks! Applied.

Proprietary application:

Thank you for your submission. We have taken your request under consideration, and assigned it ticket #17231. In our review for the next release, scheduled to be released in Q4 next year, we will consider each resolved ticket for possible inclusion in our product.

Government project:
[paraphrasing an angry letter to division heads]

Please find attached a 9-page document describing why we cannot include any changes in our software, even 80-line changes that fix crashes such as the ones your agency has experienced. We suggest your agency maintain your own copy of the software with the change applied.

P.S. We doubt seriously it’s really our problem even though you gave us an explanation and a reasonable code fix. Please waste some more time trying to prove it to us.

JNIs and ABIs

For a recent task at work, I had a compelling reason to break out of the Java jail and do some “real programming” in C. Unfortunately, said C-language bits were to be run on the Windows platform. I had a stripped DLL (no source) for manipulating USB cameras and an API document to work with, and a plan to duct tape this all together with JNI.

I’m not about to touch MSVC if I can help it, so I grabbed mingw packages for my platform and wrote up a small wrapper DLL to do the necessary JNI bridge. Here’s my attempt to archive the working recipe for the Google bots.

Compiling this wrapper library was quite easy with mingw:

$ cat Makefile
# This makefile builds usbcamjni.dll to wrap usbcam.dll
# Use mingw32-make to build
CPPFLAGS=-I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/linux
LIBS=-L . -l usbcam

usbcamjni.dll: usbcamjni.o
        $(CC) -shared -o $@ $< $(LIBS)

clean: .FORCE
        $(RM) usbcamjni.dll usbcamjni.o

.FORCE:

$ mingw32-make

It turns out on Windows there are a few different calling conventions that you can choose from at compile time. Garden-variety code uses __cdecl, which is how the world should operate. The symbols are named just as they are named in the C code with a leading underscore, and the caller is responsible for cleaning up his own stack mess.

In the Windows world, there is __stdcall, where the symbol names (by default) have trailing decorations for the size of parameters on the stack (e.g. _write@4) and the callee cleans up the stack. The benefit is somewhat smaller code, but you can't have variadic functions.

So I was testing my JNI wrapper and getting JVM crashes. Looking at the register dump showed that the program counter was not pointing to a normal text virtual address, which is symptomatic of stack confusion. Once I re-acquainted myself with the above arcana, I guessed correctly that I was using the wrong calling convention for the DLL: together the function epilogue and caller were popping off too many levels of stack. However, in this case the DLL was using undecorated symbol names (e.g. _write) but still expecting __stdcall semantics. As it happens, this is how Win32 API works.

The problem is that annotating the prototypes in header files with "__stdcall" to get correct calling conventions also makes the dependent object file expect the decorated function name. Linking against the DLL results in lots of unsatisfied symbol errors.

The solution took some digging and doesn't make a whole lot of sense to me, but the way out of this using mingw tools is to build an interface library. First, you create a def file like so:

LIBRARY MYLIB.DLL
EXPORTS
Symbol1@0
Symbol2@4
Symbol3@8

Note that symbols in this file include the decorations. Then you run dlltool from mingw with -k to remove the decorations from the generated library. As I understand it, this option actually creates aliases from the decorated to undecorated name so that subsequent linking can work. The manpage says "-A" is actually for this, but before I read it, I found a mailing list post advocating "-k", and, well, it worked, so I'm joining that particular cargo cult.

$ dlltool -k -d mylib.def -l mylib.a

Then when building the JNI library wrapper, link against mylib.a instead of the original DLL. Easyish!

Kernel updates

During baby naps I actually did some Rio Karma work this week. First, I found and fixed a recent bug in usb storage that makes the Karma not work at all in 2.6.35. Then I finally set up my omfs git tree on kernel.org, with a few patches I wrote almost two years ago, plus a recently found memory leak. I’ll try to get Linus to pull it in 2.6.36. Then, I made a new release of omfsprogs rewritten to use libomfs from the FUSE version of omfs (I also did this work two years ago). Now all is right in the Karma world again; time to ignore it for another 6 months.

Then, on to ath5k. One nagging issue that the driver has always had is that there’s pretty much a total lack of synchronization among the myriad threads. There’s an ad-hoc spin lock here and there, but for example the reset procedure, which reloads hundreds of registers, can be called concurrently with itself and other driver ops, and can be invoked while trying to transmit and receive packets. So one idea we’ve been hashing out on the ML is to serialize reset with a mutex by putting it into process context via a workqueue, and disabling tasklets while it runs. I hacked that together and it’s now available in wireless-testing. Hopefully this will fix some of those weird DMA-into-freed-skb errors that can potentially be caused by untimely modification of descriptor pointers.

Finally: tracing. For the last year or so, Linux has grown its own NIH answer to DTrace in the form of ftrace and tracepoints. The wireless subsystem already has a few tracepoints here and there, and I really like it better than debug printks because they have smaller overhead in the compiled-but-not-enabled case (a branch compared to an out-of-line function call). You can pick and choose individual tracepoints to sample, or combine them with a general tracing facility such as ftrace. I have some patches baking to add various tracepoints to ath5k [1, 2, 3].

One really neat idea that I appropriated from Johannes Berg’s work on iwlwifi is to store entire packets in the tracing ring buffer. Then you can have a trace-cmd plugin to write these out as tcpdump pcap files. This is really nice: you get a packet dump that works with your existing debug infrastructure, plus you have the tracing information so you know what the code is trying to do, at whatever granularity you chose when setting up the trace. Hopefully, this will be a good one-stop debug gathering tool, not just for developers, but for users, as well.

On S-Expressions

I suppose I’m old enough now, and the rate of change in computing is fast enough, that I can regale young software developers with I-remember-when stories. You know, the ones that start off like “all of my punchcards were rubber-banded together, but I had neglected to number them…” I’ll spare you the tale of how we used to screen-scrape the mainframes at LargeAirline.com for today; instead you get to read my ramblings on languages-older-than-Go.

Today’s topic is inspired by my reading on LWN about the fact that Emacs is still around, which led me to David Wheeler’s argument that parentheses are why no one uses Lisp and that we can do better. Perhaps so, but I found his plugging of SKILL interesting, since once, long ago, when I was but a fresh young hacker, I spent a rather significant bit of time writing software in this language.

Since you have likely never heard of it, SKILL is a Lisp implementation that is (or was?) embedded in the Cadence suite of Electronic Design Automation products. As I recall, the biggest warts were:

  • the fact that you could use C-like prefixing — f(x y) was equivalent to (f x y) — but other constructs, like using parentheses for grouping, behaved as in Lisp (i.e. errored).
  • the runtime library used C-like functions, such as printf, fread, etc
  • the developer’s manual encouraged using imperative style programming like loops; indeed using recursion instead would usually lead to stack overflows

I typically wrote in the Lisp style and just ignored the C-like features as much as possible. The experience soured me on functional languages forever, which isn’t really fair, because SKILL was particularly bad.

But I’ve often wondered why it was so horrible, and how it came to be that Cadence, a large successful company, selected it as its primary extension language. Of course, almost all EDA software is horrible so the bar is pretty low, but I wondered if this was a case of Cadence not realizing how bad it was, or if it was just historical baggage. Or did a bored programmer just decide one day, “hey, let’s write our own crappy Lisp!” (cf. much of Android).

Luckily, there’s a paper on it to shed some light: “SKILL: A Lisp-based extension language” by E. Petrus, 1993. I snagged a version from behind the ACM paywall, so I won’t post it, but I’ll quote a few interesting bits here.

SKILL was originally conceived as a base language for developing simulation and test description languages… Lisp was chosen as a base language because it lent itself well to an interpreter based implementation and Lisp is a natural choice for code generation and as a meta language. The first implementation was an interpreter based on the Franz Lisp developed at Cadence Design Systems in the mid-1980s.

Fair enough. Mostly historical baggage, partly not-invented-here.

The popular syntax for SKILL is a C-like alternative to Lisp. When SKILL was first introduced, the connection with Lisp was not promoted in electronic engineering circles.

Too bad. Maybe if the same engineers had compared it to an allegedly decent Lisp, they might have demanded better.

Because of the large installed SKILL base, it has been difficult to move SKILL [in the direction of Common Lisp].

Oops.

For the most part, SKILL developers come from a EE background and lack general familiarity with functional languages. As a result, company wide code inspections revealed the need for a high level code analysis tool.

Even Cadence internal staff couldn’t use SKILL.

The paper goes on to describe why extension languages are useful as a component of software-reuse. While I won’t disagree, I find the characterizations of EDA software more an indictment on the state of software development in the industry circa mid-90s. The products I used from Mentor Graphics, Cadence, and so on were just horrendous. Each application was typically a 40 meg, crashy, statically linked executable, in which a lot of functionality was shared between a few other executables that mostly did the same thing but had subtly different missing features (probably in the name of their byzantine licensing strategies).

  • ECAD software is too large and diverse to be contained in a single executable
  • Need “fire-walls” between products. Separating major products into separate executables reduces the chance of one set of code from corrupting data in another.
  • Change control is easier when the software is built separately.
  • Independent release of products. Products linked together in the same executable must be shipped together. In separate executables, products can be shipped whenever necessary.

(Shared libraries anyone?)

I think I wouldn’t mind SKILL if it had been a decent Lisp, but unfortunately Cadence got stuck with maintaining their own one-off fork and the software ecosystem that grew up around it. At the EDA firm at which I worked, we were lucky to be a late-comer, which allowed us to use a reasonable extension language: perl. I haven’t kept up with the industry, but I hope others have followed suit.