Even C has tsearch

I’m just going on record to state that the two reasons I’ve seen on the net for lack of tree-based collections in the Python standard library — (1) that they have bad locality (gee, very unlike huge memory-hogging randomized hash tables, oh and just try scanning one of those…), and (2) that Python makes it so simple to write your own, just do that! (well, yes I can and have, but hey it’s 2011 already, rbtrees are like 40 years old!) — are inane.

New kite

Star FacetIt’s almost spring and that means time to make another kite! I haven’t done one in a couple of years so hopefully I haven’t lost all of the necessary sewing skills. This time around I’m going with a cellular design (think box kite), the #54 star facet. This morning I spent about half an hour on the first step — cutting out the necessary shapes from rolls of ripstop nylon. We have an enclosed area with great big sunny windows in our apartment which are perfect for tracing the shapes out onto the material.

The next step is adding reinforcements and binding for which I’m waiting on supplies. The Cherry Blossom nee Smithsonian Kite Festival is coming up in a couple of weeks so I want to get it all done by then. I can’t wait to see what Alex thinks of kite flying!

Updated to version 6

I bought a new wireless router the other day, a D-Link DIR-825 dual-band a/g/b/n. This is a step up in several regards from my previous 2.4 GHz 11g Linksys, not least of which is the D-Link’s ability to run OpenWRT with ath9k. Installing OpenWRT with prebuilt binaries is dirt simple; it took all of about 5 minutes to set up.

One of my motivations was to do some IPv6 testing in advance of World IPv6 Day. Comcast doesn’t yet offer IPv6 addresses to the public at large, so I set up a Hurricane Electric 6in4 tunnel. This is quite easy to configure in OpenWRT.

So now I can load ipv6.google.com in my browser. It’s approximately 2 better than the normal site.

me &= ~grad_school

An unassuming cardboard tube arrived in the mail the other day, containing my MSCS diploma from Hopkins. Thus passes my academic career. I doubt the experience will ever pay for itself financially, but I did learn a great deal, even as a crusty practitioner of some years. Returning to the student life of never-ending deadlines took quite some getting used to, and the corresponding lack of sleep was good training for Alex’s concurrent arrival. (Said arrival also made up my mind to not pursue the thesis option.)

In the main, the classes were excellent, but the small size of the school means the catalog is limited compared to Georgia Tech. For example, I would have liked a class or two on the hardware side. Hopkins only offered the equivalent of GT’s CMPE 2510 (MIPS architecture with the Hennessy and Patterson book).

Through projects and reading papers, I gained a new appreciation for the difficulty of good science, and the prevalence of bad. I’ll always be an engineer first, as I lack the patience to do science well. Banging out a project out in 2-6 weeks was de rigeur; here are some of them:

  • A handwriting recognition program based on HMMs (basic stuff, but I had no prior machine learning experience)
  • A user-space version of mac80211 hwsim. With the simulator, I evaluated the Linux rate controllers for different rates of packet loss, and found and fixed a bug in Pid. There are a few things in Minstrel that can still be improved, but there’s no surprise that it is much better than Pid.
  • A failed investigation into cache-oblivious data structures for filesystems. It’s tough to beat B-Trees at their own game. Still, I’m very excited about CO algorithms since this stuff wasn’t even around when I was an undergrad. Hopefully the right problem will come along.
  • Demand paging and swap support for xv6. This was a class assignment, but I had a lot of fun putting it together. [Code deleted from the internet at prof’s request.] The swap daemon was crap due to time constraints, but most groups didn’t get beyond identity mapping.
  • An evaluation of parallel algorithms on massive graphs. I now have a better appreciation for the ease of implementation, and horrible performance, of barrier synchronization.
  • A streaming categorical ranking system for twitter graphs. Note to NoSQL graph DB developers: Postgres is still much faster.

There are of course writeups associated with all of these, but my latent inner perfectionist is not very happy with most of them. I might revisit them with more rigor if I get bored some day.

On static in Java

In C, scoping rules aside, the following are pretty much the same:

static int x[] = { /* ... */ };
int foo()
{
    /* do something with x */
}

and

int bar()
{
    static int x[] = { /* ... */ };
    /* do something with x */
}

The static array x will go into the .data section of the executable. When the OS loads the executable into memory, it will ensure that the values for x are set in the appropriate place (e.g. by mapping its disk page to the virtual address in the ELF headers). In the second case, x is given some alias to avoid symbol conflicts.

There seems to be no way to do the second version in Java. Here’s how I tried:

static int xyzzy()
{
    final int x[] = { /* ... */ };
    /* do something with x */
}

This doesn’t do what you might think it does. While x cannot be changed, it is the reference to an array; the array elements themselves can be changed at will. Thus javac is unlikely to do anything smart here.

Indeed, in the above case, javac will initialize x with stack pushes every time xyzzy() is called. I had a real method like this, and making x a static member variable gave me an easy 2x speedup.

Snow day

"Driving" in the snowLast night I had the pleasure of driving my car less than a mile from the nearby metro to our apartment, during a heavy snow, in rush hour. Normally I walk, but this area is bad about keeping sidewalks clear in snowy weather, so driving is sometimes the safer option. It took me an hour and a half to cover the distance, at a blazing average speed of 2/3 mph.

Along the way I learned some unwritten rules of driving in DC snowstorms:

  1. Traditional traffic patterns no longer apply. If a road once traveled east-to-west, it now goes west-to-east.
  2. Ditto with traffic lights: let’s just collectively ignore them.
  3. When traffic is blocked, do attempt, and fail, to perform a three-point turn. This will ensure that traffic in front of you will begin moving again, and traffic behind you will get to participate in pushing your car out of a snowbank.
  4. At random, get out of your car and just leave it there. Say, in the middle lane of a three-lane highway.
  5. If three cars are already abandoned at various places halfway up a short, inclined on-ramp, try it yourself anyway. They probably just weren’t doing it correctly.

Hacker Delighted

I mentioned before that I had wishlisted the book Hacker’s Delight after reading the constant folding code in gcc. My better half surprised me with a copy for Christmas and I read through it in about a day and a half. To my surprise, the book is at once smaller (at just over 300 pages) while at the same time being a lot more thorough and principled than I had expected. Some formulas come with proofs, and for ranges that produce undesired results (e.g. log2(0)), the book makes a point of identifying them and discussing alternatives. Creative uses for ffs(), the ubiquitous bit-parallel permutation algorithms, and several other bit twiddling tricks are exhaustively covered, along with various integer math routines including, of course, division-by-constant. Worth a read if that’s your thing.

C++ error fail

Note to self: double-check the template arguments before asking the compiler to do so.

(WordPress hates to wrap, so I did it, badly.)

mpicxx -Wno-deprecated   -c -o pagerank.o pagerank.cc
In file included from /usr/include/boost/graph/distributed/adjacency_list
.hpp:3959,
                 from pagerank.cc:3:
/usr/include/boost/graph/distributed/adjlist/initialize.hpp: In member fu
nction ‘void boost::adjacency_list<OutEdgeListS, boost::distributedS<Pr
ocessGroup, InVertexListS, InDistribution>, DirectedS, VertexProperty, Ed
geProperty, GraphProperty, EdgeListS>::initialize(EdgeIterator, EdgeItera
tor, EdgePropertyIterator, typename boost::detail::parallel::adjacency_li
st_config<OutEdgeListS, ProcessGroup, InVertexListS, InDistribution, Dire
ctedS, VertexProperty, EdgeProperty, GraphProperty, EdgeListS>::inherited
::vertices_size_type, const typename boost::graph::distributed::select_di
stribution<InDistribution, VertexProperty, typename boost::detail::parall
el::adjacency_list_config<OutEdgeListS, ProcessGroup, InVertexListS, InDi
stribution, DirectedS, VertexProperty, EdgeProperty, GraphProperty, EdgeL
istS>::inherited::vertices_size_type, ProcessGroup, typename boost::graph
::internal_vertex_name<VertexProperty>::type>::type&, boost::vecS) [with
EdgeIterator = boost::graph::metis_reader::edge_iterator, EdgePropertyIte
rator = boost::graph::metis_reader::edge_weight_iterator, OutEdgeListS =
boost::vecS, ProcessGroup = boost::graph::distributed::mpi_process_group,
 InVertexListS = boost::vecS, InDistribution = boost::defaultS, DirectedS
 = boost::directedS, VertexProperty = boost::no_property, EdgeProperty =
boost::no_property, GraphProperty = boost::no_property, EdgeListS = boost
::listS]’:
/usr/include/boost/graph/distributed/adjacency_list.hpp:1644:   instantia
ted from ‘boost::adjacency_list<OutEdgeListS, boost::distributedS<Proce
ssGroup, InVertexListS, InDistribution>, DirectedS, VertexProperty, EdgeP
roperty, GraphProperty, EdgeListS>::adjacency_list(EdgeIterator, EdgeIter
ator, EdgePropertyIterator, typename boost::detail::parallel::adjacency_l
ist_config<OutEdgeListS, ProcessGroup, InVertexListS, InDistribution, Dir
ectedS, VertexProperty, EdgeProperty, GraphProperty, EdgeListS>::inherite
d::vertices_size_type, const ProcessGroup&, const typename boost::graph::
distributed::select_distribution<InDistribution, VertexProperty, typename
 boost::detail::parallel::adjacency_list_config<OutEdgeListS, ProcessGrou
p, InVertexListS, InDistribution, DirectedS, VertexProperty, EdgeProperty
, GraphProperty, EdgeListS>::inherited::vertices_size_type, ProcessGroup,
 typename boost::graph::internal_vertex_name<VertexProperty>::type>::type
&, const GraphProperty&) [with EdgeIterator = boost::graph::metis_reader:
:edge_iterator, EdgePropertyIterator = boost::graph::metis_reader::edge_w
eight_iterator, OutEdgeListS = boost::vecS, ProcessGroup = boost::graph::
distributed::mpi_process_group, InVertexListS = boost::vecS, InDistributi
on = boost::defaultS, DirectedS = boost::directedS, VertexProperty = boos
t::no_property, EdgeProperty = boost::no_property, GraphProperty = boost:
:no_property, EdgeListS = boost::listS]’
pagerank.cc:37:   instantiated from here
/usr/include/boost/graph/distributed/adjlist/initialize.hpp:56: error: no
 matching function for call to ‘add_edge(boost::detail::parallel::globa
l_descriptor<unsigned int>&, boost::detail::parallel::global_descriptor<u
nsigned int>&, const double&, boost::adjacency_list<boost::vecS, boost::d
istributedS<boost::graph::distributed::mpi_process_group, boost::vecS, bo
ost::defaultS>, boost::directedS, boost::no_property, boost::no_property,
 boost::no_property, boost::listS>&)’
/usr/include/boost/graph/distributed/adjacency_list.hpp:2973: note: candi
dates are: typename boost::adjacency_list<OutEdgeListS, boost::distribute
dS<ProcessGroup, InVertexListS, InDistribution>, DirectedS, VertexPropert
y, EdgeProperty, GraphProperty, EdgeListS>::lazy_add_edge_with_property b
oost::add_edge(typename boost::adjacency_list<OutEdgeListS, boost::distri
butedS<ProcessGroup, InVertexListS, InDistribution>, DirectedS, VertexPro
perty, EdgeProperty, GraphProperty, EdgeListS>::vertex_descriptor, typena
me boost::adjacency_list<OutEdgeListS, boost::distributedS<ProcessGroup,
InVertexListS, InDistribution>, DirectedS, VertexProperty, EdgeProperty,
GraphProperty, EdgeListS>::vertex_descriptor, const typename boost::adjac
ency_list<OutEdgeListS, boost::distributedS<ProcessGroup, InVertexListS,
InDistribution>, DirectedS, VertexProperty, EdgeProperty, GraphProperty,
EdgeListS>::edge_property_type&, boost::adjacency_list<OutEdgeListS, boos
t::distributedS<ProcessGroup, InVertexListS, InDistribution>, DirectedS,
VertexProperty, EdgeProperty, GraphProperty, EdgeListS>&) [with OutEdgeLi
stS = boost::vecS, ProcessGroup = boost::graph::distributed::mpi_process_
group, InVertexListS = boost::vecS, InDistribution = boost::defaultS, Dir
ectedS = boost::directedS, VertexProperty = boost::no_property, EdgePrope
rty = boost::no_property, GraphProperty = boost::no_property, EdgeListS =
 boost::listS]
make: *** [pagerank.o] Error 1


Well, obviously I called the wrong ctor, duh!

Wrapping up

I just submitted my application for graduation here at Hopkins, so as my grad_school bit settles over the next couple of months, I must now contemplate what to do next. I haven’t given it a whole lot of thought, except that I will be glad to be out of academics for the rest of eternity. I’ll save an account of my experience here for another post, but it has been generally positive. Anyway, my dance card is far from empty, but while I’m in limbo between real work and finishing school, and deciding where our family will live, I might as well be promiscuous and see what’s out there.

I guess if I had to enumerate my occupational preferences, they would go something like this:

  • Unix kernel hacker or driver developer. Between school and the new parental status, time for my ath5k / karma / android hobby has dwindled to a trickle. I hate to see my x86 assembly skills go to waste. Looking at you, Ginormous Linux Company.
  • Systems work on problems of huge scale. This has been a recent theme of my studies, and there are some fascinating problems to tackle.
  • Systems work on small scales. Embedded work is just a lot more fun than desktops.
  • Any cool project with smart coworkers
  • Open source
  • Usual niceties like close to home, close to child care, health plan, big bag of cash…

Minus points for: UI work, remote object technologies, anything with angle brackets, function names with too many capital letters, centralized version control, marginally reinvented functional languages

PageRank in Matlab

For a school project, I’ve been reading lots of papers on good old PageRank. We’re doing something similar to this venerable algorithm in a streaming context. For a basis of comparison, I needed a simple, reliable implementation of the classic algorithm, and Matlab (octave) happens to excel at this kind of thing. So here’s my version. Initialized appropriately, this gives the same results as the diagram on the appropriate Wikipedia page. You probably wouldn’t want to run this on the web graph though: it wants to be run in a cluster at that scale.


function[R] = pagerank(A, R, d, epsilon)
N=size(A)(1);
E=ones(N);
R_last = R;
delta=epsilon;
while (delta >= epsilon)
    R = (d * A + (1-d)/N * E) * R_last;
    dist=R-R_last;
    delta = sqrt(dot(dist,dist));
    R_last = R;
end

R is the initial ranking vector. Typically this is just 1/N for each entry, but it can be some other probability distribution to arrive at personalized rankings. A is the column-oriented adjacency matrix divided by out-degree (with a plain adjacency matrix T, K=diag(sum(T)); A=K^-1 * T). Nodes with out-degree zero should be marked adjacent to everyone. d is the dampening factor, and epsilon is the convergence threshold.

I’m pretty sure my Matlab coding skills are quite lackluster, but it works.