Orders of magnitude

I had a Hadoop map-reduce job that kept timing out, which led to this interesting discovery:

$ time ./json-parser-test.py

real	0m0.205s
user	0m0.152s
sys	0m0.032s

$ time ./json-parser-test-no-speedups.py

real	0m2.069s
user	0m2.044s
sys	0m0.024s

$ time jython ./json-parser-test-no-speedups.py

real	79m59.785s
user	80m23.709s
sys	0m14.441s

Moral: use Java-based JSON libraries if you have to use Jython and JSON. Also, Java sucks.

Even C has tsearch

I’m just going on record to state that the two reasons I’ve seen on the net for lack of tree-based collections in the Python standard library — (1) that they have bad locality (gee, very unlike huge memory-hogging randomized hash tables, oh and just try scanning one of those…), and (2) that Python makes it so simple to write your own, just do that! (well, yes I can and have, but hey it’s 2011 already, rbtrees are like 40 years old!) — are inane.