Mavened.

So, I’m applying a one-line patch to a Java package, and oh yes, it needs Maven. Hooray!

$ apt-cache depends --recurse maven | grep Depends |
     grep -v '<' | egrep 'lib.*java' | sort | uniq | wc -l
203
[... start build and watch maven download another pile ...]
$ find ~/.m2 -name '*pom' | wc -l
851

I find it hard to believe that there are that many useful libraries in the world.

Orders of magnitude

I had a Hadoop map-reduce job that kept timing out, which led to this interesting discovery:

$ time ./json-parser-test.py

real	0m0.205s
user	0m0.152s
sys	0m0.032s

$ time ./json-parser-test-no-speedups.py

real	0m2.069s
user	0m2.044s
sys	0m0.024s

$ time jython ./json-parser-test-no-speedups.py

real	79m59.785s
user	80m23.709s
sys	0m14.441s

Moral: use Java-based JSON libraries if you have to use Jython and JSON. Also, Java sucks.

On static in Java

In C, scoping rules aside, the following are pretty much the same:

static int x[] = { /* ... */ };
int foo()
{
    /* do something with x */
}

and

int bar()
{
    static int x[] = { /* ... */ };
    /* do something with x */
}

The static array x will go into the .data section of the executable. When the OS loads the executable into memory, it will ensure that the values for x are set in the appropriate place (e.g. by mapping its disk page to the virtual address in the ELF headers). In the second case, x is given some alias to avoid symbol conflicts.

There seems to be no way to do the second version in Java. Here’s how I tried:

static int xyzzy()
{
    final int x[] = { /* ... */ };
    /* do something with x */
}

This doesn’t do what you might think it does. Here final only fixes the reference x; the array it points to can still be modified at will. Thus javac is unlikely to do anything smart here.

Indeed, in the above case, javac emits code that allocates the array and stores each element, one push at a time, every time xyzzy() is called. I had a real method like this, and making x a static field gave me an easy 2x speedup.
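Here is a minimal sketch of the two variants side by side. The class name, the table contents, and the method names (StaticArrayDemo, TABLE, sumLocal, sumStatic) are made up for illustration and are not from my real code.

```java
public class StaticArrayDemo {
    // Built once, when the class is initialized.
    private static final int[] TABLE = { 1, 2, 3, 5, 8, 13, 21, 34 };

    // The array is allocated and filled element by element on every call.
    static int sumLocal() {
        final int[] x = { 1, 2, 3, 5, 8, 13, 21, 34 };
        int sum = 0;
        for (int v : x) {
            sum += v;
        }
        return sum;
    }

    // Only reads the already-initialized static field.
    static int sumStatic() {
        int sum = 0;
        for (int v : TABLE) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumLocal() + " " + sumStatic());
    }
}
```

Running javap -c on this class should show the per-element stores inside sumLocal(), while for TABLE they appear only in the static initializer. How much that matters will depend on the table size and on what the JIT does with it.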

JNIs and ABIs

For a recent task at work, I had a compelling reason to break out of the Java jail and do some “real programming” in C. Unfortunately, said C-language bits were to be run on the Windows platform. I had a stripped DLL (no source) for manipulating USB cameras and an API document to work with, and a plan to duct tape this all together with JNI.

I’m not about to touch MSVC if I can help it, so I grabbed mingw packages for my platform and wrote up a small wrapper DLL to do the necessary JNI bridge. Here’s my attempt to archive the working recipe for the Google bots.

Compiling this wrapper library was quite easy with mingw:

$ cat Makefile
# This makefile builds usbcamjni.dll to wrap usbcam.dll
# Use mingw32-make to build
CPPFLAGS=-I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/win32
LIBS=-L . -l usbcam

usbcamjni.dll: usbcamjni.o
        $(CC) -shared -o $@ $< $(LIBS)

clean: .FORCE
        $(RM) usbcamjni.dll usbcamjni.o

.FORCE:

$ mingw32-make

It turns out on Windows there are a few different calling conventions that you can choose from at compile time. Garden-variety code uses __cdecl, which is how the world should operate. The symbols are named just as they are named in the C code with a leading underscore, and the caller is responsible for cleaning up his own stack mess.

In the Windows world, there is __stdcall, where the symbol names (by default) have trailing decorations for the size of parameters on the stack (e.g. _write@4) and the callee cleans up the stack. The benefit is somewhat smaller code, but you can't have variadic functions.

So I was testing my JNI wrapper and getting JVM crashes. Looking at the register dump showed that the program counter was not pointing to a normal text virtual address, which is symptomatic of stack confusion. Once I re-acquainted myself with the above arcana, I guessed correctly that I was using the wrong calling convention for the DLL: together the function epilogue and caller were popping off too many levels of stack. However, in this case the DLL was using undecorated symbol names (e.g. _write) but still expecting __stdcall semantics. As it happens, this is how the Win32 API works.

The problem is that annotating the prototypes in header files with "__stdcall" to get correct calling conventions also makes the dependent object file expect the decorated function name. Linking against the DLL results in lots of unsatisfied symbol errors.

The solution took some digging and doesn't make a whole lot of sense to me, but the way out of this using mingw tools is to build an interface library. First, you create a def file like so:

LIBRARY MYLIB.DLL
EXPORTS
Symbol1@0
Symbol2@4
Symbol3@8

Note that symbols in this file include the decorations. Then you run dlltool from mingw with -k to remove the decorations from the generated library. As I understand it, this option actually creates aliases from the decorated to undecorated name so that subsequent linking can work. The manpage says "-A" is actually for this, but before I read it, I found a mailing list post advocating "-k", and, well, it worked, so I'm joining that particular cargo cult.

$ dlltool -k -d mylib.def -l mylib.a

Then when building the JNI library wrapper, link against mylib.a instead of the original DLL. Easyish!

JBoss Hash

Here’s how to figure out the hashes of method names used by JBoss’ RMI invoker from the shell. The first two bytes of the input (\000\031, octal for 25) form the length of the method signature string. Maybe there’s a better way to do the backrefs in sed, but you do need to swap the bytes around.

$ echo $[`printf "\000\031ping(Ljava/lang/String;)V" |
    sha1sum |
    sed -e "s/\(..\)\(..\)\(..\)\(..\)\(..\)\(..\)\(..\)\(..\).*/0x\8\7\6\5\4\3\2\1/g"`]

Don’t ask me why I figured that out.
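For comparison, here is a rough Java sketch of what the shell pipeline computes: SHA-1 over the length-prefixed signature, with the first eight digest bytes read as a little-endian long. The class and method names (MethodHashDemo, hash) are hypothetical, and this mirrors the pipeline above rather than quoting JBoss source.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MethodHashDemo {
    // Hash a method signature such as "ping(Ljava/lang/String;)V".
    static long hash(String signature) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-1");
            // writeUTF emits a two-byte length followed by the string bytes,
            // which is what the \000\031 prefix does in the printf above.
            DataOutputStream out = new DataOutputStream(
                    new DigestOutputStream(new ByteArrayOutputStream(), sha));
            out.writeUTF(signature);
            out.flush();
            byte[] digest = sha.digest();

            // Assemble the first eight digest bytes, least significant first
            // (the byte swap that the sed backrefs perform).
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h |= (long) (digest[i] & 0xff) << (8 * i);
            }
            return h;
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hash("ping(Ljava/lang/String;)V"));
    }
}
```

The upside over the shell version is that writeUTF works out the length prefix for you, so there is no octal arithmetic to get wrong.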

sizeof()

Renaming your class variables to shorter names reduces your EJB network footprint, since Java serialization writes the field names into the stream. Finally, those people who said that removing vowels from variable names improved performance are vindicated! Ok, well, they were already correct in JavaScript, but that doesn’t count.
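A quick way to see the effect is to serialize two otherwise identical classes and compare the byte counts. This is only a sketch using plain ObjectOutputStream, not an actual EJB call, and the class and field names are invented for the demonstration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class FieldNameSizeDemo {
    // Same data, long field name.
    static class A implements Serializable {
        private static final long serialVersionUID = 1L;
        int customerAccountBalanceInCents = 42;
    }

    // Same data, short field name. The class name has the same length as A's.
    static class B implements Serializable {
        private static final long serialVersionUID = 1L;
        int b = 42;
    }

    // Number of bytes that standard Java serialization produces for o.
    static int serializedSize(Serializable o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }

    public static void main(String[] args) {
        System.out.println("long name:  " + serializedSize(new A()));
        System.out.println("short name: " + serializedSize(new B()));
    }
}
```

The class descriptor, field names included, is written once per class per stream, so the saving is per call rather than per object.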

swing rant

Never ever write a Java GUI application with the requirement that you check fields before letting the user tab off of them. What a nightmare! Your only two tools, without rewriting large parts of Swing yourself, are FocusListener and InputVerifier. FocusListener is great for the case when you have two such fields. Set up bad data in each field, then watch the focus traversal war as a focusLost() method reclaims the focus for one component, causing focusLost() in the other component to fire. Fun. Then you have InputVerifier, ostensibly designed for this very purpose. Ignoring the fact that buttons still fire without the verifier getting called, now you have the awesomeness of not knowing what the target component would be. Want to build a view with multiple fields that get validated as one? Good luck with that.
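For reference, the InputVerifier route looks roughly like this. It is a sketch with a made-up "port number" field; PortVerifierDemo, isValidPort and newPortField are illustrative names only.

```java
import javax.swing.InputVerifier;
import javax.swing.JComponent;
import javax.swing.JTextField;

public class PortVerifierDemo {
    // Plain validation logic, kept separate from Swing so it is easy to test.
    static boolean isValidPort(String text) {
        try {
            int port = Integer.parseInt(text.trim());
            return port >= 1 && port <= 65535;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Swing calls verify() when the field is about to lose focus.
    // Returning false keeps the focus where it is.
    static class PortVerifier extends InputVerifier {
        @Override
        public boolean verify(JComponent input) {
            return isValidPort(((JTextField) input).getText());
        }
    }

    // Build a text field with the verifier attached.
    static JTextField newPortField() {
        JTextField field = new JTextField(6);
        field.setInputVerifier(new PortVerifier());
        return field;
    }
}
```

Note that verify() is handed only the component that is losing focus. Nothing in that signature says where the focus is headed, which is exactly the complaint above.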

Meh

I’m playing with date conversions today, and again I’m struck by how much the Java Calendar should be held up as an example of the over-engineered API. Has anyone ever used anything besides the Gregorian calendar? They were so proud of it when it hit 1.1.

I should have two patches hitting kernel 2.6.26, one entirely cosmetic and one that fixes a real bug on Atheros wireless cards. Akpm did pick up the OMFS patchset so hopefully that will go in .27 timeframe, though the jury is still out on whether it hits mainline.

In other news, take that, Skype!

javac sucks

More evidence that javac is one of the dumbest compilers ever:

public class Foo extends java.lang.Object{
public Foo();
Code:
0:   aload_0
1:   invokespecial   #1; //Method java/lang/Object."<init>":()V
4:   return

public static void main(java.lang.String[]);
Code:
0:   aconst_null
1:   instanceof      #2; //class Foo
4:   ifeq    15
7:   getstatic       #3; //Field java/lang/System.out:Ljava/io/PrintStream;
10:  ldc     #4; //String never!
12:  invokevirtual   #5; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
15:  return
}


Ok, I’ll give them a break — this probably never comes up in a real application.
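For the record, source along these lines should produce a listing like the one above. This is a reconstruction from the bytecode, not necessarily the exact file I compiled, and the nullIsFoo() helper is added here only to make the result easy to check.

```java
public class Foo {
    // Hypothetical helper: null is never an instance of anything,
    // so this always returns false.
    static boolean nullIsFoo() {
        return null instanceof Foo;
    }

    public static void main(String[] args) {
        // The condition is statically false, yet the javap listing above
        // shows the instanceof test and the println still being emitted.
        if (null instanceof Foo) {
            System.out.println("never!");
        }
    }
}
```

Compile it and run javap -c Foo to compare against the listing.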