JNIs and ABIs

For a recent task at work, I had a compelling reason to break out of the Java jail and do some “real programming” in C. Unfortunately, said C-language bits were to be run on the Windows platform. I had a stripped DLL (no source) for manipulating USB cameras and an API document to work with, and a plan to duct tape this all together with JNI.

I’m not about to touch MSVC if I can help it, so I grabbed mingw packages for my platform and wrote up a small wrapper DLL to do the necessary JNI bridge. Here’s my attempt to archive the working recipe for the Google bots.

Compiling this wrapper library was quite easy with mingw:

$ cat Makefile
# This makefile builds usbcamjni.dll to wrap usbcam.dll
# Use mingw32-make to build
CPPFLAGS=-I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/win32
LIBS=-L . -l usbcam

usbcamjni.dll: usbcamjni.o
        $(CC) -shared -o $@ $< $(LIBS)

clean: .FORCE
        $(RM) usbcamjni.dll usbcamjni.o

.FORCE:

$ mingw32-make

It turns out that on 32-bit Windows there are a few different calling conventions that you can choose from at compile time. Garden-variety code uses __cdecl, which is how the world should operate: the symbols are named just as they are in the C code, with a leading underscore, and the caller is responsible for cleaning up its own stack mess.

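In the Windows world, there is also __stdcall, where the symbol names (by default) have a trailing decoration giving the size in bytes of the parameters on the stack (e.g. _write@4) and the callee cleans up the stack. The benefit is somewhat smaller code, since the cleanup lives once in the callee instead of after every call site, but you can't have variadic functions, because the callee has to know at compile time how many bytes to pop.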
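To make the naming rules concrete, here is a minimal sketch of how the two conventions decorate a symbol on 32-bit x86. The enum and the function decorated_name are names I made up for illustration; they are not part of any toolchain, and the real decoration is of course done by the compiler.

```c
#include <stddef.h>
#include <stdio.h>

/* The two calling conventions discussed above (32-bit x86 only). */
enum callconv { CC_CDECL, CC_STDCALL };

/*
 * Hypothetical helper: write the linker-level symbol name for the C
 * function `name` into `out`.
 *
 * arg_bytes is the total size of the arguments on the stack, where each
 * argument occupies a multiple of 4 bytes. It is ignored for cdecl.
 *
 * Returns whatever snprintf returns: the length the full name needs,
 * or a negative value on error.
 */
int decorated_name(char *out, size_t cap, const char *name,
                   enum callconv cc, unsigned arg_bytes)
{
    if (cc == CC_STDCALL)
        return snprintf(out, cap, "_%s@%u", name, arg_bytes);
    return snprintf(out, cap, "_%s", name);
}
```

Calling decorated_name(buf, sizeof buf, "write", CC_STDCALL, 4) fills buf with _write@4, while CC_CDECL gives plain _write.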

So I was testing my JNI wrapper and getting JVM crashes. The register dump showed that the program counter was not pointing to a normal text virtual address, which is symptomatic of stack confusion. Once I re-acquainted myself with the above arcana, I guessed correctly that I was using the wrong calling convention for the DLL: between them, the callee's epilogue and the caller were popping too much off the stack. However, in this case the DLL was using undecorated symbol names (e.g. _write) but still expecting __stdcall semantics. As it happens, this is how the Win32 API works.
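The double pop is easy to see with a toy model. This is only bookkeeping arithmetic under my own assumptions: it tracks argument bytes for a single call and ignores the return address (which is pushed and popped in balance either way). The names cleanup and stack_drift are hypothetical.

```c
/* Who removes the arguments from the stack after a call. */
enum cleanup {
    CALLER_CLEANS,  /* __cdecl   */
    CALLEE_CLEANS   /* __stdcall */
};

/*
 * Model one call on 32-bit x86 and return how far the stack pointer
 * ends up from where it started, in bytes.
 *
 *   0        : the stack is balanced
 *   positive : too much was popped (both sides cleaned up)
 *   negative : the arguments were never removed (nobody cleaned up)
 */
int stack_drift(enum cleanup caller_assumes, enum cleanup callee_does,
                int arg_bytes)
{
    int sp = 0;

    sp -= arg_bytes;                    /* caller pushes the arguments  */
    if (callee_does == CALLEE_CLEANS)
        sp += arg_bytes;                /* callee returns with "ret N"  */
    if (caller_assumes == CALLER_CLEANS)
        sp += arg_bytes;                /* caller does "add esp, N"     */

    return sp;
}
```

My crash corresponds to stack_drift(CALLER_CLEANS, CALLEE_CLEANS, 8), which comes out as 8: the wrapper was compiled as if the DLL were __cdecl, the DLL cleaned up anyway, and the stack pointer ended up 8 bytes off.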

The problem is that annotating the prototypes in header files with "__stdcall" to get correct calling conventions also makes the dependent object file expect the decorated function name. Linking against the DLL results in lots of unsatisfied symbol errors.

The solution took some digging and doesn't make a whole lot of sense to me, but the way out of this using mingw tools is to build an interface library. First, you create a def file like so:

LIBRARY MYLIB.DLL
EXPORTS
Symbol1@0
Symbol2@4
Symbol3@8

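Note that the symbols in this file include the decorations. Then you run dlltool from mingw with -k (--kill-at) to strip the decorations on the DLL side of the generated library. As I understand it from the manpage, the library still offers the decorated names (e.g. Symbol2@4) to the linker, which is what the __stdcall-annotated object files ask for, but the import table entries it generates refer to the undecorated names that the DLL actually exports. The manpage also describes "-A" (--add-stdcall-alias), which adds undecorated aliases when dlltool creates an exports file, and at first I thought that was the option for this job. But before I read any of it, I found a mailing list post advocating "-k", and, well, it worked, so I'm joining that particular cargo cult.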

$ dlltool -k -d mylib.def -l mylib.a

Then when building the JNI library wrapper, link against mylib.a instead of the original DLL. Easyish!