procced mail

I’ve been meaning to do something about my spam situation, which rather sucks at the moment. Now that my email address appears occasionally on the Linux kernel mailing list and elsewhere, I’ve unsurprisingly got a signal-to-noise ratio of something like 1:200. I’ve had SpamAssassin in the loop for a while, but the Bayesian analysis always invokes the OOM killer on my memory-poor firewall, so it’s not running at full effectiveness. Thus I took a little time out to add a whitelist to the front end to make things go better: if you’re in the whitelist, your mail skips all the filters. Meanwhile SA will get its threshold cranked up to consider practically everything spam, and I’ll be more likely to insert one-off regexps that trash certain spams that I get over and over again.


To make the whitelist, I went to my existing mail folders (in mbox format, of course). This was easy enough:

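#! /bin/sh

# Collect the From:/To: addresses from every mbox folder except junk.
for i in `/bin/ls | grep -v junk`; do
    formail -s formail -zx "From: " -zx "To: " < "$i" |
        sed -e 's/^.*<\(.*\)>.*$/\1/g' >> whitelist.uns
done
# Sort and de-duplicate into the final whitelist.
sort -u whitelist.uns > whitelist
rm whitelist.uns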
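
The sed step is the fiddly part, so here is a minimal sketch of it in isolation. It assumes header values of the form "Name <address>" and leaves bare addresses untouched. The function name extract_addr and the sample addresses are made up for illustration.

```sh
#!/bin/sh
# extract_addr: hypothetical helper, not part of the script above.
# Reads header values on stdin and prints just the address part.
extract_addr() {
    # Keep whatever sits between the angle brackets;
    # lines without angle brackets pass through unchanged.
    sed -e 's/^.*<\(.*\)>.*$/\1/'
}

printf '%s\n' 'Some Friend <friend@example.org>' 'bare@example.net' | extract_addr
```

This prints friend@example.org and bare@example.net, one per line.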

Then, it took a while to figure out how to make the proper procmail recipe to use it, but I eventually came up with this:

MAILDIR=Mail
FGREP=/bin/fgrep

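# Run the filters only on non-list mail whose sender is not in the whitelist.
:0:
* !^List-Id:.*
* ? formail -zx "From:" | ${FGREP} -F -v -i -w -f whitelist
{
    # Pass smallish messages through SpamAssassin.
    :0fw
    * <100000
    | spamassassin

    :0:
    * ^X-Spam-Status: Yes
    junk

    # One-off regexps for spam I get over and over again.
    :0:
    * ^Subject:.*Corel Draw.*
    junk

    :0:
    * ^Subject: *[^ ]* new(s)?$
    junk

    :0D:
    * ^Subject:.*(O|0)EM *
    junk

    :0D:
    * ^Subject:.*[A-Z]+[a-z][a-z][a-z][A-Z]+
    junk
}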

:0Ecf
| formail -A"X-Whitelist-Passed: OK"

# kill html email
:0
* ^Content-type: text/html
{
    :0bfw
    | (echo "[html stripped]"; lynx -dump -force_html -stdin)

    :0ahfw
    | formail -i"Content-type: text/plain"
}
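
The whitelist condition reads backwards at first, so here is a rough shell sketch of the same test outside procmail. Because of -v, grep succeeds when the From: value does NOT contain a whitelisted address, which is exactly when the filter block should run. The function name needs_filtering and its arguments are hypothetical, and I use grep -F here on the assumption that it behaves like fgrep.

```sh
#!/bin/sh
# needs_filtering FROM_VALUE WHITELIST_FILE   (hypothetical helper)
# Exit status 0: no whitelist entry appears in the From: value, so filter it.
# Exit status 1: the sender is whitelisted, so skip the filters.
needs_filtering() {
    printf '%s\n' "$1" | grep -F -v -i -w -q -f "$2"
}
```

Something like needs_filtering "$(formail -zx "From:" < message)" whitelist && echo "run the filters" would then mirror what the recipe does, where message is a file holding a single mail.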

Open sores

Well, as of yesterday, after a protracted review and release process, several lines of code by me have found their way into an operating system: Linux 2.6.16 includes my Rio Karma drivers. So you can go ahead and blame me if Linux sucks. Then I’ll blame Jeff Garzik in turn, just because.

Last weekend was nice. Angeline and I spent most of it shopping for a DC-area apartment for her new job at NIH, which begins this summer. If anyone knows of a cheap but nice apartment convenient to Bethesda, let’s hear it.

Note to job seekers: don’t put “career” or “job” in your resume. I keep getting keyword searches in my httpd logs for “objective education experience oracle unix -job -career.” I guess said headhunters won’t see this post, so let me take this opportunity to tell them that they are ugly and they smell. I also get lots of referrer spam from ballsacks.net. Just thought you should know.

Mobile

A few months ago I bowed to peer pressure and purchased a new phone. Prior to that, my antenna had been held on with hot-melt glue for several months. So I read the internets, picked out a phone, found an unlocked version (I’m not going to be a serf to the mobile companies!) and plonked down the requisite credit card number into the order form. When it arrived a few days later, my co-workers were mystified. “That’s not a RAZR,” they said, incredulously. “Why get a dumpy looking Nokia [6230 for those who care]? Paris Hilton wouldn’t touch that.” Well, a few reasons: I get craptastic reception at the house, and Nokias are legendary for good reception. On that front, I’ve been very pleased; no dropouts anywhere in my house, whereas I had to stand on the porch with my old phone. Also, it has a decent MP3 player built in with expandable flash memory. I have a real MP3 player already, but it never hurts to have a few tunes with you for bumpy places such as the gym where the hard drive is unwelcome. Reason the third: EDGE.

I spend at least two hours every day on a bus getting to work, and it sure would be nice to use the net some during that block of time. So effective today I’m on T-Mobile’s Total Internet thingy. With that, I can connect my phone via a USB cable to the laptop and use it as a modem to connect at a decent speed from anywhere. It works pretty well — better than dialup, worse than anything else. I posted this from the bus.

Speaking of internet, my mom & dad have just joined the 18th century by getting on the DSL bandwagon… sort of: they got the 256k ADSL. Hey, but remember when people were paying $100/mo for ISDN? You know you wanted it. Anyway, I give my parents -5 seconds until their Windows 95 machine is rooted.

Two degrees

I’m Kevin Bacon on BoingBoing today, for some reason. First we have the link to BBS Ads which then links to my projects page because they used my awesome ansiconv program. Actually this program is pretty dead and numerous better projects have sprung up to replace it, but I still claim first mover status.

Then there’s former roommate Matt’s dad, saying that beer-serving robots are just an attention grab compared to real robots like Roomba. To that I have to say: I want a beer-pouring Roomba!

Anyone know what happened to Matt?

jconsole

I still hate Java, but I do have to say that J2SE “5” is a much needed improvement. Ahh, generics, where have you been all this time? This week I’ve been rewriting an old buggy, crufty server we had sitting around (before: 5000 lines of code, after: 600) and discovered one of the neatest new whizbang features: jconsole, a standalone JVM monitor and JMX console. If you’ve used JBoss’ jmx-console webapp before, it’s the same idea, just prettier.

All you have to do to use it is add -Dcom.sun.management.jmxremote to the java command line, and then run jconsole. Then you can monitor memory usage, get live stack traces on all the running threads, and get general VM statistics. If you create an MBean interface and implementation and register it with the platform MBean server, then you can interact with the application just as if it were deployed in a JMX container. For example, I have MBean methods to make the server enter and leave maintenance mode for software upgrades. To use them, we just connect with jconsole and click the proper button. Neat.

in-kernel

ChangeLog-2.6.16-rc1:

commit 0e6e1db4ac7acfe3e38bbef9eba59233ba7f6b9a
Author: Bob Copeland 
Date:   Mon Jan 16 22:14:20 2006 -0800

[PATCH] partitions: Read Rio Karma partition table

The Rio Karma portable MP3 player has its own proprietary partition table.

The partition layout is similar to a DOS boot sector but it begins at a
different offset and uses a different magic number (0xAB56 instead of
0xAA55).  Add support for it to enable mounting the device.

Signed-off-by: Bob Copeland 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
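
For the curious, the magic number check is easy to poke at from the shell. This is only a minimal sketch, assuming you pass in the byte offset yourself (the commit message doesn't give the Karma's offset, so I won't make one up) and that your od accepts the POSIX -j and -N options. read_magic is a made-up helper name.

```sh
#!/bin/sh
# read_magic FILE OFFSET   (hypothetical helper)
# Prints the 16-bit little-endian value at byte OFFSET of FILE, e.g. 0xAA55.
read_magic() {
    # od emits the two bytes in file order, low byte first (e.g. " 55 aa").
    set -- $(od -An -tx1 -j "$2" -N2 "$1")
    # Swap them to get the little-endian value and upper-case the hex digits.
    printf '0x%s%s\n' "$2" "$1" | tr 'abcdef' 'ABCDEF'
}
```

On an ordinary DOS-style boot sector image, read_magic disk.img 510 should print 0xAA55, since the signature lives in the last two bytes of the 512-byte sector. Here disk.img is just a placeholder name.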

Who wants to touch me?

Nerd stuff

I’m entering a new phase in the product cycle at work so I get to think more about the current state of the way-behind-the-times while designing a new system.

Swing’s MVC design for tables and combo boxes and so on totally misses the point. MVC is about abstracting away the UI from the model; making the tables explicitly require a model that implements a javax.swing interface does nothing to reduce the coupling between interface and core code. What if I want a curses-based view instead of Swing on top of the same model? I am annoyed that Gtk2 seems to have picked up this misfeature.

Why did Java “5” introduce only an “enhanced for-loop” to go with generics? They should’ve taken a cue from STL and stressed algorithmic reuse. Where is a method like Perl’s map? Also, while we’re at it, I want real closures.

EJB is way too complex, and Sun figured that out, and released a new spec. So far things are looking much better. I don’t like annotations as they clutter the code and seem like a reinvention of #pragma: it looks like something the compiler shouldn’t know about, but does. On the other hand deployment descriptors were 1e15 times worse. The EJB 3 persistence model is a lot cleaner, and surprise, you can use your database code outside the container now. Way to finally support one of the basic goals of OO. In fact, I had a real-world problem to solve earlier this year: write a small application that reuses our DB backend stuff, all command line based, without hitting the EJB tier. This is harder than it sounds, and eventually had me writing my own datasource provider with connection and statement pooling that wrapped the JDBC drivers of our database. Ick.

I hate VMs. Operating systems are there for a reason. If I get another OutOfMemoryError because I didn’t pass -Xmx19201231230 to my leak-free Java program…

AJAX? A new name for what we did with JavaScript in hidden frames back in 1996? Okay, XMLHttpRequest, I’ll give you that. But it doesn’t deserve being thought of as a new technology. Same for “Service Oriented Architecture,” “Enterprise Service Bus,” and every other re-invention of RPC.

I still think Grady Booch is an idiot.

Kernel programming is fun: you can use goto, bitwise ops, cast pointers to structures, and it’s all okay. Also: don’t comment too much.

New computer

Allow me once again to sing the praises of freecycle, the group that exemplifies “one man’s trash is another man’s trash.” While I’ve unloaded a few junky things this route, last week was my first pick-up. A poster offered an “old” Dell case with Pentium III (733 MHz) motherboard and CPU, and I didn’t hesitate to grab it.

Now, let’s put “old” into context: the machine was evidently made around Y2K. I had my firewall running off of a machine that was also given to me, “old,” in 1999. It was a Pentium 100 that used to belong to a Gateway, had 64 megs of RAM and sat in an AT case. Remember ISA? Big keyboard connectors? Serial ports? This machine has served me faithfully, running a 2.0 Linux kernel for years, delivering mail and proxying all of my network traffic, even back to the days when the upstream connection was a modem. As of yesterday it was on a 160-day uptime. Certainly I’ve had longer, but still not bad. I think the last downtime was when I painted my room and had to move it.

Enter my new “old” box. It came without RAM, but luckily I already had 512 megs of SDRAM sitting around. The CPU, video card, and network card included all looked fine, so I grabbed one of the net cards from box #1 and threw it in, swapped in my hard drive, and had it up and running. The fan was very noisy, so I took a trip down to Best Buy.

When are we going to get a Fry’s in NoVA already? To think that my best local so-called computer store is Best Buy… this is a sad state of affairs. Well, they did have case fans, but only the kind with blue blinking LEDs. Resigned, I plopped down my $10 and headed home to install the thing. The good news is that the machine is now quiet enough to act as the new firewall for my home. The bad news: blue blinking LEDs. How am I supposed to sleep with these things flashing all night? There’s no off switch and you can’t cover up the fan. One of these days I might take a soldering iron to the blinking light portion of this thing.

Anyway, a new, much faster, hash is born. Let us hope he is as reliable as his predecessor. And I look forward to getting a free Athlon 64 in 2011.

hashed

Despite earning my degree in computer engineering, I haven’t done anything useful with assembly language since I was a strapping young idealistic lad convinced that compilers lie along the road to inefficiency. Much has changed. Heck, I write Java code for a living now, which is pretty much the opposite of efficient. I have to have a gig of RAM just to run that sucky ant program.

While hacking my MP3 player, I discovered that the filesystem uses hashing to quickly look up file names, which brought up the question of which hash function it uses. While I suppose one could reverse the hash function knowing a very large set of inputs and outputs, I decided it would probably be much more expedient to just put my atrophied x86 asm knowledge to work for me.

This turned out to be a lot easier than I thought. It only took about 20 minutes and I never had to step through code in a debugger.

Step 1: Disassemble the Windows program for loading files onto the device, including the data segment. Look for the offset of a useful printf format string (“hash is %d, expected %d”).
Step 2: Search disassembly for loading said offset in a call to printf. Not surprisingly, this is right after the computation of the hash.
Step 3: Examine nearby calls for things like shifts and mods (common hashing operations).
Step 4: Relearn the stupidities of the x86 ISA (ecx is a loop counter, eax and edx figure in mysteriously for divides, etc).
Step 5: Convinced that a nearby call is it, reimplement in C and test.
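
The string hunt in Step 1 can be approximated without a full disassembler. This is a minimal sketch, assuming GNU grep (for -a, -b and -o) and a made-up helper name, find_string_offset. Note that it reports the offset within the file, which is not necessarily the address the disassembly shows for the data segment.

```sh
#!/bin/sh
# find_string_offset FILE STRING   (hypothetical helper)
# Prints the byte offset of the first occurrence of STRING in FILE.
find_string_offset() {
    # -a: treat binary as text, -b: print byte offsets,
    # -o: report the match itself, -F: fixed string, not a regexp.
    grep -a -b -o -F -e "$2" "$1" | head -n 1 | cut -d: -f1
}
```

Calling find_string_offset loader.exe 'hash is %d' (loader.exe being a placeholder for the Windows program) gives a number to go searching for in Step 2, once it is translated into whatever addressing the disassembler uses.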

Booyeah.

Good Karma

Just to follow up the earlier post, I officially announce my Karma driver web page, including the kernel patch such as it is. I’ve worked out some more details about the disk so I expect to be able to mount it shortly. Peter from empeg declares it “could be useful.”