The latest on letters in squares

Somewhere along the way this summer I quit doing crosswords for a while. Maybe my gardening hobby took over, or I didn’t want to touch the computer after work, or perhaps the pandemic has been too much of a stressor, or maybe it just felt like too much of an obligation to keep up with. Whatever the reason, I fell out of practice a bit. Visualized, my puzzling activity for 2022 looks like this:


legend: white = part of a streak, meaning I solved it the day of; gray = I did it (possibly much) later; black = still haven’t opened it

I also skipped the ACPT entirely this year.

Recently, though, an LWN story about GNOME Crosswords rekindled my interest a bit. A few of the puzzles from my website have been merged into this project, and I have another few sitting around that I will polish up and submit someday.

This was a good exercise to clean up some awkward fill:

  • Changed the BOB / MAKO cross to BIB / MAKI in this rejected-from-NYT puzzle. I still felt bad about putting my own name in it, and I guess MAKI is the more familiar word to anyone who has ever eaten sushi.
  • Changed PETRE [Architect of Christchurch Basilica (?!)] / GAR to PETME / GAM in this puzzle. GAM feels a bit objectifying, so I clued it as bygone film noir slang; in any case, PETRE was so terrible that it had to go.
  • In the same puzzle, changed AKA / OKE [Just great, in old slang (?!)] to ADA / ODE. People running Linux are quite likely to know Ada Lovelace, ADA shows up in mainstream puzzles by now anyway, and OKE is very O_o.

There are still some lousy answers in both puzzles, driven by the grid shapes, so fixing them properly would require a lot of rework; these tiny tweaks at least make them a lot less bad.

Meanwhile, I’ve completed most of the early-week puzzles over the last few weeks, chasing the 5-minute mark on Tuesday puzzles (personal best currently 5:07) and the 4-minute mark on Mondays (4:12).

The discussion in the LWN thread was about crossword file formats, and since GNOME Crosswords uses ipuz, I dusted off XwordJS and added ipuz support, and even wrote a couple of test cases. For my money I still prefer XD, in much the same way that I prefer Markdown to HTML, which brings me back to the image at the beginning of the post.

When I was a wee lad, we would write programs to make images, and all the libraries sucked in those days, so we would often roll our own. As an example of the sucking, this is from a real-life comment in actual code:

 * We use C's setjmp/longjmp facility to return control.  This means that the
 * routine which calls the JPEG library must first execute a setjmp() call to
 * establish the return point.  We want the replacement error_exit to do a
 * longjmp().  But we need to make the setjmp buffer accessible to the
 * error_exit routine.  To do this, we make a private extension of the
 * standard JPEG error handler object.  (If we were using C++, we'd say we
 * were making a subclass of the regular error handler.)

Being poor students at the time, we could not afford to spend our hard-earned money on compression, decimal-to-binary conversion, or complex serializers, so we did the simplest possible thing that worked: we wrote out P[BGP]M text files (we were profligate when it came to disk). My little visualization was done in just this way: open up vim, and a few macros later a PGM file is born. I let GIMP handle the hard work of rotating, scaling up, and converting to PNG, but ImageMagick probably would’ve worked just as well.

P2
7 38
255

128 128 128 128 128 128 255
255 255 255 255 255 255 255
[...]
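
If vim macros aren’t your thing, the same file is only a few lines of code. Here is a sketch of generating a streak image like the one above, with made-up sample data standing in for the real solve record:

# Write a plain-text (P2) PGM: one gray value per day, 7 columns per week.
# Sample data only: 255 = solved the day of, 128 = solved later, 0 = unopened.
days = [255, 255, 128, 255, 0, 128, 255] * 38   # 38 weeks of 7 days

width = 7
height = len(days) // width
with open("streak.pgm", "w") as f:
    f.write(f"P2\n{width} {height}\n255\n")
    for row in range(height):
        week = days[row * width:(row + 1) * width]
        f.write(" ".join(str(v) for v in week) + "\n")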

There is a lot to be said for plain old text, and no, that does not include JSON.

In which I faked a person

Having successfully shipped a project at $dayjob after some extended crunch time, I took this week off to recharge. This naturally gave me the opportunity to, um, write more code. In particular, I worked a bit on my crossword constructor while also constructing a crossword. I’m a bit rusty in this area, so while I was able to fill a puzzle with a reasonable theme, I’m probably going to end up redoing the fill before trying to publish that one because some areas are pretty yuck.

Which brings me to computing project number two: a neural net that tries to grade crosswords. Now, I can and have done this using a composite score of the entries based on word list rankings, but for this go-round I thought it would be fun to emulate the semi-cantankerous NYT crossword critic, Rex Parker. Parker (his nom de plume) is infamous for picking apart the puzzle every day and identifying its weak spots. Some time ago, Dave Murchie set up a website, Did Rex Parker Like The Puzzle, which, as the URL suggests, gives the short-on-time enthusiast the Reader’s Digest version. What if we had, say, wouldrexparkerlikethepuzzle.com: would this level of precognition inevitably lead us into an apocalyptic nightmare, even worse than the one we currently inhabit? Let us throw caution to the wind like so many Jurassic Park scientists and see what happens.

I didn’t do anything so fancy as generating prose with GPT-3; instead I just trained a classifier on images of the puzzles themselves. Maybe, thought I, a person (and therefore an NN) can tell whether a puzzle is good or bad just by looking at the grid. Let’s assume Rex is consistent about what he likes; if so, we could use simple image recognition to tell whether something is Rex-worthy or not. Thanks to Murchie’s work, I already had labels for four years of puzzles, so I downloaded all of those puzzles and trained an NN on them, as one does.
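
With fastai, the training step is only a few lines. Something like this sketch, where the directory layout, backbone, and epoch count are arbitrary placeholders rather than the exact settings:

from fastai.vision.all import *

# Assumed layout: grids/liked/*.png and grids/disliked/*.png, labeled from
# the Did-Rex-Parker-Like-The-Puzzle data.
path = Path("grids")
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, item_tfms=Resize(224))

# A small pretrained resnet fine-tuned for a few epochs is plenty for a first
# pass at a binary like/dislike classifier.
learn = cnn_learner(dls, resnet18, metrics=accuracy)
learn.fine_tune(5)
learn.export("rex.pkl")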

I tried a couple of options for the grid images. In one experiment, I used images derived from the filled grids, letters and all; in another, I considered only the empty grid shape. It didn’t make much difference either way, which suggests that the language aspect of the puzzle is either not very useful or not adequately captured by the model.

How well did it work? Better than a coin flip, but not by a lot.

When trained with filled grids, it achieved an accuracy of 58.7%. When trained with just the grid shape, it achieved an accuracy of 61.4%.

Both models said he would like today’s (10-31-2020) puzzle, about which he was actually fairly ambivalent. My guess is that the model is really keying in on the number of black squares as a proxy for it being a Friday or Saturday puz, which he tends to like better than any other day of the week, and that is why this one was ranked highly. Probably just predicting on the number of black squares would have performed similarly.
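
That baseline would be easy enough to test: a one-feature model, sketched below with made-up variable names standing in for the image paths and like/dislike labels from the scraping step:

import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression

def black_square_count(path, threshold=32):
    """Rough proxy: count near-black pixels in the grid image."""
    pixels = np.asarray(Image.open(path).convert("L"))
    return int((pixels < threshold).sum())

# `image_paths` and `liked` (0/1 labels from the Rex data) are assumed here.
X = np.array([[black_square_count(p)] for p in image_paths])
y = np.array(liked)

model = LogisticRegression().fit(X, y)
print("baseline accuracy:", model.score(X, y))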

Watts up

One of my goals with this new computer is to be more aggressive about power saving: keeping it in suspend more often, using wake-on-lan for external access, etc. To that end, I dusted off the old kill-a-watt and took some baseline measurements:

Off, but plugged in: 2W
Suspend: 2W
On, idle: 48W (old machine: 100!)
Kernel build: 200W (old machine: 150, but also took 15x longer)
ML training with GPU at 100%: 400W

So long as I don’t run ML training 24-7, I am already going to save a lot of energy with this build.
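
Back of the envelope: the idle savings alone are 100 W - 48 W = 52 W. If the machine sat idle around the clock, that would be about 52 W x 24 h ≈ 1.25 kWh per day, or roughly 455 kWh per year, before counting any hours it now spends suspended at 2 W instead.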

New build

Last year, I spent a few weeks dabbling in machine learning, which remains an interesting area to explore though not directly related to my day-to-day work. Although the economics generally work in favor of doing ML in the cloud, there’s something to be said for having all of your code and data local and not having to worry about shutting down virtual hosts all the time. My 10+ year old PC just doesn’t cut it for ML tasks, and so I made a new one.

The main requirements for me are lots of cores (for kernel builds) and a hefty GPU or four (for ML training). For more than two GPUs, you’re looking at AMD Threadrippers; for exactly two, you can go with normal AMD or Intel processors. The Threadrippers cost about $500 more once you factor in the motherboard. I decided that the chances of me using more than two GPUs (or even more than one) were pretty darn slim and not worth the premium.

In the end I settled on a 12-core Ryzen 9 3900X with an RTX 2070 GPU, coming in at around $1800 USD with everything. Unfortunately, in this arena everything is marketed to gamers, so I have all kinds of unasked-for bling, from militaristic motherboard logos to RGB LEDs in the cooler. Anyway, it works.

Just to throw together a couple of CPU benchmarks based on software I care about:

filling a 7x7 word square (single core performance)
~~~~~~~~~~
old:
real	0m10.689s
user	0m10.534s
sys	0m0.105s

new:
real	0m2.274s
user	0m2.243s
sys	0m0.016s

allmodconfig kernel build
with -j $CORES_TIMES_TWO (multicore performance)
~~~~~~~~~~
old:
real	165m11.219s
user	455m42.557s
sys	135m37.557s

new:
real	9m31.778s
user	193m31.477s
sys	23m19.117s
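
That works out to roughly a 4.7x speedup on the single-threaded word-square fill (10.7s down to 2.3s) and about 17x on kernel build wall-clock time (165 minutes down to under 10).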

This is all with stock clock settings and so on. I haven’t tried ML training yet, but the speedup there would be +inf considering it didn’t work at all on my old box.
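
For anyone wondering what the word-square benchmark entails: every row and every column of the square has to be a dictionary word. The sketch below is a deliberately naive backtracking version of the idea, not the benchmarked code, and the dictionary path is an assumption:

# Naive 7x7 word-square filler: choose rows one at a time, pruning whenever a
# column can no longer be extended into any word. Much slower than the real
# filler; the dictionary location is an assumption.
N = 7

def load_words(path="/usr/share/dict/words"):
    with open(path) as f:
        words = {w.strip().lower() for w in f}
    return sorted(w for w in words if len(w) == N and w.isalpha())

def prefixes_of(words):
    return {w[:i] for w in words for i in range(1, N + 1)}

def fill(rows, words, prefixes):
    if len(rows) == N:
        return rows
    for cand in words:
        cols_ok = all(
            "".join(r[c] for r in rows) + cand[c] in prefixes for c in range(N)
        )
        if cols_ok:
            result = fill(rows + [cand], words, prefixes)
            if result:
                return result
    return None

words = load_words()
square = fill([], words, prefixes_of(words))
print("\n".join(square) if square else "no square found")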

Virtual doorbell

We had some cameras installed at our house last year, partly for security, but mainly so we could know who is at the door before answering it in our PJs. Unfortunately, the software that comes with the camera DVR is pretty clunky: it takes way too long to bring up the feed when the doorbell rings, so I often don’t bother.

Luckily, the DVR exposes RTSP streams that you can capture and play back with your favorite MPEG player. And I had just learned how to build a pretty good image classifier that needed a practical application.

A ridiculously good-looking person is at the door

Thus, I built an app to tell whether someone is at the door before they ring the bell. I labeled some 4000 historical images as person or non-person, trained a CNN, and made a quick Python app to run inference on the live feed. When a person is in range of the door camera, it aims the lasers, er, tells you so.
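
The live-feed side is just OpenCV pulling frames off the RTSP stream and handing them to the exported model every couple of seconds. A rough sketch of that loop, where the stream URL, model file, label name, and notification step are all placeholders:

import time
import cv2
from fastai.vision.all import load_learner, PILImage

# Placeholders: the DVR's RTSP URL and the exported fastai model.
STREAM_URL = "rtsp://dvr.local:554/door"
learn = load_learner("door_person.pkl")

cap = cv2.VideoCapture(STREAM_URL)
while True:
    ok, frame = cap.read()
    if not ok:
        time.sleep(1)
        continue
    # OpenCV hands back BGR; convert to RGB before classifying.
    img = PILImage.create(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    label, _, probs = learn.predict(img)
    if label == "person" and probs.max() > 0.9:
        print("someone is at the door")  # notification (or lasers) would go here
    time.sleep(2)  # no need to score every single frame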

Doorbell MVP

Not bad for having two whole weeks of deep learning under my belt. The interface could stand to be much better, of course. A little web page that sends a browser notification and a link to the image or live feed would be the obvious next step. Perhaps the lasers come after that.

I know this is something that comes out of the box with commercial offerings such as Ring, but at least my images aren’t being streamed to the local police.

In which I trained a neural net

The entire sum of my machine learning experience thus far is a couple of courses in grad school, in which I wrote a terrible handwriting recognizer and various half-baked natural language parsers. As luck would have it, I was just a couple of years too early for the Deep Learning revolution — at the time support vector machines were all the rage — so I’ve been watching the advancements of the last few years with equal measures idle interest and bewilderment. Thus, when I recently stumbled across the fast.ai MOOC, I couldn’t resist following along.

I have to say I really enjoy the approach of “use the tools first, then learn the theory.” In the two days since I started the course, I have already built a couple of classifiers and gotten very good results, much more easily than with my handwriting recognizer of yore.

My first model was trained on 450 baby pictures of my children and achieved 98% accuracy. Surprisingly, the mistakes did not confirm our prior that Alex and Sam look the most alike as babies; instead, it tended to confuse them both with Ian. The CNN predicts that I am 80% Alex. Can’t argue with that math.
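
Poking at the mistakes is one of the nicer parts of the fastai workflow: with the trained learner in hand (called learn in this sketch), the confusion matrix and the worst predictions are a couple of calls away:

from fastai.vision.all import ClassificationInterpretation

# `learn` is assumed to be the learner trained on the baby pictures.
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()  # which kid gets mistaken for which
interp.plot_top_losses(9)       # the nine most confidently wrong guesses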

The second classifier was trained on Pokemon, Transformers, and Jaegers (giant robots from Pacific Rim). This gets about 90% accuracy; not surprisingly, it has a hard time telling apart the robot classes, but has no trouble picking out the Pokemons.

I’m still looking for a practical application, but all in all, it’s a fun use for a GPU.

New Directions in Commandline Editing

Update: I finally isolated this to GNOME’s multiple input method support. Shift + Space is bound by default to switch input sources, and the unusual behavior happens in whichever input method comes next in the list. Turn that stuff off in the keyboard shortcuts! The original post about the visible symptoms follows.

Dear lazyweb,

Where in {bash, readline, GNOME Terminal, Wayland, kernel input} does the following brain damage, newly landed in Debian testing, originate?

  • pressing ctrl-u no longer erases to the beginning of the line, but instead starts Unicode entry with a U+ prompt, much like shift-ctrl-u used to do, except the U is upper-cased now
  • hitting slash twice rapidly underlines the first slash (as if to indicate some kind of special entry prompt, but one I’m unfamiliar with and I cannot get it to do anything useful anyhow), and then just eats the second slash and the underline goes away
  • characters typed quickly (mostly spaces) get dropped

This stuff is so hard to google for, and it is killing my touch typing. Email me if you know, and I’ll update this accordingly so that maybe Google will know next time.

More filler

I noticed some HTML5 crossword construction apps have sprung up over the last year, so I no longer have first mover status on that. Also their UIs are generally better and I dislike UI work, so it seemed reasonable to join forces. Thus, I sent some PRs around.

It was surprising to me that the general SAT solver used by Phil, written in C and compiled to asm.js, was so much slower than a pure JavaScript purpose-built crossword solver (mine). I had assumed the SAT solver would make search optimizations that could overcome the minor optimizations mine gets from only being a crossword solver. So, yay, my code is not that bad?

In these apps the filler runs as a web worker, so even when it is slow it isn’t too noticeable (except for hogging a core).

Anyway, you can try out my filler today with Kevin (the filler code is here; it is exactly the code from my own app, just with some janky whitespace because I stripped out all the Flow annotations).

Making copies

One of my early goals this year has been to revamp the backup regime at the old homestead. Previously, I relied on rotating external drives connected to my main desktop and on putting most content on a Samba share that also got backed up. But it was a bit ad hoc, and things not on the share only got backed up sporadically. I’d rather not use a cloud backup service because reasons, so I bought a NAS, and now the routine looks like this:

  • Daily:
    • download cloud assets (Google Photos)
    • back up all disks to the NAS (borg backup on Linux, Windows Backup); incrementals kept for 7 days / 4 weeks / 6 months (see the sketch after this list)
    • back up the NAS to an external drive (rsync)
  • Weekly:
    • borg check the latest backup
    • swap external disk with one in fire safe
  • Monthly:
    • swap external disk with off-site disk
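
The daily borg step is a small script along these lines; the repository path, source directories, and compression setting are placeholders, and the prune flags encode the 7 days / 4 weeks / 6 months retention:

import subprocess

# Placeholders: point REPO at the NAS-mounted borg repository and SOURCES at
# whatever should be backed up.
REPO = "/mnt/nas/backups/desktop"
SOURCES = ["/home", "/etc"]

# One archive per day, named by host and timestamp (borg expands the
# {hostname} and {now} placeholders itself).
subprocess.run(
    ["borg", "create", "--stats", "--compression", "lz4",
     f"{REPO}::{{hostname}}-{{now}}", *SOURCES],
    check=True,
)

# Retention: keep 7 daily, 4 weekly, and 6 monthly archives.
subprocess.run(
    ["borg", "prune", "--keep-daily", "7", "--keep-weekly", "4",
     "--keep-monthly", "6", REPO],
    check=True,
)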

I think this should work well enough until I use up enough storage to outstrip the individual external disks; then I’ll have to rethink things. Too far, or not too far enough?