On board the rebus

My most recent post, for those too lazy to follow the link and hit the “Reveal All” button, is a crossword puzzle announcing the birth of our third son, Samuel Yit-Mun Copeland. We’re over the moon with this new guy in our family. Here he is nearly asleep in my arms as I type this.

But now for something more boring.

I decided months ago that my typical geeky announcement would take the form of a crossword since I’ve been playing in that area recently (previous announcements: a manpage for Alex, an ascii-art crypto program for Ian). And why not make this puzzle interesting by constructing my first rebus puzzle?

A rebus puzzle is one in which a single square can have multiple letters. Part of the fun of solving is realizing at some point that an answer must be right, yet it cannot fit the available cells, so something really weird is going on. Then one curses the constructor and starts over with a new set of assumptions.

Constructing such a puzzle is not too much harder than any ordinary puzzle, since all the rebus squares will be part of the themers that the constructor selects up front. Because of this, the fill software doesn’t even really need to care about rebuses — one can use any placeholder for the rebus squares and then edit the final crossword file to add in the full string.
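
The post-processing step is nothing fancy. Here is a sketch of the idea, with made-up placeholder characters and rebus strings rather than the ones I actually used:

    # Toy post-processing: the grid was filled using single-character
    # placeholders, and the full rebus strings get swapped in afterwards.
    # Placeholders and rebus strings here are made up for illustration.
    REBUS = {'1': 'HEART', '2': 'STAR'}

    def expand_rebus(rows):
        """rows: list of strings, one character per square ('#' is a block)."""
        return [[REBUS.get(ch, ch) for ch in row] for row in rows]

    for row in expand_rebus(['CAB#1S', 'ONE#2K']):
        print(row)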

Having lots of theme squares does constrain the grid somewhat, as does having a number of circles on top of that (in this case the circles spell out our new son’s name). But I cheated a little bit: I decided having any old separation between circles was fine as long as the components were all on the same line, and his name is short so that was no major obstacle. In the end I’m pretty happy with the fill considering the 55 theme squares + 20 circles.

Each time I make a puzzle, I also make a number of improvements to my filling software. I’ve long since forgotten whether the goal is building puzzles or building a puzzle builder.

I mentioned before that I had optimized the javascript filler to the point that it was potentially usable in a webapp; this webapp is now a thing. For this puzzle, I used that app as well as my interactive python command-line filler, which has some additional features like estimating the number of eventual solutions using more levels of look-ahead. Finally, I also used a version of the python filler that will take a completed grid and try to optimize it by repeatedly clearing sections of the grid and refilling it randomly.

There is still a good deal of performance work to do on the filler itself. The basic structure is still similar to the backtracking algorithm in QXW: bitmaps are used to filter available words for each cell, and entries are filled in order of fewest remaining fills. I made two minor changes in my version: the word list is initially sorted by score and indexed by length, so that we try the highest-scoring words first and never compare against words that can't fit; and entire entries are filled instead of single cells, so that the score of a whole word is maximized.
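
To make that a bit more concrete, here is a stripped-down sketch of those two tweaks (toy word list, none of the real bookkeeping): a score-sorted, length-indexed word list, plus per-cell letter bitmaps used to filter candidates for an entry.

    from collections import defaultdict

    # toy scored word list
    WORDS = [('AREA', 60), ('ERIE', 55), ('THE', 90), ('EAT', 70), ('TEE', 65)]

    # index by length, sorted by descending score: the best words are tried
    # first, and words of the wrong length are never even compared
    BY_LEN = defaultdict(list)
    for w, s in sorted(WORDS, key=lambda ws: -ws[1]):
        BY_LEN[len(w)].append((w, s))

    ALL = (1 << 26) - 1                       # bitmap: any letter still allowed

    def letter_bit(ch):
        return 1 << (ord(ch) - ord('A'))

    def candidates(cell_masks):
        """cell_masks[i] is the bitmap of letters allowed in the entry's i-th
        cell; yield compatible words, highest score first."""
        for w, s in BY_LEN[len(cell_masks)]:
            if all(cell_masks[i] & letter_bit(w[i]) for i in range(len(w))):
                yield w, s

    # a 3-letter entry whose middle cell is already constrained to 'E'
    print(list(candidates([ALL, letter_bit('E'), ALL])))   # [('TEE', 65)]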

I implemented my current algorithm from scratch in C, and it clocks in under half a second for my test puzzle: a little faster than QXW’s implementation and some 30x faster than the very same algorithm in javascript. In terms of LOC, they are roughly the same: 573 (python), 696 (javascript), 763 (C). You can watch them compete below.

So the javascript implementation is frightfully slow. Getting it within, say, a factor of two of the C code would be a major win, but that is beyond my current level of js optimization know-how.

Besides that, there is likely some algorithmic optimization that could bring larger gains. Knowing that the “crossword fill problem” is an instance of SAT, I went off to read what Knuth had to say about it (Vol 4 fascicles 5-6), and, as usual, came away with a lot more to think about. The “word square” section is very similar to crossword filling, and indeed the algorithm is close to my existing search, except that it uses trie data structures to cut down on the number of possible words visited at each level. It’s unclear to me at present whether tries would be a win here given that we don’t always start filling with a prefix or suffix, but it could be worth an (ahem) try. Also I may look at branch-and-bound search pruning and parallel search at some point.
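
For reference, the trie experiment would boil down to something like this toy (not Knuth's formulation, and with a three-word dictionary): walk the trie against an entry's pattern, so subtrees that contradict a fixed letter get pruned the moment they are reached.

    class TrieNode:
        def __init__(self):
            self.children = {}
            self.is_word = False

    def insert(root, word):
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def matches(node, pattern, prefix=''):
        """Yield dictionary words matching pattern, where '.' is an open cell."""
        if not pattern:
            if node.is_word:
                yield prefix
            return
        ch, rest = pattern[0], pattern[1:]
        if ch == '.':
            branches = node.children.items()
        else:
            branches = [(ch, node.children[ch])] if ch in node.children else []
        for c, child in branches:
            yield from matches(child, rest, prefix + c)

    root = TrieNode()
    for w in ('COUNCIL', 'CONCEAL', 'LETTUCE'):
        insert(root, w)
    print(list(matches(root, 'CO.CE..')))     # ['CONCEAL']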

ACPT

The American Crossword Puzzle Tournament starts tomorrow. I’ve paid the entrance fee for the online “bragging rights only” track, so I’m looking forward to seeing some interesting puzzles and to getting an idea of how my solving skills stack up. Considering that competitive solvers often finish well under five minutes on a Monday and my best is in the ten-minute range, probably not that great, but we shall see.

I recently started a couple of the London Times cryptic crossword books; these are hard, as in NYT Saturdays are a cakewalk by comparison. That's partly because I’m not terribly knowledgeable about British geography and slang, partly because I haven’t learned to recognize the various clue forms, and partly because they are just tough. Example clue: [Notice boy going round vehicle after parking]. Answer: PLACARD. Because “PLACARD” is a notice, and in that word “LAD” (boy) is literally going around “CAR” (vehicle), which both come after “P”, which you just have to know is an abbreviation for parking. So yeah, tricky.

On placing letters

Work continues on my fork of the QXW crossword filler, which by now is about 50% rewritten by me and ultimately will end up looking nothing like its predecessor. Already it is a command-line, Monte Carlo automatic filler instead of a user-directed GTK program. Both have their uses, I think.

In a previous post, I wished that QXW supported scored dictionaries. It turns out that it does, though in my tests it gave unexpected results in many cases. I’ve reworked most of that code and am pretty happy with the results I get now. Just to cherry-pick a before-and-after example:

input:

...#...
...#...
.......
##.X.##
.......
...#...
...#...

before:

SBA#STA
BJS#AIX
WASHMEN
##AXA##
ICGARDS
IUE#AWK
PFS#HHM

after:

THE#BHE
FOR#EAT
COUNCIL
##PXA##
LETTUCE
ARE#SEE
DID#EAR

The latter grid looks a lot more like actual words, though it does still have some garbage entries.

QXW’s algorithm could be roughly stated as “find the hardest to fill cell, then fill it with the highest scoring letter, repeat.” It is a naturally recursive algorithm (implemented iteratively with stacks) that either terminates when the grid is filled, or unwinds and tries the next best letter.
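
Paraphrased as code, and shrunk down to an all-open 3x3 grid with a toy dictionary (so this is a caricature, not QXW’s actual implementation), the idea looks something like this:

    # Caricature of the "hardest cell, highest-scoring letter" recursion,
    # on a 3x3 grid where every row and column must be a dictionary word.
    WORDS = {'THE': 90, 'TIP': 60, 'EAT': 70, 'TEE': 65, 'IPA': 50, 'PAT': 55,
             'SEE': 58, 'EPA': 40, 'HPA': 10, 'TIE': 62, 'APE': 45}
    N = 3

    def fits(pattern):
        """Dictionary words matching a pattern with '.' for empty cells."""
        return [w for w in WORDS if len(w) == len(pattern)
                and all(p in ('.', ch) for p, ch in zip(pattern, w))]

    def letter_options(grid, r, c):
        """Letters allowed at (r, c), scored by the best row word they permit
        (and required to leave the crossing column completable)."""
        row_pat = ''.join(grid[r][x] or '.' for x in range(N))
        col_pat = ''.join(grid[y][c] or '.' for y in range(N))
        scores = {}
        for w in fits(row_pat):
            ch = w[c]
            if fits(col_pat[:r] + ch + col_pat[r + 1:]):
                scores[ch] = max(scores.get(ch, 0), WORDS[w])
        return sorted(scores.items(), key=lambda kv: -kv[1])

    def fill(grid):
        empties = [(r, c) for r in range(N) for c in range(N) if not grid[r][c]]
        if not empties:
            return grid                                   # grid filled: done
        r, c = min(empties, key=lambda rc: len(letter_options(grid, *rc)))
        for ch, _ in letter_options(grid, r, c):          # best letter first
            grid[r][c] = ch
            if fill(grid) is not None:
                return grid
            grid[r][c] = None                             # unwind, next letter
        return None                                       # dead end: backtrack

    solution = fill([[None] * N for _ in range(N)])
    if solution:
        print('\n'.join(''.join(row) for row in solution))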

I have some issues with this approach, the main one being that maximizing letter score seems unlikely to maximize score of whole words. I’ve been thinking about this problem some, including the form of optimality that I would like to achieve: maximize the sum of the scores of every word.

A simple brute force algorithm implementing this could be “fill each entry with a word, compute the score, and repeat until all words in all entries have been tried, taking the grid with maximum score.” It’s fairly easy to implement this exponential algorithm, though you may be waiting a looooooooooong time for it to finish.

What I have now is something like this greedy algorithm:

  1. Base case (no more entries):
    1. if grid is filled, return the grid
    2. otherwise, return None
  2. Choose the longest unfilled entry crossing a “critical” cell
  3. Fill with the next highest ranked word for that entry
  4. Recurse, repeating from step 2, until the grid is filled or no more words fit
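
Here is a toy, self-contained rendition of that recursion (the "critical" cell test is left out, and the dictionary is tiny). Run on the little 3x3 example discussed below, it dutifully reproduces the THE-over-SEE behaviour:

    # Toy version of the greedy fill: longest unfilled entry first, highest-
    # scoring compatible word first, recurse, unwind on failure.
    WORDS = {'THE': 90, 'SEE': 58, 'IPA': 50, 'PAT': 55, 'TIP': 60,
             'HPA': 10, 'EAT': 70, 'SIP': 45, 'EPA': 40}

    TEMPLATE = ['..E',
                'IPA',
                'PAT']

    def slots(template):
        """Entries as coordinate lists; no blocks here, so just rows and columns."""
        R, C = len(template), len(template[0])
        return ([[(r, c) for c in range(C)] for r in range(R)] +
                [[(r, c) for r in range(R)] for c in range(C)])

    def fill(grid, unfilled):
        if not unfilled:
            return grid
        slot = max(unfilled, key=len)          # longest unfilled entry first
        rest = [s for s in unfilled if s is not slot]
        pattern = [grid.get(rc, '.') for rc in slot]
        ranked = sorted((w for w in WORDS if len(w) == len(slot)
                         and all(p in ('.', ch) for p, ch in zip(pattern, w))),
                        key=lambda w: -WORDS[w])
        for w in ranked:                       # highest-ranked word first
            placed = {rc: ch for rc, ch in zip(slot, w) if rc not in grid}
            grid.update(placed)
            if fill(grid, rest) is not None:   # recurse
                return grid
            for rc in placed:                  # unwind, try the next word
                del grid[rc]
        return None

    grid = {(r, c): ch for r, row in enumerate(TEMPLATE)
            for c, ch in enumerate(row) if ch not in '.#'}
    result = fill(grid, slots(TEMPLATE))
    if result:
        for r in range(3):
            print(''.join(result[(r, c)] for c in range(3)))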

In practice, with a large enough dictionary, this usually terminates without searching forever, but you get some junk like PXA. I have added an arbitrary upper limit, at which point I just give up the search.

An important thing to note is that the problem does not exhibit optimal substructure, so dynamic programming is not applicable. For example, take this template:

    ..E
    IPA
    PAT

We can fill it any number of ways. My algorithm will fill it like this:

    THE
    IPA
    PAT

…which is terrible (HPA — [Millibar alt.?]). Much better would be:

    SEE
    IPA
    PAT

…which is what you get if you fill the child “EPA” first.

There are a couple of things at work here: first, I pick the longest entry, so .IP and .PA don’t even get a chance. Does this make sense? Should I instead pick the shortest entry? The entry with the fewest completions? Any of these might make better puzzles or at least move the junk around.

Second, THE is really really really common. The commonest three letter word ever. The previous sentence had one and so does this one. That’s a lot of THEs. Also it is a terrible word for crosswords. [Well, looking back through my NYT database, it has been used over a hundred times: {It’s definite} or {Genuine article?} being decent clues.] Unfortunately, THE is so very common and so highly ranked that practically every high scoring puzzle will have one. So, my optimality metric may be sub-optimal in that respect. Flattening the word probability curve somewhat could partially address this issue.
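
One cheap way to do that flattening (just an idea, not something I've committed to) is to score words on a log scale of their corpus counts, so THE still comes first but no longer towers over everything else:

    import math

    def flattened_score(count, max_count, top=100):
        """Map a raw corpus count to a 0-100 score on a log scale, so the very
        common words don't dwarf the merely common ones."""
        return round(top * math.log1p(count) / math.log1p(max_count))

    # toy counts: a word 40x rarer than THE still scores over half as much
    print(flattened_score(4000, 4000))   # 100
    print(flattened_score(100, 4000))    # 56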

It is clear that we could pick worse words at some levels to get a better overall result. So I implemented a very simple monte carlo search that looks like this:

  1. Run the greedy algorithm, saving decision stack
  2. Pick a random location in the stack and, from there, pick a random word (instead of the best word). Restart the algorithm from that point.
  3. Repeat this zillions of times, keeping track of the best N grids as seeds for future iterations
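
Here is a toy rendition of that loop. A fake "greedy filler" stands in for the real thing: three slots, each with a scored candidate list, and a made-up objective that penalizes duplicate entries so the pure greedy run has something to improve on.

    import random

    # Stand-in for the real greedy filler: each slot has scored candidates and
    # greedy takes the best one unless a choice has been forced.
    SLOTS = [['THE', 'SEE', 'TEE'], ['EAT', 'ERA', 'EAR'], ['THE', 'TOE', 'TIE']]
    SCORE = {'THE': 90, 'SEE': 58, 'TEE': 65, 'EAT': 70, 'ERA': 60, 'EAR': 55,
             'TOE': 50, 'TIE': 62}

    def greedy_fill(forced=None):
        """Return (grid, decisions): decisions[i] is the index chosen at slot i;
        forced maps slot index -> candidate index to use instead of the best."""
        forced = forced or {}
        grid, decisions = [], []
        for i, cands in enumerate(SLOTS):
            best = max(range(len(cands)), key=lambda j: SCORE[cands[j]])
            j = forced.get(i, best)
            grid.append(cands[j])
            decisions.append(j)
        return grid, decisions

    def grid_score(grid):
        return sum(SCORE[w] for w in set(grid))    # duplicates count only once

    def monte_carlo(iterations=200, seeds=3):
        best = [greedy_fill()]                     # seed with the pure greedy run
        for _ in range(iterations):
            grid, decisions = random.choice(best)
            i = random.randrange(len(decisions))   # random spot in the stack
            forced = dict(enumerate(decisions[:i]))
            forced[i] = random.randrange(len(SLOTS[i]))   # random word, not best
            best.append(greedy_fill(forced))       # re-run greedy from there
            best.sort(key=lambda gd: -grid_score(gd[0]))
            best = best[:seeds]                    # keep the N best as seeds
        return best[0][0]

    print(monte_carlo())                           # e.g. ['TEE', 'EAT', 'THE']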

Lots of variations on the above remain to be explored. Also, this approach opens the door for taking a completed grid, manufacturing a search tree that generated it, and then optimizing it from there. Don’t like the fill in a published puzzle? Let the computer regenerate it for you.

More boring crossword stuff

Since last post, I made two more puzzles, both standard newspaper size, and I appropriated and modified someone else’s javascript so that you can fill each one in right on its page. Both puzzles went through two drafts where I completely redid the fill after Angeline test-solved them and found some problems.

I’m still using Qxw, now with a 130k+ entry dictionary I built from 10 years of NYT puzzles, which works OK. Still, I feel like it could help even more: for example, it could rank words by their commonality in the corpus, so that you don’t use something hyper-obscure that happened to be in one Saturday puzzle when the editor was asleep; or it could give partial fills even when it can’t complete the whole thing; or it could give hints on where to place black squares to improve the chances that a reasonable fill exists.
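
The commonality ranking, at least, would be easy to bolt on. A sketch, assuming a hypothetical layout of one answer word per line in a pile of per-puzzle files (the output separator is arbitrary; adapt to whatever your filler expects):

    import glob
    from collections import Counter

    # Hypothetical corpus layout: answers/*.txt, one answer word per line.
    counts = Counter()
    for path in glob.glob('answers/*.txt'):
        with open(path) as f:
            counts.update(line.strip().upper() for line in f if line.strip())

    # scale scores to 0-100 so the one-off Saturday obscurities land near zero
    top = max(counts.values())
    with open('scored-wordlist.txt', 'w') as out:
        for word, n in counts.most_common():
            out.write('%s;%d\n' % (word, round(100 * n / top)))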

To get at the latter, I extracted the qxw filler into a command-line program that just fills templates passed to it on stdin, and piped into it the output of a python script that generates random valid grids. This helped for the most recent puzzle, where a few spots had only two or three words that would fit given the themers, and moving a couple of blocks around was the difference between no-fills and some-fills. A quick improvement there would be to score the resulting grids and use a genetic algorithm to produce new templates based on their fitness.
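
The plumbing for that looks roughly like the sketch below; the filler binary name and its read-a-template-on-stdin convention are placeholders for whatever your own setup uses.

    import random
    import subprocess

    ROWS = COLS = 7
    FILLER = './filler'            # placeholder for the command-line filler

    def random_template(blocks=8):
        """Random grid with 180-degree symmetric blocks; anything hopeless is
        simply rejected by the filler downstream."""
        grid = [['.'] * COLS for _ in range(ROWS)]
        while sum(row.count('#') for row in grid) < blocks:
            r, c = random.randrange(ROWS), random.randrange(COLS)
            grid[r][c] = grid[ROWS - 1 - r][COLS - 1 - c] = '#'
        return '\n'.join(''.join(row) for row in grid)

    for _ in range(100):
        template = random_template()
        proc = subprocess.run([FILLER], input=template + '\n',
                              capture_output=True, text=True)
        if proc.returncode == 0 and proc.stdout.strip():
            # crude fitness: any template that fills at all counts as a win;
            # scoring the resulting grid would slot in right here
            print(proc.stdout)
            break
    else:
        print('no fillable template in this batch')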

Anyway, these various ugly hacks are now on my github.