State of the pizza

It has been a year since my last statement about home-made pizzas, so it may be time to serve up (deliver?) an update. Here’s one from the most recent batch:

Ken Forkish’s straight-dough, low-yeast, long-rise recipe continues to be my go-to, and there is little left to complain about. It has a great crumb and nicely developed taste, especially when the dough is a day or two old. I still do need some work on the technique side: getting an unbaked pizza off the peel — without using so much flour or cornmeal as to mar the taste — still eludes me. Thus, I have settled on using parchment paper for the first couple of minutes, then, after the crust firms up, lifting the pizza with tongs and yanking out the parchment paper before it incinerates in the 550-degree oven. This, it should be said, is a maneuver not without danger to one’s fingers.

Our grocery carries “pizza yeast” in addition to the normal stuff, so I decided to give this a try to see how it stacks up. Apparently, the packet consists of highly active yeast along with some conditioners that make a thirty-minute dough handle like one with a longer rise. The resulting dough was as easily stretchable as advertised, but also had little to no strength. It was altogether too easy to tear a hole while trying to shape it into a round, but I persevered. Here is the side-by-side (pizza yeast on the right):

The resulting crust was rubbery and tasteless, so my advice is to not bother repeating this experiment.

I am considering branching out into a wild-yeast version next, depending on whether I can manage to get a viable starter going.

React reaction

An embarrassing admission to make: I wrote something in javascript.[1]

Actually, I wrote it in React, because I still think of javascript as that awful language that works differently in every browser, and so, you know, to stay relevant, I decided to learn the framework that web developers everywhere have already cast into yesterday’s wastebin for the new shiny. This, pretty much.[2]

Anyway, I replaced the third-party crossword app on my website with my own NIH version and made it somewhat responsive [3] so that it works on my phone. Also there are two new-ish puzzles since the last time I posted, dated 2016-11-25 and 2016-12-30. Each of them has its own set of problems, but I think I am at least learning where I need to improve.

To this neophyte, using React (and ES6) is much nicer than the terrible language they both compile down to. On the other hand, the pattern of moving state up the object hierarchy away from an object’s view feels backwards and not very OO-like. Perhaps if I also used Flux then I might get enlightened on this point, but that is for another day.

[1] Here’s a nickel, kid. Buy yourself a real language.
[2] The day that my relevance depends on knowing the latest JS framework is the day that I will take up a completely different career.
[3] Anything worth doing is worth doing somewhat.

On placing letters

Work continues on my fork of the QXW crossword filler, which by now is about 50% rewritten by me and ultimately will end up looking nothing like its predecessor. Already it is a command-line, Monte Carlo automatic filler instead of a user-directed GTK program. Both have their uses, I think.

In a previous post, I wished that QXW supported scored dictionaries. It turns out that it does, though in my tests it gave unexpected results in many cases. I’ve reworked most of that code and am pretty happy with the results I get now. Just to cherry-pick a before-and-after example:

input:

...#...
...#...
.......
##.X.##
.......
...#...
...#...

before:

SBA#STA
BJS#AIX
WASHMEN
##AXA##
ICGARDS
IUE#AWK
PFS#HHM

after:

THE#BHE
FOR#EAT
COUNCIL
##PXA##
LETTUCE
ARE#SEE
DID#EAR

The latter grid looks a lot more like actual words, though it does still have some garbage entries.

QXW’s algorithm could be roughly stated as “find the hardest to fill cell, then fill it with the highest scoring letter, repeat.” It is a naturally recursive algorithm (implemented iteratively with stacks) that either terminates when the grid is filled, or unwinds and tries the next best letter.

I have some issues with this approach, the main one being that maximizing letter score seems unlikely to maximize score of whole words. I’ve been thinking about this problem some, including the form of optimality that I would like to achieve: maximize the sum of the scores of every word.

A simple brute force algorithm implementing this could be “fill each entry with a word, compute the score, and repeat until all words in all entries have been tried, taking the grid with maximum score.” It’s fairly easy to realize this exponential algorithm though you may be waiting a looooooooooong time for it to finish.

What I have now is something like this greedy algorithm (a toy sketch in Python follows the list):

  1. Base case (no more unfilled entries):
    a. if the grid is filled, return the grid
    b. otherwise, return None
  2. Choose the longest unfilled entry crossing a “critical” cell
  3. Fill it with the next highest ranked word for that entry
  4. Recurse; repeat from step 2 until the grid is filled or no more words fit
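
Here is that toy sketch. Everything in it (the tiny scored word list, the hard-coded 3×3 entries, the tie-breaking) is invented for illustration, and it ignores the “critical cell” part of step 2; the real filler works on arbitrary templates with a much larger scored dictionary and also forbids duplicate answers.

    # Toy word-at-a-time greedy filler; all data and helpers here are made up.
    WORDS = {  # word -> score
        "THE": 100, "SEE": 60, "EAT": 45, "IPA": 40, "PAT": 40,
        "EPA": 35, "TIP": 30, "SIP": 30, "HPA": 5,
    }

    # entry name -> list of (row, col) cells for a blockless 3x3 grid
    ENTRIES = {
        "A1": [(0, 0), (0, 1), (0, 2)], "A2": [(1, 0), (1, 1), (1, 2)],
        "A3": [(2, 0), (2, 1), (2, 2)], "D1": [(0, 0), (1, 0), (2, 0)],
        "D2": [(0, 1), (1, 1), (2, 1)], "D3": [(0, 2), (1, 2), (2, 2)],
    }

    def candidates(grid, cells):
        """Words (best score first) consistent with letters already in the grid."""
        matches = [(score, w) for w, score in WORDS.items()
                   if len(w) == len(cells) and
                   all(grid.get(c) in (None, ch) for c, ch in zip(cells, w))]
        return [w for _, w in sorted(matches, reverse=True)]

    def fill(grid, remaining):
        if not remaining:
            return grid                                  # step 1: grid is filled
        # step 2: longest entry not yet committed (no "critical cell" here)
        name = max(remaining, key=lambda n: (len(ENTRIES[n]), n))
        cells = ENTRIES[name]
        for word in candidates(grid, cells):             # step 3: best word first
            new = dict(grid)
            new.update(zip(cells, word))
            done = fill(new, remaining - {name})         # step 4: recurse
            if done is not None:
                return done
        return None                                      # nothing fits: unwind

    result = fill({}, set(ENTRIES))
    for r in range(3):
        print("".join(result[(r, c)] for c in range(3)))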

In practice, with a large enough dictionary, this usually terminates without searching forever but you get some junk like PXA. I have added an arbitrary upper limit at which point I just give up the search.

An important thing to note is that the problem does not exhibit optimal substructure, so dynamic programming is not applicable. For example, take this template:

    ..E
    IPA
    PAT

We can fill it any number of ways. My algorithm will fill it like this:

    THE
    IPA
    PAT

…which is terrible (HPA — [Millibar alt.?]). Much better would be:

    SEE
    IPA
    PAT

…which is what you get if you fill the child “EPA” first.

There are a couple of things at work here: first, I pick the longest entry so .IP and .PA don’t even get a chance. Does this make sense? Should I instead pick shortest entry? Entry with fewest completions? Any of these might make better puzzles or at least move the junk around.

Second, THE is really really really common. The commonest three-letter word ever. The previous sentence had one and so does this one. That’s a lot of THEs. Also it is a terrible word for crosswords. [Well, looking back through my NYT database, it has been used over a hundred times: {It’s definite} or {Genuine article?} being decent clues.] Unfortunately, THE is so very common and so highly ranked that practically every high-scoring puzzle will have one. So, my optimality metric may be sub-optimal in that respect. Flattening the word probability curve somewhat could partially address this issue.
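
Purely as a sketch (the exponent and the cap are arbitrary), the flattening could be as simple as scoring words sub-linearly in their raw corpus counts:

    def flattened_score(count, alpha=0.4, cap=100):
        # sub-linear in the raw count, so THE stops towering over everything else
        return min(cap, round(count ** alpha))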

It is clear that we could pick worse words at some levels to get a better overall result. So I implemented a very simple Monte Carlo search that looks like this (a loose sketch follows the list):

  1. Run the greedy algorithm, saving decision stack
  2. Pick a random location in the stack, and from there, pick a random word (instead of best word). Re-start algorithm from there.
  3. Repeat this zillions of times, keeping track of best N grids as seeds for future iterations
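
Loosely in that spirit, and reusing the toy WORDS / ENTRIES / candidates() from the greedy sketch earlier, a stripped-down version might look like the following. It keeps only the single best grid rather than the best N, and the inner pass does no backtracking, so treat it as a cartoon of the real thing.

    import random

    def greedy_with_stack(grid, forced):
        # Non-backtracking greedy pass that records its decisions; `forced`
        # maps an entry name to a word we insist on instead of the best one.
        grid, remaining, stack = dict(grid), set(ENTRIES), []
        while remaining:
            name = max(remaining, key=lambda n: (len(ENTRIES[n]), n))
            cells = ENTRIES[name]
            cands = candidates(grid, cells)
            word = forced.get(name, cands[0] if cands else None)
            if word is None or word not in cands:
                return None, stack                 # dead end
            grid.update(zip(cells, word))
            stack.append((name, word))
            remaining.discard(name)
        return grid, stack

    def total_score(grid):
        return sum(WORDS["".join(grid[c] for c in cells)]
                   for cells in ENTRIES.values())

    def monte_carlo(grid0, iters=2000):
        best, best_stack = greedy_with_stack(grid0, {})
        best_forced = {}
        for _ in range(iters):
            if best is None:
                break
            # perturb the current best: pick a random decision from its stack
            # and force a random word there instead of the greedy choice
            name, _ = random.choice(best_stack)
            forced = dict(best_forced, **{name: random.choice(list(WORDS))})
            trial, trial_stack = greedy_with_stack(grid0, forced)
            if trial is not None and total_score(trial) > total_score(best):
                best, best_stack, best_forced = trial, trial_stack, forced
        return best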

Lots of variations on the above remain to be explored. Also, this approach opens the door for taking a completed grid, manufacturing a search tree that generated it, and then optimizing it from there. Don’t like the fill in a published puzzle? Let the computer regenerate it for you.

Encrypted mesh PSA

I’m a bit late writing this up, as lately wifi stuff has taken a back seat to the day job. But, I’ve now seen the following issue reported more than once, so hopefully this post will save the other two mesh users some grief.

If you are running an encrypted mesh on wpa_supplicant or (ye gods) authsae, take note:

  • Encrypted mesh on kernel 4.8 will not interoperate with kernel < 4.8
  • wpa_supplicant 2.6 will not interoperate with 2.5
  • authsae as of commit 706a2cf will not interoperate with that of commit 813ec0e [1]
  • wpa_supplicant 2.6 and authsae commit 706a2cf require additional configuration to work with kernel < 4.8 (see below).

What is behind all of that? A while ago I noticed and mentioned to another networking developer that the 802.11 standard calls for certain group-addressed management frames in mesh networks to be encrypted with the mesh group key (MGTK). However, the implementation in the kernel only integrity-protected these frames, using the integrity group key (IGTK) [2]. I don’t happen to know the history here: it may be that this was something that was changed between the draft and the standard, but anyway that was the state until Masashi Honma fixed it, with patch 46f6b06050b that landed in kernel 4.8.

Meanwhile, Jouni Malinen fixed a bunch of related issues in the wpa_supplicant implementation, namely that IGTK was not being generated properly and instead the MGTK was used for that. I made the same kind of fixes for authsae.

Yes, it’s a multiple(!) flag day and we suck for that.

Now, on to how to fix your mesh.

Old mesh daemon + old kernel everywhere should still work.
Old mesh daemon + new kernel everywhere should still work.
New mesh daemon + new kernel everywhere should still work.

New mesh daemon + pre-4.8 kernel: you will most likely see that peering works but no data goes across because HWMP frames are dropped at the peer. The solution here is to enable PMF in the mesh daemon:

  • add “pmf=2” or “ieee80211w=2” to the relevant section of wpa_supplicant.conf
  • add “pmf=1;” to the meshd section of authsae.cfg

If you for some reason find yourself in the position of needing to run a mix of old/new wpa_s and/or old/new kernel, here are my (completely untested) suggestions:

  • patch the kernel to accept PMF-protected frames as well as encrypted frames (this allows older kernels to work with newer ones)
  • patch the new mesh daemon (wpa_supplicant or authsae) to copy the MGTK into the IGTK instead of generating a separate key (this allows older mesh daemon to work with the newer one since older daemon will use MGTK for IGTK).

Both of these will reduce your security and don’t follow the spec, so there’s little chance such changes will go upstream.

[1] Prior to 706a2cf, authsae doesn’t even interoperate with wpa_supplicant due to not filling in all of the elements in peering frames. My advice would be to switch to wpa_supplicant, which has things like a maintainer and a test suite.
[2] On encryption vs integrity protection: the latter is similar to PGP-signing of emails: it makes tampering evident but does not hide the content of the message itself.

More boring crossword stuff

Since last post, I made two more puzzles, both standard newspaper size, and I appropriated and modified someone else’s javascript so that you can fill it in on that page. Both puzzles went through two drafts where I completely redid the fill after Angeline test-solved them and found some problems.

I’m still using Qxw, now with a 130k+ entry dictionary I built from 10 years of NYT puzzles, which works OK. Still, I feel like it could help even more: for example, it could rank words by their commonality in the corpus so that you don’t use something hyper-obscure that happened to be in one Saturday puzzle when the editor was asleep; or, it could give partial fills even though it may not be able to complete the whole thing; or, it could give hints on where to place black squares to improve the chances that a reasonable fill exists.

To get at the latter, I extracted the qxw filler into a command-line program that just fills templates passed to it on stdin, and fed it templates from a python script that generates random valid grids. This helped for the most recent puzzle, where a few spots had only two or three words that would fit given the themers, and moving a couple of blocks around was the difference between no-fills and some-fills. A quick improvement there would be to score the resulting grids and use a genetic algorithm to produce new templates based on their fitness.
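
The generator half of that pipeline can be surprisingly small. This is only a sketch (the grid size and block count are arbitrary, and it doesn’t check that the white squares stay connected), but it spits out random rotationally symmetric templates with no 1- or 2-letter slots in a dot-and-hash format:

    import random

    N = 13          # puzzle size
    N_BLOCKS = 8    # symmetric pairs of black squares to attempt (arbitrary)

    def runs_ok(line):
        """Every run of white squares must be at least 3 long."""
        return all(len(run) >= 3 for run in "".join(line).split("#") if run)

    def random_template():
        while True:     # naive rejection sampling; retries until constraints hold
            grid = [["."] * N for _ in range(N)]
            for _ in range(N_BLOCKS):
                r, c = random.randrange(N), random.randrange(N)
                grid[r][c] = grid[N - 1 - r][N - 1 - c] = "#"   # rotational symmetry
            if all(runs_ok(row) for row in grid) and \
               all(runs_ok(col) for col in zip(*grid)):
                return "\n".join("".join(row) for row in grid)

    print(random_template())    # pipe a stream of these into the filler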

Anyway, these various ugly hacks are now on my github.

Another puzzle

As I wrote before, I recently tried my hand with crossword construction.

This new puzzle is my sophomore effort. This one, slightly bigger at 13×13, came together much more easily for some reason (luck?). This time I did not have to resort to using any unknown words, and nothing is really obscure, with the possible exception of 17-across (a common crossword answer).

As before, I mostly let the theme clues dictate the layout, though I left in a few longer stretches this time. The top left and bottom right corners were sufficiently isolated that I did them by hand before bed, then the next day I filled the longer answers and finished the bottom within about 30 minutes. At this point 3/4 of the puzzle was filled manually. Then, Qxw woke up and was able to help fill the rest (including 18-across, a nice gift which I happily accepted) — still using the standard Linux dictionary file. I reworked some of its suggestions using some of my own, but it was quite a time-saver.

In all I’m happier with this puzzle than the previous one: there’s more wordplay, better theme, nothing too difficult (I think). Comments welcome.

I guess this means the next one will be full-size.

Puzzle

I will admit to being an insufferable word nerd: the kind of person who will intentionally make a pun and then ruin it by acknowledgment. My wife is the sufferable sort who, presented with the same joke, groans outwardly and inwardly at the same time. Together, we have a shared appreciation for the crossword puzzle, and can be found on the porch filling out a grid on the rare occasion that a lazy summer afternoon presents itself. We can usually get through a Thursday or Sunday NYT puzzle, but we aren’t going to be entering any speed solving contests.

A few months ago, I heard about Qxw, crossword construction software for Linux. As a puzzle solver, I found the art of creating a puzzle mysterious and left to those far smarter than myself, but any old dumb person can run some software. I finally tried it out yesterday, and here’s the result.

This is an 11×11 puzzle; newspaper puzzles are usually 15×15, but this was my first try, so I wanted to keep it easy-ish and constructible in a few hours. For the uninitiated, there are a few rules that help guide one into picking a layout: generally there are a few (~3) longish answers that form a theme; the layout is rotationally symmetric; the smallest answers are 3 characters. Taken together, this means two of the theme answers are usually 3-4 rows from the top and bottom, of the same length and in the same relative location, while the third theme answer is in the middle. Additional black boxes are sprinkled about to break up extra-long words.

It turns out that Qxw didn’t actually help much besides automatically placing the mirrored black squares: once I filled in my theme answers and blacked off part of the grid, Qxw couldn’t fill any other spaces. I suspect expanding the dictionary beyond /usr/share/dict/words would help a lot in this respect. Instead I occasionally used egrep on the dictionary file with suitable regexes to find candidate words and resorted to some abbreviations and prefixes. I did cheat a few times by googling some combination of letters that I hoped would be a thing (1-down, 6-down, 35-down). I need to improve my 3-letter word vocab.

Things I did do that annoy me as a solver:

  • Fairly obscure names or characters (35-down, at least to me, a non-Trekkie)
  • Use of old stand-bys (10-, 14-across, I burned these quickly!)
  • Weak multi-word answers (9-down)
  • Foreign words (1-down)

Despite the flaws, Angeline solved it in about 10 minutes while watching TV; she also supplied a much better clue for 25-across which I have appropriated.

I’d say this effort gets a B-, but it was an interesting exercise to do the puzzle from the other side. 13×13 next.

Fake clicky

In the previous installment I had opined about the untapped power inside my new mechanical keyboard. I haven’t had much time to continue looking into the firmware, but I did find the original source of the firmware tools for a similar product. His blog post is well worth a read.

Thus far the new device is performing admirably, but it did come with an unforeseen problem.

I always have a bunch of computers here at my desk, and so for some time I’ve had a physical KVM switch to share a single monitor and input devices among them. This switch has a special USB port on the back just for the keyboard, which it monitors for the double-scroll-lock macro command to switch the display. Unfortunately, said port will not power the new keyboard and also won’t work with a hub.

So, this happened:

Sad 2-keyboard desktop

One for switching, one for typing.

Inspired a bit by Johannes Berg’s (more interesting) uSynergy HID project, I thought I’d see if I could bang up a “keyboard” that just lets me put the minimal bits on the wire to make the KVM happy. I had a Dragonboard (Arm64 SBC) already sitting around, and it supports OTG out of the box. Linaro publishes nightly Debian snapshots for it so getting the device up and running with Linux userland is painless.

The only real work involved is to build a new kernel with the USB HID function drivers. Pretty standard fare except the 96boards wiki is somewhat confusing as to where to get all the pieces; here are my unedited notes:

Initrd: http://builds.96boards.org/snapshots/dragonboard410c/linaro/debian/latest/initrd.img-*
Toolchain: http://releases.linaro.org/14.11/components/toolchain/binaries/aarch64-linux-gnu/gcc-linaro-4.9-2014.11-x86_64_aarch64-linux-gnu.tar.xz
skales: git clone git://codeaurora.org/quic/kernel/skales
Linux repo: https://github.com/rsalveti/linux.git
Branch: qcomlt-4.4
defconfig: distro.config

export ARCH=arm64
export CROSS_COMPILE=~/ext/dragonboard/gcc-linaro-4.9-2014.11-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-
make defconfig distro.config
make -j8 Image dtbs
make modules modules_install INSTALL_MOD_PATH=../dragonboard

../skales/dtbTool -o dt.img -s 2048 arch/arm64/boot/dts/qcom/
../skales/mkbootimg --kernel arch/arm64/boot/Image \
  --ramdisk initrd.img \
  --output boot-db410c.img \
  --dt dt.img \
  --pagesize 2048 \
  --base 0x80000000 \
  --cmdline "root=/dev/disk/by-partlabel/rootfs rw rootwait console=ttyMSM0,115200n8"

* rsync modules to rootfs
* fastboot to boot or flash the kernel

With the kernel built and booted, I copied this setup mostly verbatim to get a functioning HID device.

And then the only “code” needed is a little shell script to send my switching sequence:

#!/bin/bash
# Emulate just enough of a USB HID keyboard to send the KVM's switching macro.

# Write an 8-byte HID keyboard report (modifier, reserved, keycode, then five
# empty slots) followed by an all-zero key-release report.
function send_keycode {
	echo "keycode: $1"
	local keycode=$1
	echo -ne "\x00\x00"$keycode"\x00\x00\x00\x00\x00" > /dev/hidg0
	echo -ne "\x00\x00\x00\x00\x00\x00\x00\x00" > /dev/hidg0
	sleep .1
}

# Map a digit character ('1'..'4') to its HID usage code ('1' is 0x1e).
function num_keycode {
	local ord=$(printf "%d" \'$1)
	local kval=$((ord - 49 + 30))
	echo \\x$(printf "%x" $kval)
}

# Run from the script's directory so .switch.cur is found.
cd "$(dirname "$0")"
visible=$(cat .switch.cur 2>/dev/null || echo "1")
case $1 in
	--up) screen=$(( (visible + 1) % 4)) ;;
	--down) screen=$(( (visible + 4 - 1) % 4)) ;;
	[1234]) screen=$(( $1 - 1 ));;
	*) echo "Usage: $0 [--up|--down|[1-4]]"; exit 1 ;;
esac

echo Switching to screen $(( screen + 1 ))
echo $screen > .switch.cur

# send: 2 x scroll lock (0x47), [1-4], enter (0x28)
send_keycode "\x47"
send_keycode "\x47"
send_keycode $(num_keycode $((screen + 1)) )
send_keycode "\x28"

The volume buttons on the Dragonboard still work when the device is in OTG mode, so I used xbindkeys to bind volume up to “switch.sh --up” and likewise for down.

Happy 2-keyboard desktop

Much better.

Yes, this is pretty ridiculously overpowered for what it is, and I’ll probably replace this with an MCU build when I get some time, but still it is fun to think about having a keyboard that runs X.

Clicky

I got a new keyboard this week with 80s-era clickiness: a Cooler Master QFR XTi.

This thing has LED-backlit keys, which is kind of fun but also a bit of a gimmick. More interesting to me is that driving everything is an ARM Cortex M3 MCU (Holtek HT32F165x) which is flashable over USB. That is more powerful than my first computer.

It would be nice to expand the features of this thing a bit, e.g. there’s an RTC on board the MCU, wouldn’t it be useful to have TOTP built directly into the keyboard? Or, perhaps it would be useful to audit the vendor firmware to make certain it is not recording your keystrokes. Of course, getting to the point of unraveling the vendor FW and flashing one’s own will entail a fair amount of work.

It seems there is support for SWD programming and pads for that on the motherboard, but given my clunkiness with a soldering iron that is kind of a last-resort thing to revisit once I brick my keyboard.

Anyway, I captured the firmware upload via USBPcap in Windows. Turns out it’s a fairly simple protocol. I saw the following two messages cross the wire:

MSG 1:

01 01 53 b6 38 2d 00 00 6b 2d 00 00 73 2d 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 73 2d 00 00 07 48 80 47 07 48 80 47
07 48 00 47 fe e7 fe e7 fe e7 fe e7 fe e7 fe e7

MSG 2:

01 01 84 0d 6c 2d 00 00 9f 2d 00 00 fe e7 fe e7
fe e7 fe e7 ad 7b 00 00 4d 71 00 00 81 2d 00 00
df f8 0c d0 00 f0 3e f8 00 48 00 47 e5 84 00 00
60 47 00 20 06 49 07 4a 08 68 50 43 43 f2 39 02

It is reasonable to assume that the “fe e7” chain repeats without a break across the two messages, which in turn tells us that the header for the message is twelve bytes:

    msg 1: 01 01 53 b6 38 2d 00 00 6b 2d 00 00
    msg 2: 01 01 84 0d 6c 2d 00 00 9f 2d 00 00

Bytes 4-7 and 8-11 look like addresses in little-endian format. As it happens, if you subtract these two addresses and add one (0x2d6b - 0x2d38 + 1 = 52), you get exactly the number of post-header bytes in each 64-byte message. Also note that the start address in message two (0x2d6c) is just one more than the end address in message one (0x2d6b). So we have something that looks like this:

    struct hdr {
        u8 cmd;                         /* ??? */
        u8 subcmd;                      /* ??? */
        __le16 something;               /* ??? -- crc? */
        __le32 start_dest_address;      /* first byte of xfer */
        __le32 end_dest_address;        /* last byte (inclusive) */
        u8 data[0];
    };

The device presumably copies the payload from “data” into the physical (flash) addresses specified in the header.
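
Reassembling an image out of a capture is then just a matter of honoring those headers. A rough sketch, where msgs is a list of the raw 64-byte transfer payloads already extracted from the pcap by other means, and the flash size is a guess:

    import struct

    def reassemble(msgs, flash_size=0x10000):
        image = bytearray(b"\xff" * flash_size)          # erased-flash filler
        for msg in msgs:
            cmd, subcmd, something, start, end = struct.unpack_from("<BBHII", msg)
            payload = msg[12:12 + (end - start + 1)]     # header is 12 bytes
            image[start:start + len(payload)] = payload
        return bytes(image)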

So I dumped the firmware from the pcap file. There’s nothing like an ELF header on it, but I guess that’s probably normal for bare-metal programs running on an MCU. Various bits do look like ARM thumb code.

Until now I didn’t really know squat about ARM MCUs, but a little bit of googling indicates that the Thumb instruction 0xE7FE (stored little-endian as “fe e7”) is an infinite loop, and the typical way you would write a handler for something like NMI that you just want to spin forever. So that explains the repeated “fe e7” sequence, and everything before it looks like a lot of little-endian 32-bit addresses, which lines up with what would go in the Vector Table, an ARM concept which contains the initial stack pointer, the program counter load address (reset vector), IRQ handlers, and so on.
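
As a sanity check on that reading, it only takes a couple of lines to print the leading words of the dump and eyeball whether they look like RAM and flash addresses (again just a sketch, operating on the image from the snippet above):

    import struct

    def dump_words(image, offset=0, count=8):
        # print the first few little-endian 32-bit words at the given offset
        words = struct.iter_unpack("<I", image[offset:offset + 4 * count])
        for i, (word,) in enumerate(words):
            print("word %d: 0x%08x" % (i, word))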

A bit more googling found some github projects with functional flashing utilities for similar products with similar or the same flashing protocol. There is yet hope, hopefully more to come.

On Drawing Squares

I’ve been looking at the sample drawing in speccy to try and speed it up. For a given frame, it renders about 10000 points as 9-pixel anti-aliased squares using Cairo, in Python. On my desktop (Debian stable, Nouveau driver) it gets a pokey 1 FPS.

My graphics knowledge is about 20 years old at this point, and while I did once know various arcane tricks on the hardware of the day, none of that is relevant now. So I know basically nothing about modern graphics. It is still not clear to me to what extent the 2D primitives use acceleration, and it doesn’t help much that, in the years I’ve been asleep on this topic, seemingly as many new APIs have been introduced and retired: today there are at least XRender, XAA, EXA, UXA, SNA, and Glamor — did I miss any?

Anyway, here are the results on my desktop from things that I’ve tried so far [1]:

test                    FPS
----                    -------
orig (aa)                0.904780
no_aa                    1.103061
no_cairo                 8.372354
no_cairo_aa              2.408143
batched                 19.873722
batched_aa              29.045358
batched_by_color         7.611148
batched_by_color_aa     14.856336

The tests are:

  • orig: the current code, where each rectangle is filled immediately in turn.
  • no_cairo: I skip Cairo completely and manually render to a GdkPixbuf internally, under the theory that draw function call overhead is the big killer
  • batched: using Cairo with one cr.fill() call for all of the rectangles. This means all are rendered with the same color, and as such could not actually be used. But it is interesting as a data point on batching vs. not.
  • batched_by_color: one cr.fill() call per distinct color
  • aa refers to having antialiasing enabled

Batching appears to make a huge difference, and binning and batching by color is the most performant usable setup. It’s not really difficult to batch by color since the colors are internally just an 8-bit intensity value, though in my quick test I am hashing the string representation of the float triples so it’s not as quick as it could be.
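
For reference, the batched-by-color idea is roughly the following sketch, where points is assumed to be a list of (x, y, (r, g, b)) tuples and cr a cairo.Context; since the tuples hash fine on their own, the string-hashing shortcut isn’t even needed:

    from collections import defaultdict

    def draw_batched_by_color(cr, points, size=3):
        by_color = defaultdict(list)
        for x, y, color in points:
            by_color[color].append((x, y))      # bin the squares by color
        for color, pts in by_color.items():
            cr.set_source_rgb(*color)
            for x, y in pts:
                cr.rectangle(x, y, size, size)  # queue every rect for this color
            cr.fill()                           # one fill() call per color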

I don’t understand the performance increase on batching with AA applied. Perhaps this is an area where some sort of acceleration is disabled when AA is also disabled? This is a repeatable outcome on this hardware / driver.

Then I tested on my laptop and got these numbers (Debian Testing, Intel graphics):

test                    FPS
----                    -------
orig (aa)               28.554757
no_aa                   27.632071
no_cairo                 9.954314
no_cairo_aa              2.958732
batched                 43.917732
batched_aa              28.750845
batched_by_color        19.933492
batched_by_color_aa     19.803839

Here, just using Cairo is the best. My guess is batching by color is only worse because of the higher CPU cost of the hashing and the rather inefficient way I did it. Given the close performance of batched_aa and the original test though, batching done well probably isn’t going to make much difference.

Finally, I went back to the desktop and tried Nvidia binary drivers (oh, how I don’t miss those).

test                    FPS
----                    -------
orig                     6.586327
no_aa                   15.216683
no_cairo                 9.072748
no_cairo_aa              2.368938
batched                 39.192580
batched_aa              27.717076
batched_by_color        18.681605
batched_by_color_aa     16.450811

It seems Nouveau still has some tricks to learn. In this case AA is, as expected, slower than non-AA. Overall, batching still seems to give a decent speedup on this hardware, so I may look into actually doing that. Moral of the story: use the laptop.

As for speccy, I actually had a use for it recently: I needed to see whether a particular testmode on ath5k was functional. Although I can’t really measure anything precisely with the tool, I can at least see if the spectrum looks close enough. Below is a snapshot from my testing, showing that in this testmode, the device emits power on 2412 MHz with a much narrower bandwidth than normal.

This matches, as far as I can tell, the spectrum shown on the NDA-encumbered datasheet.

[1] Approaches not tried yet: reducing the point count by partial redraws, using something like VBOs in GL, or rewriting parts in C. Feel free to let me know of others.