Router redux

I had just a little bit of downtime over the last few weeks while Angeline and I reacquainted ourselves with how to take care of an infant. It’s like riding a bicycle: there are tons of things you forgot since last time.

One of the long-on-my-todo-list items finally got completed: I upgraded all of my wifi routers. I have 5: an old dual band 11n router and four TP-Link 11ac units, all of which were running some oldish build of OpenWRT.

I decided the 11n router’s time has come for the trash-heap, so I put the latest stable LEDE build on one of the TP-Links and swapped it out.

The other three are just access points without the router: they just have all the ports on the unit bridged together, connected to the router via a wired switch. I also have a mesh wifi interface up on each unit so that I could place the unit anywhere regardless of wired connectivity (though, in practice, I have wired drops everywhere so I don’t really use this.) For these, I build from source with just the required bits. I added a serial port to one of the units so I can test builds there before rolling out to the other two.

In all it was pretty painless since the LEDE build is more or less the same as OpenWRT. I did go through (LEDE) recovery once and found this fun issue:

root@(none):/tmp# sysupgrade -n lede-ar71xx-generic-archer-c7-v2-squashfs-sysupgrade.bin
Image metadata not found
killall: watchdog: no process killed
Commencing upgrade. All shell sessions will be closed now.
Failed to connect to ubus
root@(none):/tmp#

…because sysupgrade has different paths for failsafe vs not; and for some reason $FAILSAFE is not always set. Do this to work-around:

root@(none):/tmp# export FAILSAFE=1
root@(none):/tmp# sysupgrade -n lede-ar71xx-generic-archer-c7-v2-squashfs-sysupgrade.bin
Image metadata not found
killall: watchdog: no process killed
Commencing upgrade. All shell sessions will be closed now.
root@(none):/tmp# Connection to 192.168.1.1 closed by remote host.

Encrypted mesh PSA

I’m a bit late writing this up, as lately wifi stuff has taken a back seat to the day job. But, I’ve now seen the following issue reported more than once, so hopefully this post will save the other two mesh users some grief.

If you are running an encrypted mesh on wpa_supplicant or (ye gods) authsae, take note:

  • Encrypted mesh on kernel 4.8 will not interoperate with kernel < 4.8
  • wpa_supplicant 2.6 will not interoperate with 2.5
  • authsae as of commit 706a2cf will not interoperate with that of commit 813ec0e [1]
  • wpa_supplicant 2.6 and authsae commit 706a2cf require additional configuration to work with kernel < 4.8 (see below).

What is behind all of that? A while ago I noticed and mentioned to another networking developer that the 802.11 standard calls for certain group-addressed management frames in mesh networks to be encrypted with the mesh group key (MGTK). However, the implementation in the kernel only integrity-protected these frames, using the integrity group key (IGTK) [2]. I don’t happen to know the history here: it may be that this was something that was changed between the draft and the standard, but anyway that was the state until Masashi Honma fixed it, with patch 46f6b06050b that landed in kernel 4.8.

Meanwhile, Jouni Malinen fixed a bunch of related issues in the wpa_supplicant implementation, namely that IGTK was not being generated properly and instead the MGTK was used for that. I made the same kind of fixes for authsae.

Yes, it’s a multiple(!) flag day and we suck for that.

Now, on to how to fix your mesh.

Old mesh daemon + old kernel everywhere should still work.
Old mesh daemon + new kernel everywhere should still work.
New mesh daemon + new kernel everywhere should still work.

New mesh daemon + pre-4.8 kernel: you will most likely see that peering works but no data goes across because HWMP frames are dropped at the peer. The solution here is to enable PMF in the mesh daemon:

  • add “pmf=2” or “ieee80211w=2” to the relevant section of wpa_supplicant.conf
  • add “pmf=1;” to the meshd section of authsae.cfg

If you for some reason find yourself in the position of needing to run a mix of old/new wpa_s and/or old/new kernel, here are my (completely untested) suggestions:

  • patch the kernel to accept PMF-protected frames as well as encrypted frames (this allows older kernels to work with newer ones)
  • patch the new mesh daemon (wpa_supplicant or authsae) to copy the MGTK into the IGTK instead of generating a separate key (this allows older mesh daemon to work with the newer one since older daemon will use MGTK for IGTK).

Both of these will reduce your security and don’t follow the spec, so there’s little chance such changes will go upstream.

[1] Prior to 706a2cf, authsae doesn’t even interoperate with wpa_supplicant due to not filling in all of the elements in peering frames. My advice would be to switch to wpa_supplicant, which has things like a maintainer and a test suite.
[2] On encryption vs integrity protection: the latter is similar to PGP-signing of emails: it makes tampering evident but does not hide the content of the message itself.

On Drawing Squares

I’ve been looking at the sample drawing in speccy to try and speed it up. For a given frame, it renders about 10000 points as 9-pixel anti-aliased squares using Cairo, in Python. On my desktop (Debian stable, Nouveau driver) it gets a pokey 1 FPS.

My graphics knowledge is about 20 years old at this point, and while I did once know various arcane tricks on the hardware of the day, none of that is relevant now. So I know basically nothing about modern graphics. It is still not clear to me to what extent the 2D primitives use acceleration, and it doesn’t help much that, in as many years that I’ve been asleep on this topic, there seems to have been as many new APIs introduced and retired: today there are at least XRender, XAA, EXA, UXA, SNA, and Glamor — did I miss any?

Anyway, here are the results on my desktop from things that I’ve tried so far [1]:

test                    FPS
----                    -------
orig (aa)                0.904780
no_aa                    1.103061
no_cairo                 8.372354
no_cairo_aa              2.408143
batched                 19.873722
batched_aa              29.045358
batched_by_color         7.611148
batched_by_color_aa     14.856336

The tests are:

  • orig: the current code, where each rectangle is filled immediately in turn.
  • no_cairo: I skip Cairo completely and manually render to a GdkPixbuf internally, under the theory that draw function call overhead is the big killer
  • batched: using Cairo with one cr.fill() call for all of the rectangles. This means all are rendered with the same color, and as such could not actually be used. But it is interesting as a data point on batching vs. not.
  • batched_by_color: one cr.fill() call per distinct color
  • aa refers to having antialiasing enabled

Batching appears to make a huge difference, and binning and batching by color is the most performant usable setup. It’s not really difficult to batch by color since the colors are internally just an 8-bit intensity value, though in my quick test I am hashing the string representation of the float triples so it’s not as quick as it could be.

I don’t understand the performance increase on batching with AA applied. Perhaps this is an area where some sort of acceleration is disabled when AA is also disabled? This is a repeatable outcome on this hardware / driver.

Then I tested on my laptop and got these numbers (Debian Testing, Intel graphics):

test                    FPS
----                    -------
orig (aa)               28.554757
no_aa                   27.632071
no_cairo                 9.954314
no_cairo_aa              2.958732
batched                 43.917732
batched_aa              28.750845
batched_by_color        19.933492
batched_by_color_aa     19.803839

Here, just using Cairo is the best. My guess is batching by color is only worse because of the higher CPU cost of the hashing and the rather inefficient way I did it. Given the close performance of batched_aa and the original test though, batching done well probably isn’t going to make much difference.

Finally, I went back to the desktop and tried Nvidia binary drivers (oh, how I don’t miss those).

test                    FPS
----                    -------
orig                     6.586327
no_aa                   15.216683
no_cairo                 9.072748
no_cairo_aa              2.368938
batched                 39.192580
batched_aa              27.717076
batched_by_color        18.681605
batched_by_color_aa     16.450811

It seems Nouveau still has some tricks to learn. In this case AA is, as expected, slower than non-AA. Overall, batching still seems to give a decent speedup on this hardware, so I may look into actually doing that. Moral of the story: use the laptop.

As for speccy, I actually had a use for it recently: I needed to see whether a particular testmode on ath5k was functional. Although I can’t really measure anything precisely with the tool, I can at least see if the spectrum looks close enough. Below is a snapshot from my testing, showing that in this testmode, the device emits power on 2412 MHz with a much narrower bandwidth than normal.

This matches, as far as I can tell, the spectrum shown on the NDA-encumbered datasheet.

[1] Approaches not tried yet: reducing the point count by partial redraws, using something like VBOs in GL, or rewriting parts in C. Feel free to let me know of others.

sae.py

For a while now, I’ve wanted an easy way to decrypt a mesh packet capture when I know the SAE passphrase. This would be quite handy for debugging.

I reasoned that, if I knew the shared secret, and had captured the over-the-air exchange, it should be possible to make such a tool, right? We know everything each station knows in that case, don’t we?

So I spent a bit of time last week reimplementing SAE in python to come to grips with its various steps. And I found my reasoning was flawed.

Similarly to Diffie-Hellman, SAE exchanges values that include the composition of a random value (we’ll call them r1 and r2) with some public part P, in such a way that it is hard to extract r1 even if you know P (e.g. r1*P and r2*P with ECC, although this is not exactly how SAE is specified). These values can be used by each peer to arrive at a shared secret provided they know the original random number (e.g., something like r1 * r2 * P). The crucial point is that r1 and r2 cannot be determined from the exchange alone. They exist only in memory on each peer during the exchange.

So if the secrecy doesn’t actually depend on the password, what is it there for? The answer is authentication: SAE is a password authenticated key exchange (PAKE). The password ensures that the party we are talking to is who we think it is, or at least someone who also knows the password. In the case of SAE, P is derived from the password.

As for the original goal, what we can do instead is use the CCMP keys from a wpa_supplicant log and decrypt frames with those. That is not nearly as convenient, though.

And thus my postings for 2015 are at an end; happy new year, all!

Fake SNRs

With the addition of per-link signal levels that I added over the weekend, my wmediumd fork leveled up from “mere curiosity” to “potentially useful someday.” For the case of mesh, this means you can use signal levels to inform how HWMP will create the mesh paths.

For example, as a test I was able to validate this fix, by setting up a virtual 4-node mesh with a bad path and a good path. With the patch reverted, the bad path was almost always selected due to its PREQs being received at the target first. [In actuality, this test will exhibit frequent path swapping because the order in which the PREQs are received is essentially random, a finding in “Experimental evaluation of two open source solutions for wireless mesh routing at layer two” by Garroppo et al. Wmediumd doesn’t show this yet because frames are mostly received and queued in-order. At the time of the patch, I validated it in an actual 15-node mesh.]

There are still a couple of things that would be nice to have here. Today, we base the decision on whether a multicast frame is received by the signal level from the transmitter to us and the multicast rate. However, this means that with a low multicast rate, there is basically zero frame loss. In real life, loss happens much more frequently, and so we cannot test the effects of lost path request frames in wmediumd, which is the subject of at least one pending HWMP patch. Another problem is that the current setup works only with static setups; we might be interested in what happens with mobile nodes, for example. For that we’d need to be able to change the signal level periodically; how to easily specify that is a bit of a question mark.

tmux + wmediumd

Wmediumd gained the ability to do a simple contention simulation a while ago. It turned out to be a small change to the existing code: just ensure that any new frames are scheduled after any other queued frames of equal or higher priority from any other station.

Assuming the simulation is accurate, we might use this to gather some information about different kinds of wireless network topologies. For example: what is the throughput and latency like for a mesh network, as a function of hops?

The one sticking point is that it’s a bit of a pain to set up a bunch of mesh nodes with hwsim with their own IPs and routing tables. I’ve previously scripted this with send-to-self routing, but it’s a bit ugly. So I looked into doing this with network namespaces and controlling it all with tmux. The result is this fairly minimal script to launch a number of mesh nodes in a linear toplogy. From there one can easily run ping and iperf to gather some data, as in this chart:

This image shows the result, and is in line with measurements that Javier Cardona had done on actual hardware. We can see that throughput is roughly inversely proportional to the number of nodes, while latency is directly proportional.

This may seem pretty bad at first, but makes sense when you consider that a radio transceiver can only listen or talk at once — it is all about radio physics, nothing to do with mesh specifically (which is not to say that mesh has no inefficiencies). Also this level of performance is when all the nodes are in range of each other; in such a case you’d be unlikely to have so many hops because the nodes would instead just peer directly with each other. So we might design our networks to avoid many hops, reduce the number of nodes in a given interference area, use fancy phy algorithms to enhance spatial reuse, or use multiple channels.

My plan with wmediumd is to use it in a bit more automated fashion to evaluate things like changes to HWMP — I think if we can identify topologies that people care about then it’s a bit stronger to say “this change always makes things better” if we can show repeatable before-and-after results from wmediumd.

Weekend updates

More updates to speccy pushed to github this weekend. It gained the ability to toggle on-and-off line and scatter plots. Also, it now takes care of enabling spectral scan and then scanning channels (by exec()-ing iw, naughty me) so you can just tell it the device and it does the rest. Code is still ugly but gradually taking shape. There’s an animated gif for those that want to see what it currently looks like in operation.

I settled on (somewhat inefficiently) smoothing the line graph with a centered moving average using a 5-sample window length, which met the strict theoretical basis of “it looked kind of right.” I did not yet incorporate the time-based plot I was playing with.

It could certainly be faster. Profiling shows that the vast majority of time is spent drawing 9-pixel rectangles in Cairo. I suppose that even if Cairo was using an accelerated backend, which I don’t believe it is in my case, that this would be something more efficiently done by rendering to a client pixmap anyway. Lots of low-hanging fruit everywhere, but I’m resisting the urge to really optimize anything at this early stage.

Nicer serialAlso yesterday I finally cleaned up the somewhat messy serial port installation on one of my ath10k routers with the help of a drill. These cheap cables work well but do require a bit of attention with the glue gun — I find the plastic case around the USB port / converter board is always coming un-snapped and falling off otherwise.

time vs frequency

This is roughly what I had in mind for the time/frequency plot for my little spectrum visualizer. Time is on the y-axis and frequency on the x-axis. There are some interesting artifacts here, especially on the higher channels. Probably bugs in my implementation, but we’ll see as I develop it further.

Speccy with labels

I messed around with “speccy” a bit today to see what I could do with it to make it look slightly nicer. It really needed a grid so one can more easily see the frequencies and power levels. So now it looks like this:

(This is with iperf running on channel 6 to a nearby AP).

I have in mind a couple of other visualizations. An obvious one is to show the overall max or average power level rather than a scatter plot. This would be a cleaner display, although there could be co-channel interference sources you wouldn’t see that way. Here’s a first cut at that:

I am doing some averaging for multiple (~8) samples with the same subcarrier frequency, but clearly more smoothing is needed. Also I notice that I occasionally get some artifact where all the subcarriers have the exact same power level. Not sure what that is, but it wants filtering.

Another interesting visualization would be a scrolling chart of frequency vs time, with power level indicated by pixel intensity. Then you could get an idea of historical changes in the medium. (Oh hey, the neighbors are watching Netflix now.)

The code is still an ugly pile of hacks, which it will probably continue to be until I decide just what kind of processing needs doing on the raw samples. Sorry for that.

VHT mesh

…is a thing now.

# iw dev
phy#0
    Interface wlan0
		ifindex 4
		wdev 0x1
		addr 30:b5:c2:fb:34:d8
		type mesh point
		channel 149 (5745 MHz), width: 80 MHz, center1: 5775 MHz

# iperf -c 192.168.1.20
------------------------------------------------------------
Client connecting to 192.168.1.20, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.21 port 34175 connected with 192.168.1.20 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   148 MBytes   124 Mbits/sec