Encrypted mesh PSA

I’m a bit late writing this up, as lately wifi stuff has taken a back seat to the day job. But, I’ve now seen the following issue reported more than once, so hopefully this post will save the other two mesh users some grief.

If you are running an encrypted mesh on wpa_supplicant or (ye gods) authsae, take note:

  • Encrypted mesh on kernel 4.8 will not interoperate with kernel < 4.8
  • wpa_supplicant 2.6 will not interoperate with 2.5
  • authsae as of commit 706a2cf will not interoperate with that of commit 813ec0e [1]
  • wpa_supplicant 2.6 and authsae commit 706a2cf require additional configuration to work with kernel < 4.8 (see below).

What is behind all of that? A while ago I noticed and mentioned to another networking developer that the 802.11 standard calls for certain group-addressed management frames in mesh networks to be encrypted with the mesh group key (MGTK). However, the implementation in the kernel only integrity-protected these frames, using the integrity group key (IGTK) [2]. I don’t happen to know the history here: it may be that this was something that was changed between the draft and the standard, but anyway that was the state until Masashi Honma fixed it, with patch 46f6b06050b that landed in kernel 4.8.

Meanwhile, Jouni Malinen fixed a bunch of related issues in the wpa_supplicant implementation, namely that IGTK was not being generated properly and instead the MGTK was used for that. I made the same kind of fixes for authsae.

Yes, it’s a multiple(!) flag day and we suck for that.

Now, on to how to fix your mesh.

Old mesh daemon + old kernel everywhere should still work.
Old mesh daemon + new kernel everywhere should still work.
New mesh daemon + new kernel everywhere should still work.

New mesh daemon + pre-4.8 kernel: you will most likely see that peering works but no data goes across because HWMP frames are dropped at the peer. The solution here is to enable PMF in the mesh daemon:

  • add “pmf=2” or “ieee80211w=2” to the relevant section of wpa_supplicant.conf
  • add “pmf=1;” to the meshd section of authsae.cfg

If you for some reason find yourself in the position of needing to run a mix of old/new wpa_s and/or old/new kernel, here are my (completely untested) suggestions:

  • patch the kernel to accept PMF-protected frames as well as encrypted frames (this allows older kernels to work with newer ones)
  • patch the new mesh daemon (wpa_supplicant or authsae) to copy the MGTK into the IGTK instead of generating a separate key (this allows older mesh daemon to work with the newer one since older daemon will use MGTK for IGTK).

Both of these will reduce your security and don’t follow the spec, so there’s little chance such changes will go upstream.

[1] Prior to 706a2cf, authsae doesn’t even interoperate with wpa_supplicant due to not filling in all of the elements in peering frames. My advice would be to switch to wpa_supplicant, which has things like a maintainer and a test suite.
[2] On encryption vs integrity protection: the latter is similar to PGP-signing of emails: it makes tampering evident but does not hide the content of the message itself.

memory µ-optimizing

While looking at some kmalloc statistics for my little corner of wireless, I found a new tool for the toolbox: perf kmem. To test it out, I ran a simulated connected 25-node mesh in mac80211_hwsim.

Here’s what perf kmem stat --caller had to say:

---------------------------------------------------------------------
 Callsite            | Total_alloc/Per | Total_req/Per  | Hit   | ...
---------------------------------------------------------------------
 mesh_table_alloc+3e |      1600/32    |       400/8    |    50 |
 mesh_rmc_init+23    |    204800/8192  |    102600/4104 |    25 |
 mesh_path_new.const |    315904/512   |    241864/392  |   617 |
[...]

The number of mesh nodes shows up in the ‘Hit’ column in various ways: we can pretty easily see that mesh_table_alloc is called twice per mesh device, mesh_rmc_init is called once, and mesh_path_new is called at least once for every peer (617 ~= 25 * 24). We might scan the hit column to see if there are any cases in our code that perform unexpectedly high numbers of allocations, signifying a bug or poor design.

The first two columns are also interesting because they show potentially wasteful allocations. Taking mesh_table_alloc as an example:

(gdb) l *mesh_table_alloc+0x3e
0x749fe is in mesh_table_alloc (net/mac80211/mesh_pathtbl.c:66).
61		newtbl = kmalloc(sizeof(struct mesh_table), GFP_ATOMIC);
62		if (!newtbl)
63			return NULL;
64
65		newtbl->known_gates = kzalloc(sizeof(struct hlist_head), GFP_ATOMIC);
66		if (!newtbl->known_gates) {
67			kfree(newtbl);
68			return NULL;
69		}
70		INIT_HLIST_HEAD(newtbl->known_gates);

We are allocating the size of an hlist_head, which is one pointer. This is kind of unusual: typically the list pointer is embedded directly into the structure that holds it (here, struct mesh_table). The reason this was originally done is that the table pointer itself could be switched out using RCU, and a level of indirection for the gates list ensured that the same head pointer is seen by iterators with both the old and new table. With my rhashtable rework, this indirection is no longer necessary.

To track allocations with 8-byte granularity would impose an unacceptable level of overhead, so these 8-byte allocations all get rounded up to 32 bytes (first column). Because we allocated 50 of them (25 devices, two tables each), in all we asked for 400 bytes, and actually got 1600, a waste of 1200 bytes. Forty-eight wasted bytes per device is generally nothing worth worrying about, considering we usually only have one or two devices — but as this kmalloc is no longer needed, we can just drop it and also shed a bit of error-handling code.

In the case of mesh_rmc_init, we see an opportunity for tuning. The requested allocation is the unfortunate size of 4104 bytes, just a bit larger than the page size, 4096. Allocations of a page (so-called order-0 allocations) are much easier to fulfill than the order-1 allocations that actually happen here. There are a few options here: we could remove or move the u32 in this struct that pushes it over the page size, or reduce the number of buckets in the hashtable, or switch the internal lists to hlists in order to get this to fit into 4k, and thus free up a page per device. The latter is the simplest approach, and the one I chose.

I have patches for these and a few other minor things in my local tree and will send them out soon.

sae.py

For a while now, I’ve wanted an easy way to decrypt a mesh packet capture when I know the SAE passphrase. This would be quite handy for debugging.

I reasoned that, if I knew the shared secret, and had captured the over-the-air exchange, it should be possible to make such a tool, right? We know everything each station knows in that case, don’t we?

So I spent a bit of time last week reimplementing SAE in python to come to grips with its various steps. And I found my reasoning was flawed.

Similarly to Diffie-Hellman, SAE exchanges values that include the composition of a random value (we’ll call them r1 and r2) with some public part P, in such a way that it is hard to extract r1 even if you know P (e.g. r1*P and r2*P with ECC, although this is not exactly how SAE is specified). These values can be used by each peer to arrive at a shared secret provided they know the original random number (e.g., something like r1 * r2 * P). The crucial point is that r1 and r2 cannot be determined from the exchange alone. They exist only in memory on each peer during the exchange.

So if the secrecy doesn’t actually depend on the password, what is it there for? The answer is authentication: SAE is a password authenticated key exchange (PAKE). The password ensures that the party we are talking to is who we think it is, or at least someone who also knows the password. In the case of SAE, P is derived from the password.

As for the original goal, what we can do instead is use the CCMP keys from a wpa_supplicant log and decrypt frames with those. That is not nearly as convenient, though.

And thus my postings for 2015 are at an end; happy new year, all!

VHT mesh

…is a thing now.

# iw dev
phy#0
    Interface wlan0
		ifindex 4
		wdev 0x1
		addr 30:b5:c2:fb:34:d8
		type mesh point
		channel 149 (5745 MHz), width: 80 MHz, center1: 5775 MHz

# iperf -c 192.168.1.20
------------------------------------------------------------
Client connecting to 192.168.1.20, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.21 port 34175 connected with 192.168.1.20 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   148 MBytes   124 Mbits/sec

mesh on ath10k

I’ve had my ath10k AP (TP-Link Archer C7 v2) since last October or so, with the goal of having a VHT-capable device with which to test (currently non-existent) VHT mesh. Unfortunately, for nearly all of that time, I’ve been stuck on a firmware crash shortly after bringing up the device. Not that I’ve spent a whole lot of time on it, but there’s only so much one can do when getting to the point of “firmware crashes and it has something to do with peers but that’s all I know and I don’t have the time, tools, or code to dig deeper.”

I think there’s some variant on rubber duck debugging where you complain publicly about some issue, and doing that makes you think about it more, and then the way forward is magically revealed. That, plus some helpful hints from the residents of the ath10k ML, got me over the hump, so now it works!

Next up, finding a spare mini PCIe slot for the other ath10k device I have, and getting the VHT bits done…