I was poking around with the disassembler the other day, which is never a good idea. One of the things I looked at was the following bit in the RX hotpath for ath5k:
rxs->qual = rs.rs_rssi * 100 / 35;
Now, that’s not a very significant calculation, and I doubt it will show up in profiles, but divides automatically trigger my “can we do better?” reflex. I was interested to see how gcc compiled this, because we have division by a constant, but we first have to multiply by a variable due to order of operations.
We can generally remove a division by multiplying by its reciprocal. Interestingly, gcc already does that. I couldn’t quite puzzle out all of the steps, but here’s the disassembly with my best guess:
; multiply eax by 100, store in ecx 80483d0: 6b c8 64 imul $0x64,%eax,%ecx ; load (32 / 35) * 2**32 into edx 80483d3: ba eb a0 0e ea mov $0xea0ea0eb,%edx ; multiply 100 * argc by 32/35 and store in edx:eax 80483d8: 89 c8 mov %ecx,%eax 80483da: f7 ea imul %edx ; take top 32 bits of result in edx (when sign extended, the ; complement of the final answer?) and add it back to the numerator 80483dc: 8d 04 0a lea (%edx,%ecx,1),%eax 80483df: 89 c2 mov %eax,%edx ; divide by 32 to remove pre-multiply 80483e1: c1 fa 05 sar $0x5,%edx ; subtract one if we need to round 80483e4: 89 c8 mov %ecx,%eax 80483e6: c1 f8 1f sar $0x1f,%eax 80483e9: 29 c2 sub %eax,%edx
So then I went hunting for the constant folding code in gcc, and there are all kinds of tricks like this. Very neat. Along the way I also found a link to the book Hacker’s Delight, now wish-listed.
In the original code, the denominator is a bit arbitrary, we could pick a different number that is more amenable to shifts and adds and save a multiply, but it’s hardly worth it.