As a warm up for a research topic I’m doing this semester, I am looking at static search tree performance. Consider the following tree fragment where nodes are numbered according to BFS order:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
The nodes are located in memory wherever malloc() put them. An interesting question is whether a different in-memory layout has better search performance. One possible layout is in an array with the following order: 1,2,3,4,8,9,5,10,11,6,12,13,7,14,15 (this pattern is not original to me).
So far, the answer, surprisingly to me, is: not with this layout. With a benchmark of 10 million searches of a randomly-inserted (otherwise unbalanced) tree, I get:
Original Tree: 77s
Reordered Tree: 87s
The tree searches are exactly the same — only the memory layout changed. I can reduce these numbers quite a bit by using -O2 and reducing the size of tree nodes, but there’s still a consistent net loss to the array ordering.
I suspect my cache-emptying routines don’t, but if not, it will be interesting to see where I made the stupid mistake / mental lapse.