A tale of two search trees

As a warm up for a research topic I’m doing this semester, I am looking at static search tree performance. Consider the following tree fragment where nodes are numbered according to BFS order:

            1
      2            3
   4     5      6      7
  8 9  10 11  12 13  14 15

The nodes are located in memory wherever malloc() put them. An interesting question is whether a different in-memory layout has better search performance. One possible layout is in an array with the following order: 1,2,3,4,8,9,5,10,11,6,12,13,7,14,15 (this pattern is not original to me).

So far, the answer, surprisingly to me, is: not with this layout. With a benchmark of 10 million searches of a randomly-inserted (otherwise unbalanced) tree, I get:

Original Tree: 77s
Reordered Tree: 87s

The tree searches are exactly the same — only the memory layout changed. I can reduce these numbers quite a bit by using -O2 and reducing the size of tree nodes, but there’s still a consistent net loss to the array ordering.

I suspect my cache-emptying routines don’t, but if not, it will be interesting to see where I made the stupid mistake / mental lapse.