Virtual-Memory PowerSort
A key downside of mergesort is the extra space usage. In theory, inplace merging methods are known, but they are rather slow in practice, primarily due to many extra moves.
In this paper, we show that using appropriate sized chunks of the input array in the style of virtual-memory pages has very little overhead, but reduces the extra space usage to $O(\sqrt{n\log n})$. The trick is to keep pages permuted for the results of intermediate merges, and only move them to their final position at the end of the algorithm.
Moreover, we show that a Pingpong-buffer variant of Powersort can avoid almost all extraneous moves; this needs a closer look at the merge choreography of Powersort, which might be of independent interest for other implementations.