Multiway Powersort
This paper describes Multiway Powersort, an extension of our adaptive mergesort variant for merging several runs at once while still adapting nearly optimally to existing runs. Our C++ implementation of 4-way Powersort (GitHub) achieved a 15–20% speedup over the standard 2-way Powersort (see Fig 3–6).
Since September 2021, Powersort is used in CPython and PyPy instead of Timsort’s suboptimal merge policy; see our ESA 2018 paper for details and background. In scenarios where comparisons are rather expensive (in particular when sorting pointers as in implementations of Python or Java), Timsort/Powersort offers a clear advantage for many inputs and comes with no significant detriment to the worst case.
When comparisons are cheap, memory transfers become a major influence, and multiway merging can subsantially reduce these. This effect has been demonstrated repeatedly for Multiway Quicksort. For adaptive Mergesort, a generalization to multiway merging is not obvious, but for Powersort, it can be achieved (§3), resulting in significant speedups.
This work originated in an Masters dissertation project of William Cawley Gelling.
Powersort: Clean explanation and self-contained analysis
The effect of explaining (standard 2-way) Powersort to loads of people was that I found much clearer ways to introduce it compared to the original description. §3 of this paper gives (in my biased opinion) the best introduction to the algorithm (multiway or not).
Moreover, §3.3 gives a completely elementary and self-contained proof of the optimality of Powersort’s merge cost.
Relation to Other Papers
The algorithms of this paper generalize Peeksort and Powersort from the ESA 2018 with Ian Munro.
Intended Audience
This paper is accessible to all computer scientists.